Will Spark support Clojure?

I am about to start learning functional programming, and Clojure appeals to me the most: I love its community, its syntax and the concept of immutable data structures. I am also interested in bio-inspired ML for rich data (Numenta).
However, my huge concern is that Spark does not support it yet, and Spark rocks!!!
There is Flambo for Clojure, but is it a satisfactory solution?
My course and job are in Scala. Should I admit defeat and enter the Scala world, or do you think I should focus solely on Clojure?

Being the author of Sparkling (thanks to Josh Rosen for pointing that out), I can tell you that we use it at our company for ETL processing.
Here's what's good:
it provides a Clojuristic way of interacting with Spark, as you can see in the presentation "Big Data with style - the Clojure/Spark way"
it's optimized for performance
it's used in production and there are others also using it
Here's what's missing:
There's currently no support for Spark Streaming, Spark SQL, Spark DataFrames or Spark ML. That might come in the future, and I'm happy to accept pull requests, but it's currently not the main focus (at the time of writing, April 2015).
I hope this helps you make up your mind on going with Clojure or starting to learn Scala.

It's hard to say that something like Spark doesn't support Clojure. It would make more sense to ask whether there are good libraries for that project that are easy to use from Clojure. From googling around, Flambo looks like a viable option, and at the various Clojure conferences I hear incidental talk of using Spark in several contexts.
I would say that there is fairly low technical risk in using Spark from Clojure, so you are free to make this choice based on the other constraints of your working environment and projects. Being particularly biased toward Clojure, I wholeheartedly recommend at least trying it and seeing what parts of the language and ecosystem work well for you.

Related

Is Neo4j a good fit for Clojure?

I have found that relational databases are a very good fit for Clojure, as the set functions (project/join/union etc.) map very nicely to a database schema, making Clojure almost a perfect fit for use with databases.
However, I was wondering how Clojure fits in with graph databases like Neo4j.
Neo4j has Clojure-y bindings here and here and here.
You can get the Leiningen and Maven config for each of these from Clojars.
AllegroGraph is another similar graph data store that is widely supported in Clojure, so there is some evidence that the answer may be yes!
Graph stores lend themselves well to immutable trees, which may be an even better fit for Clojure than sets, but this is all fairly subjective. The most objective answer I can give is to point to existing use of graph stores and triple stores.
Mark Watson's book (free pdf version: http://www.markwatson.com/opencontent/book_java.pdf), a lesser known Clojure book he self-published last year, covers some useful graph technology, mainly AllegroGraph.
I myself don't have much experience with graph db libraries, but the above-cited book mentions that Neo4j is optimized to traverse graphs, whereas AllegroGraph is optimized for subgraph matching. So the choice will likely depend on your specific application.
If you go with AllegroGraph, the author of that book waives the AGPL license on his wrappers for production use if you buy copies of his book, and of course they can be used freely under the conditions of the license: https://github.com/mark-watson/java_practical_semantic_web
The clojure-neo4j wrapper library exists, though given the last commit date it's unclear whether it has code-rotted or is ready for use: https://github.com/JulianMorrison/neo4j-clojure. The most recently updated fork, by mattrepl, was updated not that long ago, however: https://github.com/mattrepl/clojure-neo4j.git

Clojure for trading strategies

Is anybody using Clojure for developing automated trading strategies? What is your experience? I am anticipating learning Clojure and wanted to know whether or not I could use it in this context. If there are any resources for using it in this context, please provide a link. I am currently only using Ruby and JavaScript for web development.
I haven't seen any Clojure-specific work in this area, though that probably has more to do with Clojure being a brand new language than anything else. Certainly, Clojure's quantitative capabilities are growing -- have a look at Incanter if you haven't already.
If it's low-latency/high-speed trading, then I have my doubts as to whether Clojure would be appropriate (despite Clojure being my favourite language). C++ really seems to be the best option in this regard.
Clojure can be useful due to its:
Functional programming capabilities
Fault tolerance

Node.js or Erlang

I really like these tools when it comes to the level of concurrency they can handle.
Erlang/OTP looks like a much more stable solution, but it requires much more learning and a lot of diving into the functional language paradigm. And it looks like Erlang/OTP does much better when it comes to multi-core CPUs (correct me if I am wrong).
But which should I choose? Which one is better in the short and long term perspective?
My goal is to learn a tool which makes scaling my Web projects under high load easier than traditional languages.
I would give Erlang a try. Even though it will be a steeper learning curve, you will get more out of it since you will be learning a functional programming language. Also, since Erlang is specifically designed to create reliable, highly concurrent systems, you will learn plenty about creating highly scalable services at the same time.
I can't speak for Erlang, but a few things that haven't been mentioned about node:
Node uses Google's V8 engine to actually compile JavaScript into machine code, so node is actually pretty fast. That's on top of the speed benefits offered by event-driven programming and non-blocking I/O.
Node has a pretty active community. Hop onto their IRC group on freenode and you'll see what I mean.
I've noticed the above comments push Erlang on the basis that it will be useful to learn a functional programming language. While I agree it's important to expand your skillset and get one of those under your belt, you shouldn't base a project on the fact that you want to learn a new programming style.
On the other hand, JavaScript is already in a paradigm you feel comfortable writing in! Plus it's JavaScript, so when you write client side code it will look and feel consistent.
Node's community has already pumped out tons of modules! There are modules for redis, mongodb, couch, and what have you. Another good module to look into is Express (think Sinatra for node).
Check out the video on Yahoo's blog by Ryan Dahl, the guy who actually wrote node. I think that will help give you a better idea where node is at, and where it's going.
Keep in mind that node is still in the late stages of development, and so has been undergoing quite a few changes, some of which have broken earlier code. However, supposedly it's at a point where you can expect the API not to change too much more. So if you're looking for something fun, I'd say node is a great choice.
I'm a long-time Erlang programmer, and this question prompted me to take a look at node.js. It looks pretty damn good.
It does appear that you need to spawn multiple processes to take advantage of multiple cores. I can't see anything about setting processor affinity though. You could use taskset on Linux, but affinity should probably be parametrized and set in the program.
I also noticed that the platform support might be a little weaker. Specifically, it looks like you would need to run under Cygwin for Windows support.
Looks good though.
Edit
Node.js now has native support for Windows.
I'm looking at the same two alternatives you are, gotts, for multiple projects.
So far, the best razor I've come up with to decide between them for a given project is whether I need to use Javascript. One existing system I'm looking to migrate is already written in Javascript, so its next version is likely to be done in node.js. Other projects will be done in some Erlang web framework because there is no existing code base to migrate.
Another consideration is that Erlang scales well beyond just multiple cores, it can scale to a whole datacenter. I don't see a built-in mechanism in node.js that lets me send another JS process a message without caring which machine it is on, but that's built right into Erlang at the lowest levels. If your problem isn't big enough to need multiple machines or if it doesn't require multiple cooperating processes, this advantage isn't likely to matter, so you should ignore it.
Erlang is indeed a deep pool to dive into. I would suggest writing a standalone functional program first before you start building web apps. An even easier first step, since you seem comfortable with Javascript, is to try programming JS in a more functional style. If you use jQuery or Prototype, you've already started down this path. Try bouncing between pure functional programming in Erlang or one of its kin (Haskell, F#, Scala...) and functional JS.
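To make "a more functional style" concrete, here is a rough sketch of the same computation written both ways. It is in Python purely for illustration (the order data and function names are made up); the functional version has the same shape you would get in JavaScript with the array methods filter, map and reduce.

```python
from functools import reduce

# Hypothetical sample data, invented for this illustration.
orders = [
    {"customer": "ann", "total": 30},
    {"customer": "bob", "total": 5},
    {"customer": "ann", "total": 12},
]


def total_imperative(orders, customer):
    """Imperative style: mutate an accumulator inside a loop."""
    result = 0
    for order in orders:
        if order["customer"] == customer:
            result += order["total"]
    return result


def total_functional(orders, customer):
    """Functional style: filter, map, then fold, with no mutation."""
    mine = filter(lambda o: o["customer"] == customer, orders)
    totals = map(lambda o: o["total"], mine)
    return reduce(lambda acc, t: acc + t, totals, 0)
```

Both functions return the same result; the difference is that the second one is built from small expressions with no variable being updated, which is the habit Erlang will force on you anyway.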
Once you're comfortable with functional programming, seek out one of the many Erlang web frameworks; you probably shouldn't be writing your app directly to something low-level like inets at this late stage. Look at something like Nitrogen, for instance.
While I'd personally go for Erlang, I'll admit that I'm a little biased against JavaScript. My advice is that you evaluate a few points:
Are you reusing existing code in either of those languages (both in terms of source code, and programmer experience)?
Do you need/want on-the-fly updates without stopping the application? (This is where Erlang wins by default - its runtime was designed for that case, and OTP contains all the tools necessary.)
How big is the expected traffic, in terms of separate, concurrent operations, not bandwidth?
How "parallel" are the operations you do for each request?
Erlang has a really fine-tuned concurrency model and a network-transparent, parallel distributed system. Depending on what exactly the project is, the availability of a mature implementation of such a system might outweigh any issues regarding learning a new language. There are also two other languages that run on the Erlang VM which you can use: the Ruby/Python-like Reia and Lisp-Flavored Erlang.
Yet another option is to use both, especially with Erlang being used as a kind of "hub". I'm unsure whether Node.js has a Foreign Function Interface system, but if it does, Erlang has a C library that lets external processes interface with the system just like any other Erlang process.
It looks like Erlang performs better when deployed on a relatively low-end server (512MB 4-core 2.4GHz AMD VM). This is from SyncPad's experience of comparing Erlang vs Node.js implementations of their virtual whiteboard server application.
There is one more language on the same VM as Erlang: Elixir.
It's a very interesting alternative to Erlang, so check it out.
It also has a fast-growing web framework based on it: the Phoenix Framework.
WhatsApp could never have achieved its level of scalability and reliability without Erlang: https://www.youtube.com/watch?v=c12cYAUTXXs
I would prefer Erlang over Node.
If you want concurrency, Node can be replaced by Erlang or Golang because of their lightweight processes.
Erlang is not easy to learn and requires a lot of effort, but its community is active, so you can get help there. The learning curve is the only reason why people prefer Node.

Which could become a strong alternative JVM language: Scala, Clojure, Fan, JavaFX Script, or other?

I am currently deciding on an alternative JVM language to port an existing Swing desktop application written in Java 6. Given that JavaFX specifically targets this kind of application, it would seem that my best option is JavaFX Script.
However, what about other kinds of applications and libraries? Would JavaFX Script be the best choice in general for a second JVM language?
Currently, it seems that Scala is the most talked about alternative to the Java language. This month (October 2009), it is at position 34 in the TIOBE index, while JavaFX Script is at position 44, and Clojure, Fan, and Groovy are at positions below 50.
So, what are your impressions? Which language would you invest your time in learning and using (and why), assuming you can freely choose the language for a given project to run in the JVM?
My main question would be: why are you porting an existing application? The answer to this question may give you some idea of where you want to go.
Some quick perspectives on the main choices:
Scala is, in my view, a better Java than Java. If you want a language that takes the best bits of Java but adds a lot of new innovations and features, then it may well be for you.
Clojure is an amazingly well designed language, particularly if you believe in a future of highly complex, concurrent applications. It's also extremely productive - I can probably create more value/hour in Clojure than any other language. However, unless you already know Lisp it will seem very unfamiliar at first. If you are willing to live on the cutting edge to get these benefits, Clojure may well be for you.
JavaFX Script - has some very nice features for GUI design, and clearly has the support of Sun/Oracle. On the other hand, I don't see it having massive traction outside this domain. I'd suggest giving it a trial run to see if it meets your needs.
Java - should still be on your list! If the reason you are porting is because the code has become difficult to maintain, then maybe a focused phase of re-factoring while staying on Java can get you the benefits you want. It's possible to write perfectly good GUI applications in Java.
Groovy - really nice scripting language on the JVM. Particularly good if you want to embed scripting features within an existing Java/JVM application. Not sure I'd choose it for (re)writing a complete application however.
JRuby / Jython - haven't seen these much myself but heard good things. Probably most suitable if you have Ruby / Python skills in the team but also want the benefits of the JVM platform.
The best alternate language, and the best language overall, IMO, is that which best allows you to write the program in the best model for you.
So, if you are writing a GUI app, then Scala may be the incorrect choice, as you wouldn't be moving away from Swing.
If JavaFX best meets your needs, then use that language.
If you know Lisp then Clojure would be a good choice, but, like Scala, not for this problem, it sounds like.
If you don't know Lisp and you want/need a functional programming language, then Scala would be the best choice.
Basically, there is no one language that is best in all situations, it helps to know what you want to do, and the strengths/weaknesses of the various options.
Those all sound like good choices. You could add JRuby to the list...

Dataflow Programming - Patterns and Frameworks

I just came across the proposed Boost::Dataflow library.
It seems like an interesting approach and I was wondering if there are other such alternative frameworks for C++, and if there are any related design patterns.
I have not ruled out Boost::Dataflow, I am just looking into any available alternatives so I can understand the domain and my options better (or roll my own if necessary).
Wikipedia
There are a couple of good articles on Wikipedia about the theory of dataflow programming:
Dataflow
Dataflow programming
Flow based programming
Actor model
Visual programming
These articles are written by various authors, so there is some overlap and some important material is missing, but they are a very good starting point.
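As a rough illustration of the core idea those articles share (a node runs when its inputs arrive, not when control flow reaches it), here is a minimal sketch. It is written in Python rather than C++ to keep it short, and the Node and Sink classes are hypothetical names made up for this example, not taken from any of the libraries in this thread.

```python
class Node:
    """A dataflow node: fires its function once every input port has a value."""

    def __init__(self, func, n_inputs):
        self.func = func
        self.inputs = [None] * n_inputs
        self.filled = [False] * n_inputs
        self.listeners = []  # (downstream node, port index) pairs

    def connect(self, downstream, port):
        """Wire this node's output to one input port of another node."""
        self.listeners.append((downstream, port))

    def receive(self, port, value):
        """Store a value on a port; fire if all ports now hold a value."""
        self.inputs[port] = value
        self.filled[port] = True
        if all(self.filled):
            self.emit(self.func(*self.inputs))

    def emit(self, value):
        for node, port in self.listeners:
            node.receive(port, value)


class Sink(Node):
    """Terminal node that records whatever reaches it."""

    def __init__(self):
        super().__init__(lambda x: x, 1)
        self.values = []

    def emit(self, value):
        self.values.append(value)


def run_example():
    add = Node(lambda a, b: a + b, 2)
    double = Node(lambda x: x * 2, 1)
    sink = Sink()
    add.connect(double, 0)
    double.connect(sink, 0)
    add.receive(0, 3)   # nothing fires yet: port 1 is still empty
    add.receive(1, 4)   # add fires (7), then double fires (14)
    add.receive(0, 10)  # port 1 still holds 4, so add fires again (14 -> 28)
    return sink.values
```

Note the design choice that ports keep their last value, so a node re-fires on every later input. Real frameworks differ on exactly this point (some consume inputs like tokens), which is one of the things worth checking when comparing them.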
TinyOS
This is an open source operating system based on the dataflow principle. I have a bad feeling about it: they don't even mention the term "dataflow". Still, that is what it is, and it may be worth studying.
Look at Intel Threading Building Blocks, particularly its tbb::flow namespace.
You can also look at the two main open source robotics frameworks, ROS and Orocos. There is also Rock, but it is based on Orocos, so it is equivalent if you're just looking for a C++ component framework.
There are some dataflow C++ libraries I have found:
cellspp - lets you use a spreadsheet evaluation model.
DSPatch and Route11 - C++ dataflow frameworks. They let you write programs in a dataflow manner. They look interesting.
If you want this design for image processing or visualization, you can find a good resource in ITK. And if you want a GUI for this (data/work)flow, you can use DeVIDE.
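For anyone unsure what a "spreadsheet evaluation model" means in code, here is a small sketch of the idea: cells hold either a value or a formula over other cells, and changing an input recomputes everything that depends on it. This is not the API of cellspp or any other library listed here; the Cell class is a hypothetical name, and Python is used only to keep the example compact.

```python
class Cell:
    """A spreadsheet-like cell: a plain value, or a formula over other cells."""

    def __init__(self, value=None, formula=None, deps=()):
        self.formula = formula
        self.deps = list(deps)
        self.dependents = []
        for dep in self.deps:
            dep.dependents.append(self)  # register for change notifications
        self.value = value
        if formula is not None:
            self._recompute()

    def set(self, value):
        """Assign a new value to an input cell and propagate the change."""
        self.value = value
        for cell in self.dependents:
            cell._recompute()

    def _recompute(self):
        new_value = self.formula(*(d.value for d in self.deps))
        if new_value != self.value:  # stop propagating when nothing changed
            self.value = new_value
            for cell in self.dependents:
                cell._recompute()


price = Cell(10)
quantity = Cell(3)
subtotal = Cell(formula=lambda p, q: p * q, deps=(price, quantity))
total = Cell(formula=lambda s: s + 5, deps=(subtotal,))  # flat shipping fee

price.set(20)  # subtotal and total update without being touched directly
```

This naive propagation can recompute a cell more than once when the dependency graph has diamonds; a real implementation would normally order the updates topologically.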
My 2cents,
Johan
Just for the record, you can also consider gstreamermm, which is a C++ wrapper around GStreamer.
Dataflow programming is one of those things that's been lurking around for decades and never quite taken off... for software anyway; in the VHDL/Verilog world you find yourself naturally adopting the dataflow mindset much more readily. But in the software world... somehow it just never seems to scale beyond toy systems, perhaps because people insist on tying it together with visual programming (and I see Boost dataflow also treads this path). Some people look to dataflow programming to solve the software crisis by making it more like HW design with pluggable components with interconnectable pins... but hang on, HW design is really hard too! (Interestingly, while visual programming systems do exist in the HW world, no one actually uses them to build anything big.)
The most interesting, active modern example I'm aware of using dataflow principles is the PureData audio-visual programming environment.
Visual Studio Concurrency Runtime contains an asynchronous dataflow framework in C++.
An example of image processing dataflow: http://msdn.microsoft.com/en-us/library/ff398050.aspx
You might check my implementation of dataflow here: http://ambient.comp-phys.org
It supports MPI and threading and is based upon custom dataflow types (e.g. ambient::vector) that work through a run-time object versioning system.
If your area is sound generation/processing, use http://www.synthedit.com/
It looks promising; I found good answers to a deep problem (polyphony) in the SDK docs. Funny, but they don't mention the word dataflow.
Maybe Pure Data (pd) has a C++ API...
http://en.wikipedia.org/wiki/Pure_Data