I have found that relational databases are a very good fit for Clojure as the set functions (project/join/union etc) map very nicely to a database schema making Clojure almost a perfect fit for using with databases.
I was wondering how Clojure fits in with graph databases like Neo4j however?
Neo4J has clojure'ey bindings here and here and here
you can get the leiningen and maven config for each of these from clojars
allegrograph is another similar graph data store that is widely supported in clojure. so there is some evidence that the answer may be yes!
graph stores lend them selves well to immutable trees which may be an even better fit to Clojure than sets but this is all fairly subjective. The most objective answer I can give is to point to existing use of graph-stores/triple-stores.
Mark Watson's book (free pdf version: http://www.markwatson.com/opencontent/book_java.pdf), a lesser known Clojure book he self-published last year, covers some useful graph technology, mainly allegrograph.
I myself don't have much experience with graph db libraries, but the above-cited book mentions that neo4j is optimized to traverse graphs, whereas allegrograph is optimized for subgraph matching. So the choice will likely depend on your specific application.
If you go with allegograph, the author of that book waives the AGPL license on his wrappers for production use if you buy copies of his book, and of course can be used under the conditions of the license freely https://github.com/mark-watson/java_practical_semantic_web
The clojure-neo4j wrapper library exists, though it's unclear if it would be code-rotted or ready for use given the last commit date https://github.com/JulianMorrison/neo4j-clojure. The most recently updated fork, by mattrepl, however was not that long ago: https://github.com/mattrepl/clojure-neo4j.git
Related
Is there an idiomatic way to do reactive data synchronization between browser and server with Clojure and Clojurescript? What are the pros and cons of one technique vs another?
Having used Meteor.js in the past this sort of reactive database sync is highly preferable to manually writing routes and polling for updates. A pub/sub system lets web developers write less boilerplate code to move data around. Clojure seems like it would be a natural fit for such a technique. I have been unable to determine if this is a solved problem in the clj/cljs ecosystem.
The short answer is "no".
I've used Meteor.js in production, and also looked for the same when getting into CLJ(S). The closest I'm aware of are:
Datsync
Uses Datomic and Datoms as the sync protocol
Seems to be alpha
Fulcro
Has help for optimistic updates, but not direct data sync like Meteor has with Minimongo, closer to Meteor methods (ie a client and server implementation of the same call)
Production-ready, widely used
Why hasn't a Meteor-killer hasn't been written in CLJ(S)?
It's hard to do (at all, performantly)
The company behind Meteor raised tens of millions of dollars over the last decade to work in this space and still didn't resolve all the scalability or semantic rough edges in the data sync design
The datastores Clojure people tend to use (Datomic or relational) likely make data sync even harder, and certainly quite different (it wouldn't be a port from Meteor's implementation).
Clojure developers tend to prefer assembling a system around the needs of a particular problem/solution/requirement rather than assembling a solution around a particular feature/framework (eg Mongo documents that sync full-stack).
People are still thinking about this though, both inside the Clojure community and out. Check out:
Baqend (and their blog comparing with Meteor.js, Firebase, RechinkDB)
"The Web After Tomorrow" (by the developer of DataScript)
Reactive Datalog
The only meaningful way to compare is to consider them both in your context. I remember giving up Meteor's magic felt significant at the time but haven't found myself wanting to go back. I see it as trading magic for simplicity and flexibility.
In my Clojure project, I am using Clojure Spec but If I need to use some lib like compojure-api then I need to use Schema.
What is the advantage one over the others?
Why would I consider one over the others?
Which one is good for compile type checked?
These are three merely different approaches to give the developer some type safety. All the three offer their own DSL to describe the schema/type of data but they are very different in philosophy. They are all actively maintained and have a nice community.
This is an opinionated overview based of my experiences.
core typed
core typed tries to extend the clojure language with additional macros to annotate functions and vars with static type information. It then uses static type analysis to ensure that the code matches the type info (that is it produces and consumes data of the right types).
Some advantages:
Static typing in general is a very strong tool. If you are familiar with staticly typed programming languages you will appreciate this very much.
Many bugs can be found during compilation time. No more NullPointerExceptions!
Some drawbacks:
Changing something in type or code may require extra work to propagate the changes through all parts of your code. And sometimes it is just too complicated to write type info or correct programs.
Static code checking will slow down your compile times and may slow down your development workflow.
Schema
In Schema you also write type annotations but type checking happens runtime. It encourages you to construct schema declarations dynamically and lets you specify where you want to check for schema and where you do not want its funcionality.
Some advantages:
Very friendly DSL to describe data schema.
Various tools. For example: Data generation for generative testing, tools to explain why a schema does not match.
Some disadvantages:
Only checks for schema where and when you tell it to do so.
External library, not supported by the core team.
Spec
Spec is the latest player with a philosophy borrowed from Racket lang. It is (going to be) part of the Clojure core library from the Clojure version 1.9.
The basic idea is to have entity types specified by the (namespaced) keys in a map object. Spec declarations are stored in the application's registry bound to namespaced keywords. Spec is very strong in sequence validation.
Some advantages:
Part of the Clojure core, not an external library. It is now used for parsing macro arguments and also for documentation purposes.
The community is very excited about it resulting in interesting ideas such as using spec in genetic programming and generative testing.
Some disadvantages:
Will be available in Clojure 1.9 which is not yet a released stable version. It is still a new technology not widely used.
Spec do not look like the data they are describing.
Personally, core.typed feels intimidating and core.spec feels immature so I use schema in production. My advice is the following:
If you need static type checking then core.typed is the way to go.
If you want to do parsing then core.spec is a nice choice.
If you want simple type descriptions then schema will be a good fit.
I am about to start learning functional programming and Clojure appeals to me the most, I love its community, syntax and concept of immutable data structures. I am also interested in bio inspired ML for rich data Numenta.
However, my huge concern is that Spark does not support it as yet, and Spark rocks!!!
There is a Flambo Flambo Clojure ,but is it a satisfactory solution?
My course and job is in Scala. Should I defeat and enter Scala world or do you think that I should focus solely on Clojure?
Being the author or Sparkling (thanks to Josh Rosen for pointing that out), I can tell you that we use it at our company for ETL processing.
Here's what's good:
it provides a Clojuristic way of interacting with Spark, and as you can see in the presentation "Big Data with style - the Clojure/Spark way"
it's optimized for performance
it's used in production and there are others also using it
Here's what's missing:
There's currently no support for Spark Streaming, Spark SQL, Spark Dataframes or Spark ML. That might come in the future, I'm happy to accept pull requests, but it's currently not main focus (at the time of writing, April 2015).
I hope this helps you make up your mind on going with Clojure or starting to learn Scala.
It's hard to say that something like Spark doesn't support Clojure. It would make more sense to ask if there are good libraries to use that project that are easy to use from Clojure. From googling around Flambo looks like a viable option and at the various clojure conferences I hear incidental talk of using Spark in several contexts.
I would say that there is fairly low technical risk in using spark from Clojure so you are free to make this choice based on the other constraints of your working environment and pojects. Being particularly biased toward Clojure I whole heartedly recommend at least trying it and see what parts of the language and ecosystem work well for you.
I'm looking for a good reference on
large scale data mining with Clojure
I know of many good clojure programming books (Programming Clojure, Joy of Clojure, ...), and many good data mining text books (mining of massive data sets, managing gigabytes, ...). However I'm not aware of any reference that specifically addresses
large scale data mining with Clojure
The "with clojure" part is rather important to me for the following reasons:
* most theoretical analysis uses big-Oh running time, which ignores constants
* constants matter, if it ends up being a matter of 1 second vs 1 hour (for things that need to be real time)
* or 1 hour vs 1 week (for batch jobs)
In particular, I think there's a lot of interplay between the JVM, Clojure Data Structures, whether data is stored in memory or lazily read from disk -- that can have the "same" algorithm have drastically different running times by "slightly" different implementations.
Thus, my question (all of the above was to avoid being closed by "Check Google"):
what is a good resource on massive data mining with Clojure?
Thanks!
I don't think anyone's yet written a good comprehensive reference. But there is certainly lots of work going on in this space (my own company included!)
Some interesting links to follow up:
Storm - distributed realtime computation using Clojure. Could be used for large scale data mining.
http://www.infoq.com/presentations/Why-Prismatic-Goes-Faster-With-Clojure - interesting video regarding Clojure performance and optimisation for machine learning applications
Incanter - probably the leading Clojure library for statistics and data visualisation
Weka - very comprehensive data mining / machine learning library for Java (and hence very easy to use directly from Clojure)
There is a wonderful book that is coming out in May 2013: Clojure Data Analysis Cookbook. I will probably buy it.
http://www.amazon.co.uk/Clojure-Data-Analysis-Cookbook-ebook/dp/B00BECVV9C/ref=sr_1_1?s=books&ie=UTF8&qid=1360697819&sr=1-1
In Detail
Data is everywhere and it's increasingly important to be able to gain
insights that we can act on. Using Clojure for data analysis and
collection, this book will show you how to gain fresh insights and
perspectives from your data with an essential collection of practical,
structured recipes.
"The Clojure Data Analysis Cookbook" presents recipes for every stage
of the data analysis process. Whether scraping data off a web page,
performing data mining, or creating graphs for the web, this book has
something for the task at hand.
You'll learn how to acquire data, clean it up, and transform it into
useful graphs which can then be analyzed and published to the
Internet. Coverage includes advanced topics like processing data
concurrently, applying powerful statistical techniques like Bayesian
modelling, and even data mining algorithms such as K-means clustering,
neural networks, and association rules.
Approach
Full of practical tips, the "Clojure Data Analysis Cookbook" will help
you fully utilize your data through a series of step-by-step, real
world recipes covering every aspect of data analysis.
Who this book is for
Prior experience with Clojure and data analysis techniques and
workflows will be beneficial, but not essential.
I really like these tools when it comes to the concurrency level it can handle.
Erlang/OTP looks like much more stable solution but requires much more learning and a lot of diving into functional language paradigm. And it looks like Erlang/OTP makes it much better when it comes to multi-core CPUs (correct me if I am wrong).
But which should I choose? Which one is better in the short and long term perspective?
My goal is to learn a tool which makes scaling my Web projects under high load easier than traditional languages.
I would give Erlang a try. Even though it will be a steeper learning curve, you will get more out of it since you will be learning a functional programming language. Also, since Erlang is specifically designed to create reliable, highly concurrent systems, you will learn plenty about creating highly scalable services at the same time.
I can't speak for Erlang, but a few things that haven't been mentioned about node:
Node uses Google's V8 engine to actually compile javascript into machine code. So node is actually pretty fast. So that's on top of the speed benefits offered by event-driven programming and non-blocking io.
Node has a pretty active community. Hop onto their IRC group on freenode and you'll see what I mean
I've noticed the above comments push Erlang on the basis that it will be useful to learn a functional programming language. While I agree it's important to expand your skillset and get one of those under your belt, you shouldn't base a project on the fact that you want to learn a new programming style
On the other hand, Javascript is already in a paradigm you feel comfortable writing in! Plus it's javascript, so when you write client side code it will look and feel consistent.
node's community has already pumped out tons of modules! There are modules for redis, mongodb, couch, and what have you. Another good module to look into is Express (think Sinatra for node)
Check out the video on yahoo's blog by Ryan Dahl, the guy who actually wrote node. I think that will help give you a better idea where node is at, and where it's going.
Keep in mind that node still is in late development stages, and so has been undergoing quite a few changes—changes that have broke earlier code. However, supposedly it's at a point where you can expect the API not to change too much more. So if you're looking for something fun, I'd say node is a great choice.
I'm a long-time Erlang programmer, and this question prompted me to take a look at node.js. It looks pretty damn good.
It does appear that you need to spawn multiple processes to take advantage of multiple cores. I can't see anything about setting processor affinity though. You could use taskset on linux, but it probably should be parametrized and set in the program.
I also noticed that the platform support might be a little weaker. Specifically, it looks like you would need to run under Cygwin for Windows support.
Looks good though.
Edit
Node.js now has native support for Windows.
I'm looking at the same two alternatives you are, gotts, for multiple projects.
So far, the best razor I've come up with to decide between them for a given project is whether I need to use Javascript. One existing system I'm looking to migrate is already written in Javascript, so its next version is likely to be done in node.js. Other projects will be done in some Erlang web framework because there is no existing code base to migrate.
Another consideration is that Erlang scales well beyond just multiple cores, it can scale to a whole datacenter. I don't see a built-in mechanism in node.js that lets me send another JS process a message without caring which machine it is on, but that's built right into Erlang at the lowest levels. If your problem isn't big enough to need multiple machines or if it doesn't require multiple cooperating processes, this advantage isn't likely to matter, so you should ignore it.
Erlang is indeed a deep pool to dive into. I would suggest writing a standalone functional program first before you start building web apps. An even easier first step, since you seem comfortable with Javascript, is to try programming JS in a more functional style. If you use jQuery or Prototype, you've already started down this path. Try bouncing between pure functional programming in Erlang or one of its kin (Haskell, F#, Scala...) and functional JS.
Once you're comfortable with functional programming, seek out one of the many Erlang web frameworks; you probably shouldn't be writing your app directly to something low-level like inets at this late stage. Look at something like Nitrogen, for instance.
While I'd personally go for Erlang, I'll admit that I'm a little biased against JavaScript. My advice is that you evaluate few points:
Are you reusing existing code in either of those languages (both in terms of source code, and programmer experience!)
Do you need/want on-the-fly updates without stopping the application (This is where Erlang wins by default - its runtime was designed for that case, and OTP contains all the tools necessary)
How big is the expected traffic, in terms of separate, concurrent operations, not bandwidth?
How "parallel" are the operations you do for each request?
Erlang has really fine-tuned concurrency & network-transparent parallel distributed system. Depending on what exactly is the project, the availability of a mature implementation of such system might outweigh any issues regarding learning a new language. There are also two other languages that work on Erlang VM which you can use, the Ruby/Python-like Reia and Lisp-Flavored Erlang.
Yet another option is to use both, especially with Erlang being used as kind of "hub". I'm unsure if Node.js has Foreign Function Interface system, but if it has, Erlang has C library for external processes to interface with the system just like any other Erlang process.
It looks like Erlang performs better for deployment in a relatively low-end server (512MB 4-core 2.4GHz AMD VM). This is from SyncPad's experience of comparing Erlang vs Node.js implementations of their virtual whiteboard server application.
There is one more language on the same VM that erlang is -> Elixir
It's a very interesting alternative to Erlang, check this one out.
Also it has a fast-growing web framework based on it-> Phoenix Framework
whatsapp could never achieve the level of scalability and reliability without erlang https://www.youtube.com/watch?v=c12cYAUTXXs
I will Prefer Erlang over Node.
If you want concurrency, Node can be substituted by Erlang or Golang because of their light weight processes.
Erlang is not easy to learn so requires a lot of effort but its community is active so can get help from that, this is only the reason why people prefer Node .