Clojure / Incanter Data Transformation Capabilities

I'm considering Clojure / Incanter as an alternative to R, and I'm wondering whether Clojure / Incanter can do the following:
Import the result of a SQL statement as a data set (I do this in R using dbGetQuery).
Reshape the data set, turning rows into columns and back, also known as "pivot" / "unpivot" (I do this in R using the reshape and reshape2 packages; in the R world it's called melting and casting data).
Save the reshaped data set to a SQL table (I do this in R using the dbWriteTable function in RMySQL).

You may be interested in core.matrix - it's a project to bring multi-dimensional array and numerical computation capabilities into Clojure. Still in very active development but already usable.
Features:
A clean, functional API
Proper multi-dimensional arrays
Idiomatic style of working with Clojure data, e.g. the nested vector [[1 2] [3 4]] can be automatically used as a 2x2 matrix.
All of the array reshaping capabilities you might expect.
All of the usual matrix operations (multiplication, scaling, determinants etc.)
Support for multiple back-end matrix implementations, e.g. JBLAS for high performance (uses native code)
See some example code here:
;; load core.matrix and its overloaded operators
;; (this may print warnings about replacing clojure.core/* and similar)
(use 'clojure.core.matrix)
(use 'clojure.core.matrix.operators)
;; a matrix can be defined using a nested vector
(def a (matrix [[2 0] [0 2]]))
;; core.matrix.operators overloads operators to work on matrices
(* a a)
;; a wide range of mathematical functions are defined for matrices
(sqrt a)
;; you can get rows and columns of matrices individually
(get-row a 0)
;; Java double arrays can be used as vectors
(* a (double-array [1 2]))
;; you can modify double arrays in place - they are examples of mutable vectors
(let [a (double-array [1 4 9])]
  (sqrt! a) ;; "!" signifies an in-place operator
  (seq a))
;; you can coerce matrices between different formats
(coerce [] (double-array [1 2 3]))
;; scalars can be used in many places that you can use a matrix
(* [1 2 3] 2)
;; operations on scalars alone behave as you would expect
(* 1 2 3 4 5)
;; you can do various functional programming tricks with matrices too
(emap inc [[1 2] [3 4]])
core.matrix has been approved by Rich Hickey as an official Clojure contrib library, and it is likely that Incanter will switch over to using core.matrix in the future.
SQL table support isn't directly included in core.matrix, but it would only be a one-liner to convert a resultset from clojure.java.jdbc into a core.matrix array. Something like the following should do the trick:
(coerce [] (map vals resultset))
Then you can transform and process it with core.matrix however you like.
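core.matrix works on numeric arrays, so the melt/cast reshaping from the question may be easier to do on the rows themselves before converting. Below is a minimal sketch in plain Clojure, assuming the rows arrive as a seq of maps (the shape clojure.java.jdbc query results take). The names rows->matrix, melt and cast-wide are hypothetical helpers, not part of core.matrix or Incanter. Passing the column order explicitly also avoids depending on the key order that vals happens to return, which for larger maps need not match the column order.

```clojure
;; Hypothetical helpers (not part of core.matrix or Incanter): a minimal
;; melt/cast over rows shaped like clojure.java.jdbc results (a seq of maps).

(defn rows->matrix
  "Convert a seq of row maps into a vector of vectors, pulling the
  columns out in the explicit order given by `cols`."
  [cols rows]
  (mapv (apply juxt cols) rows))

(defn melt
  "Wide -> long: keep the `id-cols`, and turn every other column of each
  row into its own {:variable column-key :value column-value} row."
  [id-cols rows]
  (for [row rows
        [k v] (apply dissoc row id-cols)]
    (assoc (select-keys row id-cols) :variable k :value v)))

(defn cast-wide
  "Long -> wide: group rows by their `id-cols` and spread each
  :variable/:value pair back into a column of its own."
  [id-cols rows]
  (for [[ids group] (group-by #(select-keys % id-cols) rows)]
    (into ids (map (juxt :variable :value)) group)))

;; usage with made-up data
(def rows [{:id 1 :a 10 :b 20} {:id 2 :a 30 :b 40}])
(rows->matrix [:id :a :b] rows) ;; => [[1 10 20] [2 30 40]]
(melt [:id] rows)               ;; one row per (id, variable) pair
(cast-wide [:id] (melt [:id] rows)) ;; recovers the original two rows
```

The result of rows->matrix can then be handed to core.matrix as described above.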

Related

Arrays and loops in Clojure

I am trying to learn the programming language Clojure. I have a hard time understanding it though. I would like to try to implement something as simple as this (for example):
int[] array = new int[10];
array[0] =1;
int n = 10;
for(int i = 0; i <= n; i++){
array[i] = 0;
}
return array[n];
Since I am new to Clojure, I don't even understand how to implement this. It would be super helpful if someone could explain, or give a similar example of, how arrays and for-loops work in Clojure. I've tried to do some research, but as far as I understand, Clojure doesn't really have for-loops.
The Clojure way to write the Java for loop you wrote is to start by considering why you're looping at all. There are many options for porting a Java loop to Clojure. Choosing among them depends on what your goal is.
Ways to Make Ten Zeroes
As Carcigenicate posted, if you need ten zeros in a collection, consider:
(repeat 10 0)
That returns a sequence – a lazy one. Sequences are one of Clojure's central abstractions. If instead the ten zeros need to be accessible by index, put them in a vector with:
(vec (repeat 10 0))
or
(into [] (repeat 10 0))
Or, you could just write the vector literal directly in your code:
[0 0 0 0 0 0 0 0 0 0]
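And if you specifically need a Java array for some reason, then you can do that with to-array (which gives an Object[] of boxed numbers; for a primitive int[] like the one in the question, (int-array 10) returns ten zeros directly):
(to-array (repeat 10 0))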
But remember the advice from the Clojure reference docs on Java interop:
Clojure supports the creation, reading and modification of Java arrays [but] it is recommended that you limit use of arrays to interop with Java libraries that require them as arguments or use them as return values.
Those docs list some functions for working with Java arrays primarily when they're required for Java interop or "to support mutation or higher performance operations". In nearly all cases Clojurists just use a vector.
Looping in Clojure
What if you're doing something other than producing ten zeros? The Clojure way to "loop" depends on what you need.
You may need recursion, for which Clojure has loop/recur:
(loop [x 10]
  (when-not (= x 0)
    (recur (- x 2))))
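The loop above just counts down and discards its result. A loop that produces a value usually carries an accumulator alongside the counter. A minimal sketch (sum-below is a made-up name for illustration):

```clojure
;; Sum the integers 0..n-1 with loop/recur: `i` plays the role of the Java
;; loop counter and `acc` is the accumulator carried between iterations.
(defn sum-below [n]
  (loop [i 0
         acc 0]
    (if (< i n)
      (recur (inc i) (+ acc i))
      acc)))

(sum-below 10) ;; => 45
```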
You may need to calculate a value for every value in some collection(s):
(for [x coll]
  (calculate-y x))
You might need to iterate over multiple collections, similar to nested loops in Java:
(for [x ['a 'b 'c]
      y [1 2 3]]
  (foo x y))
If you need to call a side-effecting function some number of times and collect its results, repeatedly is your jam (it is lazy, so force it with doall, or with dorun if you don't need the results):
(doall (repeatedly 10 some-fn))
If you need to produce a side effect for each value in a collection, try doseq:
(doseq [x coll]
  (do-some-side-effect! x))
If you need to produce a side effect for a range of integers, you could use doseq like this:
(doseq [x (range 10)]
  (do-something! x))
...but dotimes is like doseq with a built-in range:
(dotimes [x 10]
  (do-something! x))
Even more common than those loopy constructs are Clojure functions which produce a value for every element in a collection, such as map and its relatives, or which iterate over collections either for special purposes (like filter and remove) or to create some new value or collection (like reduce).
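To make that last point concrete, here is a small sketch of those sequence functions on an illustrative collection coll (the data is made up):

```clojure
(def coll [1 2 3 4 5 6])

;; map: one output per input
(map inc coll)          ;; => (2 3 4 5 6 7)
;; filter / remove: keep or drop elements by a predicate
(filter even? coll)     ;; => (2 4 6)
(remove even? coll)     ;; => (1 3 5)
;; reduce: fold the collection into a single value
(reduce + coll)         ;; => 21
;; they compose: sum of the squares of the odd numbers
(->> coll (filter odd?) (map #(* % %)) (reduce +)) ;; => 35
```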
I think the last point in Lee's answer should be heavily emphasized.
Unless you absolutely need arrays for performance reasons, you should really be using vectors here. That entire chunk of code suddenly becomes trivial once you start using native Clojure structures:
; Create 10 0s, then put them in a vector
(vec (repeat 10 0))
You can even skip the call to vec if you're ok using a lazy sequence instead of a strict vector.
It should also be noted though that pre-initializing the elements to 0 is unnecessary. Unlike arrays, vectors are variable length; they can be added to and expanded after creation. It's much cleaner to just have an empty vector and add elements to it as needed.
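A small sketch of growing and updating vectors instead of pre-filling them; the names and values are illustrative only. Note that each call returns a new vector and leaves v untouched:

```clojure
;; conj on a vector appends at the end
(def v (conj [] 1 0 0))          ;; => [1 0 0]
;; assoc "replaces" by index, returning a new vector; v itself is unchanged
(assoc v 0 5)                    ;; => [5 0 0]
;; assoc at index (count v) appends one element
(assoc v 3 7)                    ;; => [1 0 0 7]
;; the question's array[0] = 1 on a ten-element vector of zeros
(assoc (vec (repeat 10 0)) 0 1)  ;; => [1 0 0 0 0 0 0 0 0 0]
;; grow a vector step by step, loop-style, with reduce
(reduce conj [] (range 5))       ;; => [0 1 2 3 4]
```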
You can create and modify arrays in Clojure; your example would look like:
(defn example []
  (let [arr (int-array 10)
        n 10]
    (aset arr 0 1)
    (doseq [i (range (inc n))]
      (aset arr i 0))
    (aget arr n)))
Be aware that this example will throw an exception since array[i] is out of bounds when i = 10.
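If you do want the array version without the exception, one option is to stop the loop at n - 1, which dotimes does naturally, and read the last valid index. A sketch (example-fixed is a hypothetical name; aset-int is used here to set int elements explicitly):

```clojure
;; Same shape as the example above, but the loop stops at (dec n) and we read
;; the last valid index, so no ArrayIndexOutOfBoundsException is thrown.
(defn example-fixed []
  (let [n 10
        arr (int-array n)]   ;; int-array zero-fills, like new int[10]
    (aset-int arr 0 1)       ;; mirrors array[0] = 1 (overwritten by the loop)
    (dotimes [i n]           ;; i runs from 0 to n-1
      (aset-int arr i 0))
    (aget arr (dec n))))

(example-fixed) ;; => 0
```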
You will usually use vectors in preference to arrays since they are immutable.
You can get a good start here:
Brave Clojure
The Clojure CheatSheet

Modify vector so it can be invoked with two arguments

I'm playing with a matrix implementation in Clojure which I'm doing for the fun of doing it and learning more about Clojure, rather than because I want to create the bestest fastest most coolest matrix implementation in the world.
One of the primary operations needed in code like this is the ability to return the value at a given row and column in a matrix, which of course I've written as a function:
(mat-getrc m 2 3)
which says "Give me the value at row 2, column 3 in matrix m". Perfectly good Clojure, but verbose and ugly. I'd rather write
(m 2 3)
but of course A) vectors (in my package matrices are just vectors) only respond to a single argument, and B) vectors don't know how to use the row and column number to figure out where the correct value is stored.
From looking at the docs for IFn (which vectors are supposed to implement) it appears that a two-argument version of invoke exists - but how do I get my "matrix" vectors to implement and respond to it?
Any suggestions and pointing-in-the-right-direction appreciated.
You can't modify how vectors are invoked, as that's built into the implementation of vector, but you can use deftype to define your own type that wraps a vector, acts as a vector, and is invokable however you like. You would need to implement many of the same interfaces that vectors implement (though this is a large list):
user=> (ancestors clojure.lang.PersistentVector)
#{clojure.lang.IEditableCollection clojure.lang.ILookup
java.util.concurrent.Callable java.lang.Runnable clojure.lang.IMeta
java.lang.Comparable clojure.lang.IReduceInit
clojure.lang.IPersistentCollection clojure.lang.IHashEq java.lang.Iterable
clojure.lang.IReduce java.util.List clojure.lang.AFn clojure.lang.Indexed
clojure.lang.Sequential clojure.lang.IPersistentStack java.io.Serializable
clojure.lang.Reversible clojure.lang.Counted java.util.Collection
java.util.RandomAccess java.lang.Object clojure.lang.Seqable
clojure.lang.Associative clojure.lang.APersistentVector
clojure.lang.IKVReduce clojure.lang.IPersistentVector clojure.lang.IObj
clojure.lang.IFn}
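As a starting point, here is a minimal sketch of such a type that implements only clojure.lang.IFn, so it is callable with one or two arguments but does not yet behave like a vector in any other way (count, seq, conj and so on would need the interfaces listed above). The type name Matrix is hypothetical:

```clojure
;; Hypothetical Matrix type wrapping a vector of rows. Only clojure.lang.IFn
;; is implemented, so instances are callable but are not (yet) vectors.
(deftype Matrix [rows]
  clojure.lang.IFn
  ;; (m r) returns row r, like calling the underlying vector
  (invoke [_ r] (nth rows r))
  ;; (m r c) returns the element at row r, column c
  (invoke [_ r c] (nth (nth rows r) c))
  ;; lets (apply m args) dispatch to the invoke arities above
  (applyTo [this args] (clojure.lang.AFn/applyToHelper this args)))

(def m (->Matrix [[1 2 3 4] [5 6 7 8] [9 10 11 12]]))
(m 2 3) ;; => 12
(m 1)   ;; => [5 6 7 8]
```

Adding clojure.lang.Indexed, clojure.lang.Counted, clojure.lang.Seqable and the other interfaces from the list above, each delegating to rows, would gradually make it behave more like the vector it wraps.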
(def matrix [[1 2 3 4][5 6 7 8][9 10 11 12]])
As you say in your question this is possible:
(matrix 2)
But this is not:
(matrix 2 3)
This would be the standard way to get an element of a nested vector:
(get-in matrix [2 3])
You can already nearly get what you want, just with a few more parens:
((matrix 2) 3)
You could define a higher-order function:
(defn matrix-hof [matrix]
  (fn [x y]
    (get-in matrix [x y])))
Then put the function rather than the matrix in function position:
(let [m (matrix-hof matrix)]
  (m 2 3))
I don't believe that exactly what you are asking is possible using either a function or a macro.

Clojure : is there a more idiomatic way to work on nested vectors?

I want to cap samples that I generate from Poisson distributions.
The original data looks like
[[2 12] [3 14]] (samples)
Here, [2 12] corresponds to a sample of the distributions [P1 P2], and so does [3 14].
I want to cap P1 and P2 with max values, for instance
[4 12] (max-values)
With these parameters, I want the output to be (I want to keep vectors)
[[2 12] [3 12]]
This is pretty easy, but I do not know whether my way is very idiomatic:
(defn cap-poisson-samples
  "Cap poisson samples to meet the expectations
  if required"
  [data max-values]
  (mapv
   (fn [x]
     (mapv (fn [u v] (if (> u v) v u)) x max-values))
   data))
Someone told me in the past that it's better to avoid nested maps, but I do not know if that's true.
I know prewalk exists, but it's not possible to pass it two inputs (like my second mapv).
I could also use for, but it's heavier.
Generally speaking, I'm quite lost when I have to process two vectors on the same indexes. I searched in clojure.core but did not find anything.
So I generally use for or mapv depending on whether the index is important or not.
Thanks
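One possible simplification, offered as a sketch rather than the definitive idiom: the inner (fn [u v] (if (> u v) v u)) is exactly min, and mapv (like map) accepts several collections and walks them in lockstep, index by index, which covers the "two vectors on the same indexes" case without an explicit index:

```clojure
;; (if (> u v) v u) is (min u v); mapv over two collections pairs their
;; elements by position, so no explicit index is needed.
(defn cap-poisson-samples
  "Cap each sample at the matching value in max-values."
  [data max-values]
  (mapv #(mapv min % max-values) data))

(cap-poisson-samples [[2 12] [3 14]] [4 12]) ;; => [[2 12] [3 12]]
```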

What's the one-level sequence flattening function in Clojure?

What's the one-level sequence flattening function in Clojure? I am using apply concat for now, but I wonder if there is a built-in function for that, either in standard library or clojure-contrib.
My general first choice is apply concat. Also, don't overlook (for [subcoll coll, item subcoll] item) -- depending on the broader context, this may result in clearer code.
There's no standard function. apply concat is a good solution in many cases. Or you can equivalently use mapcat seq.
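For comparison, a quick sketch of these approaches on the same (made-up) nested data, plus two related built-ins: the cat transducer (Clojure 1.7 and later), which also removes exactly one level, and flatten, which removes all of them:

```clojure
(def nested [[1 2] [3 [4 5]] []])

(apply concat nested)                     ;; => (1 2 3 [4 5])
(mapcat seq nested)                       ;; => (1 2 3 [4 5])
(for [subcoll nested, item subcoll] item) ;; => (1 2 3 [4 5])
;; Clojure 1.7+: the cat transducer flattens exactly one level
(into [] cat nested)                      ;; => [1 2 3 [4 5]]
;; flatten, by contrast, removes every level of nesting
(flatten nested)                          ;; => (1 2 3 4 5)
```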
The problem with apply concat is that it fails when anything other than a collection appears at the first level:
(apply concat [1 [2 3] [4 [5]]])
=> IllegalArgumentException Don't know how to create ISeq from: java.lang.Long...
Hence you may want to do something like:
(defn flatten-one-level [coll]
  (mapcat #(if (sequential? %) % [%]) coll))
(flatten-one-level [1 [2 3] [4 [5]]])
=> (1 2 3 4 [5])
As a more general point, the lack of a built-in function should not usually stop you from defining your own :-)
I use apply concat too; I don't think there's anything else in core.
flatten works on multiple levels (and is defined via a tree-walk, not in terms of repeated single-level expansion).
See also Clojure: Semi-Flattening a nested Sequence, which has a flatten-1 from clojure mvc (and which is much more complex than I expected).
Update to clarify laziness:
user=> (take 3 (apply concat (for [i (range 1e6)] (do (print i) [i]))))
012345678910111213141516171819202122232425262728293031(0 1 2)
You can see that it realizes the first 32 elements (printing 0 through 31) even though only 3 are taken. This is chunking, done for efficiency; otherwise the evaluation is lazy (it doesn't evaluate the whole list). For a discussion of chunking see the comments at the end of http://isti.bitbucket.org/2012/04/01/pipes-clojure-choco-1.html

Computer algebra for Clojure

Short version:
I am interested in some Clojure code which will allow me to specify the transformations of x (e.g. permutations, rotations) under which the value of a function f(x) is invariant, so that I can efficiently generate a sequence of x's that satisfy r = f(x). Is there some development in computer algebra for Clojure?
For (a trivial) example
(defn ^{:domain #{3 4 7}
        :range #{0 1 2}
        :invariance-group :full}
  f [x] (- x x))
I could call (preimage f #{0}) and it would efficiently return #{3 4 7}. Naturally, it would also be able to annotate the codomain correctly. Any suggestions?
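Nothing like preimage exists in clojure.core as far as I know. As a baseline, here is a hypothetical brute-force version that reads the :domain metadata off the var and tests every element; it ignores :invariance-group entirely, so it shows the shape of the API rather than the efficient algorithm asked for:

```clojure
;; Hypothetical brute-force `preimage`: reads :domain from the var's metadata
;; and tests every element; it does NOT exploit :invariance-group.
(defn ^{:domain #{3 4 7}
        :invariance-group :full}
  f [x] (- x x))

(defn preimage
  "Return the elements of fvar's :domain metadata whose image lies in targets."
  [fvar targets]
  (set (filter #(contains? targets (fvar %))
               (:domain (meta fvar)))))

(preimage #'f #{0}) ;; => #{3 4 7}
```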
Longer version:
I have a specific problem that makes me interested in finding out about the development of computer algebra for Clojure. Can anyone point me to such a project? My specific problem involves finding all the combinations of words that satisfy F(x) = r, where F is a ranking function and r a positive integer. In my particular case F can be computed as a sum
F(x) = f(x[0]) + f(x[1]) + ... + f(x[N-1])
Furthermore I have a set of disjoint sets S = {s_i}, such that f(a)=f(b) for a,b in s, s in S. So a strategy to generate all x such that F(x) = r should rely on this factorization of F and the invariance of f under each s_i. In words, I compute all permutations of sites containing elements of S that sum to r and compose them with all combinations of the elements in each s_i. This is done quite sloppily in the following:
(use 'clojure.contrib.combinatorics)
(use 'clojure.contrib.seq-utils)
(defn expand-counter [c]
  (flatten (for [m c] (let [x (m 0) y (m 1)] (repeat y x)))))
(defn partition-by-rank-sum [A N f r]
  (let [M (group-by f A)
        image-A (set (keys M))
        ;integer-partition computes restricted integer partitions,
        ;returning a multiset as key value pairs
        rank-partitions (integer-partition r (disj image-A 0))]
    (apply concat (for [part rank-partitions]
                    (let [k (- N (reduce + (vals part)))
                          rank-map (if (pos? k) (assoc part 0 k) part)
                          all-buckets (lex-permutations (expand-counter rank-map))]
                      (apply concat (for [bucket all-buckets]
                                      (let [val-bucket (map M bucket)
                                            filled-buckets (apply cartesian-product val-bucket)]
                                        (map vec filled-buckets)))))))))
This gets the job done but misses the underlying picture. For example, if the associative operation were a product instead of a sum I would have to rewrite portions.
The system below does not yet support combinatorics, though it would not be a huge effort to add them: loads of good code already exists, and this could be a good platform to graft it onto, since the basics are pretty sound. I hope a short plug is not inappropriate here; this is the only serious Clojure CAS I know of, but hey, what a system...
It may be of interest to readers of this thread that Gerry Sussman's scmutils system is being ported to Clojure.
This is a very advanced CAS, offering things like automatic differentiation, literal functions, etc, much in the style of Maple.
It is used at MIT for advanced programs on dynamics and differential geometry, and a fair bit of electrical engineering stuff. It is also the system used in Sussman and Wisdom's "sequel" (LOL) to SICP, SICM (Structure and Interpretation of Classical Mechanics).
Although originally a Scheme program, this is not a direct translation, but a ground-up rewrite to take advantage of the best features of Clojure. It's been named sicmutils, in honour of both the original and the book.
This superb effort is the work of Colin Smith and you can find it at https://github.com/littleredcomputer/sicmutils .
I believe that this could form the basis of an amazing Computer Algebra System for Clojure, competitive with anything else available. Although it is quite a huge beast, as you can imagine, and tons of stuff remains to be ported, the basics are pretty much there, the system will differentiate, and handle literals and literal functions pretty well. It is a work in progress. The system also uses the "generic" approach advocated by Sussman, whereby operations can be applied to functions, creating a great abstraction that simplifies notation no end.
Here's a taster:
> (def unity (+ (square sin) (square cos)))
> (unity 2.0) ==> 1.0
> (unity 'x) ==> 1 ;; yes we can deal with symbols
> (def zero (D unity)) ;; Let's differentiate
> (zero 2.0) ==> 0
SicmUtils introduces two new vector types, “up” and “down” (called “structures”). They work pretty much as you would expect vectors to, but have some special mathematical (covariant, contravariant) properties, and also some programming properties, in that they are executable!
> (def fnvec (up sin cos tan)) => fnvec
> (fnvec 1) ==> (up 0.8414709848078965 0.5403023058681398 1.5574077246549023)
> ;; differentiated
> ((D fnvec) 1) ==> (up 0.5403023058681398 -0.8414709848078965 3.425518820814759)
> ;; derivative with symbolic argument
> ((D fnvec) 'θ) ==> (up (cos θ) (* -1 (sin θ)) (/ 1 (expt (cos θ) 2)))
Partial differentiation is fully supported
> (defn ff [x y] (* (expt x 3)(expt y 5)))
> ((D ff) 'x 'y) ==> (down (* 3 (expt x 2) (expt y 5)) (* 5 (expt x 3) (expt y 4)))
> ;; i.e. a vector of results with respect to both variables
The system also supports TeX output, polynomial factorization, and a host of other goodies. Lots of stuff, however, that could be easily implemented has not been done purely out of lack of human resources. Graphic output and a "notepad/worksheet" interface (using Clojure's Gorilla) are also being worked on.
I hope this has gone some way towards whetting your appetite enough to visit the site and give it a whirl. You don't even need Clojure, you could run it off the provided jar file.
There's Clojuratica, an interface between Clojure and Mathematica:
http://clojuratica.weebly.com/
See also this mailing list post by Clojuratica's author.
While not a CAS, Incanter also has several very nice features and might be a good reference/foundation to build your own ideas on.
Regarding "For example, if the associative operation were a product instead of a sum I would have to rewrite portions.": if you structure your code accordingly, couldn't you accomplish this by using higher-order functions and passing in the associative operation? Think map-reduce.
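A minimal sketch of that idea, with hypothetical names: both the per-site function f and the associative combining operation are passed in, so switching from a sum to a product changes only the arguments. letter-value is made-up data standing in for f:

```clojure
;; Hypothetical: the per-site function f and the associative operation
;; `combine` are both parameters of the ranking function.
(defn combined-rank [combine f x]
  (reduce combine (map f x)))

;; made-up per-letter values; a map can be called as a function
(def letter-value {\a 1 \b 2 \c 3})

(combined-rank + letter-value "bcc") ;; => 8   (2 + 3 + 3)
(combined-rank * letter-value "bcc") ;; => 18  (2 * 3 * 3)
```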
I am unaware of any computer algebra systems written in Clojure. However, for my rather simple mathematical needs I have often found it useful to use Maxima, which is written in Lisp. It is possible to interact with Maxima using s-expressions or a higher-level representation, which can be really convenient. Maxima also has some rudimentary combinatorics functions which may be what you are looking for.
If you are hellbent on using Clojure, in the short term perhaps throwing your data back and forth between Maxima and Clojure would help you achieve your goals.
In the longer term, I would be interested in seeing what you come up with!