I am using the following simple code to solve the n-queens problem:
#lang racket
; following returns true if queens are on diagonals:
(define (check-diagonals bd)
  (for/or ((r1 (length bd)))
    (for/or ((r2 (range (add1 r1) (length bd))))
      (= (abs (- r1 r2))
         (abs (- (list-ref bd r1)
                 (list-ref bd r2)))))))
; set board size:
(define N 8)
; start searching:
(for ((brd (in-permutations (range N))))
  (when (not (check-diagonals brd))
    (displayln brd)))
It works fine but takes a long time for larger values of N. It uses the in-permutations function to get a stream of permutations. I also see that it uses only 25% of CPU power (1 of 4 cores is being used). How can I modify this code so that it tests the permutations from the in-permutations stream in parallel and uses all 4 CPU cores? Thanks for your help.
Edit: I modified the check-diagonals function as suggested in the comments. The older code was:
(define (check-diagonals bd)
  (define res #f)
  (for ((r1 (length bd))
        #:break res)
    (for ((r2 (range (add1 r1) (length bd)))
          #:break res)
      (when (= (abs (- r1 r2))
               (abs (- (list-ref bd r1)
                       (list-ref bd r2))))
        (set! res #t))))
  res)
To start, before even parallelizing anything, you can improve your original program quite a bit. Here are some changes you can make:
Use for/or instead of mutating a binding and using #:break.
Use for*/or instead of nesting for/or loops inside of each other.
Use in-range where possible to ensure for loops specialize to simple loops upon expansion.
Use square brackets instead of parentheses in the relevant places to make your code more readable to Racketeers (especially since this code isn’t remotely portable Scheme, anyway).
With those changes, the code looks like this:
#lang racket
; following returns true if queens are on diagonals:
(define (check-diagonals bd)
  (for*/or ([r1 (in-range (length bd))]
            [r2 (in-range (add1 r1) (length bd))])
    (= (abs (- r1 r2))
       (abs (- (list-ref bd r1)
               (list-ref bd r2))))))
; set board size:
(define N 8)
; start searching:
(for ([brd (in-permutations (range N))])
  (unless (check-diagonals brd)
    (displayln brd)))
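For readers more at home in Java, here is a rough sketch of the same brute-force search (the class and method names are made up for illustration, and this is not part of the original answer). Because every permutation of 0..N-1 already places exactly one queen per row and column, only the diagonals need checking. The sketch uses an int array, so each lookup is constant-time, whereas list-ref in the Racket code walks the list from its head.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java port of the sequential Racket search, for illustration only.
final class NQueens {
    // True if any two queens share a diagonal (the same test as check-diagonals).
    static boolean checkDiagonals(int[] bd) {
        for (int r1 = 0; r1 < bd.length; r1++) {
            for (int r2 = r1 + 1; r2 < bd.length; r2++) {
                if (Math.abs(r1 - r2) == Math.abs(bd[r1] - bd[r2])) {
                    return true; // stop at the first conflict, like for*/or
                }
            }
        }
        return false;
    }

    // Visits every permutation of bd[k..] in place by swapping,
    // collecting the boards that pass the diagonal test.
    private static void permute(int[] bd, int k, List<int[]> solutions) {
        if (k == bd.length) {
            if (!checkDiagonals(bd)) {
                solutions.add(bd.clone()); // copy, since bd keeps being mutated
            }
            return;
        }
        for (int i = k; i < bd.length; i++) {
            swap(bd, k, i);
            permute(bd, k + 1, solutions);
            swap(bd, k, i); // undo the swap before trying the next choice
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // All solutions for an n x n board; bd[row] is the column of that row's queen.
    static List<int[]> solve(int n) {
        int[] bd = new int[n];
        for (int i = 0; i < n; i++) {
            bd[i] = i;
        }
        List<int[]> solutions = new ArrayList<>();
        permute(bd, 0, solutions);
        return solutions;
    }
}
```

With this sketch, NQueens.solve(8).size() is 92, the well-known number of solutions for the standard 8x8 board.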
Now we can turn to parallelizing. Here’s the thing: parallelizing things tends to be tricky. Scheduling work in parallel tends to have overhead, and that overhead can outweigh the benefits more often than you might think. I spent a long time trying to parallelize this code, and ultimately, I was not able to produce a program that ran faster than the original, sequential program.
Still, you are likely interested in what I did. Maybe you (or someone else) will be able to come up with something better than I can. The relevant tool for the job here is Racket’s futures, designed for fine-grained parallelism. Futures are fairly restrictive due to the way Racket’s runtime is designed (which is essentially the way it is for historical reasons), so not just anything can be parallelized, and a fairly large number of operations can cause futures to block. Fortunately, Racket also ships with the Future Visualizer, a graphical tool for understanding the runtime behavior of futures.
Before beginning, I ran the sequential version of the program with N=11 and recorded the results.
$ time racket n-queens.rkt
[program output elided]
14.44 real 13.92 user 0.32 sys
I will use these numbers as a point of comparison for the remainder of this answer.
To start, we can try replacing the primary for loop with for/async, which runs all its bodies in parallel. This is an extremely simple transformation, and it leaves us with the following loop:
(for/async ([brd (in-permutations (range N))])
  (unless (check-diagonals brd)
    (displayln brd)))
However, making that change does not improve performance at all; in fact, it significantly reduces it. Merely running this version with N=9 takes ~6.5 seconds; with N=10, it takes ~55 seconds.
This is, however, not too surprising. Running the code with the futures visualizer (using N=7) indicates that displayln is not legal from within a future, preventing any parallel execution from ever actually taking place. Presumably, we can fix this by creating futures that just compute the results, then print them serially:
(define results-futures
  (for/list ([brd (in-permutations (range N))])
    (future (λ () (cons (check-diagonals brd) brd)))))
(for ([result-future (in-list results-futures)])
  (let ([result (touch result-future)])
    (unless (car result)
      (displayln (cdr result)))))
With this change, we get a small speedup over the naïve attempt with for/async, but we’re still far slower than the sequential version. Now, with N=9, it takes ~4.58 seconds, and with N=10, it takes ~44.
Taking a look at the futures visualizer for this program (again, with N=7), there are now no blocks, but there are some syncs (on jit_on_demand and allocate memory). However, after some time spent jitting, execution seems to get going, and it actually runs a lot of futures in parallel!
After a little bit of that, however, the parallel execution seems to run out of steam, and things seem to start running relatively sequentially again.
I wasn’t really sure what was going on here, but I thought maybe it was because some of the futures are just too small. It seems possible that the overhead of scheduling thousands of futures (or, in the case of N=10, millions) was far outweighing the actual runtime of the work being done in the futures. Fortunately, this seems like something that could be solved by just grouping the work into chunks, something that’s relatively doable using in-slice:
(define results-futures
  (for/list ([brds (in-slice 10000 (in-permutations (range N)))])
    (future
     (λ ()
       (for/list ([brd (in-list brds)])
         (cons (check-diagonals brd) brd))))))
(for* ([results-future (in-list results-futures)]
       [result (in-list (touch results-future))])
  (unless (car result)
    (displayln (cdr result))))
It seems that my suspicion was correct, because that change helps a whole lot. Now, running the parallel version of the program only takes ~3.9 seconds for N=10, a speedup of more than a factor of 10 over the previous version using futures. Unfortunately, that’s still slower than the completely sequential version, which only takes ~1.4 seconds. Increasing N to 11 makes the parallel version take ~44 seconds, and if the chunk size provided to in-slice is increased to 100,000, it takes even longer, ~55 seconds.
Taking a look at the future visualizer for that version of the program, with N=11 and a chunk size of 1,000,000, I see that there seem to be some periods of extended parallelism, but the futures get blocked a lot on memory allocation.
This makes sense, since now each future is handling much more work, but it means the futures are synchronizing constantly, likely leading to the significant performance overhead I’m seeing.
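For comparison, the same slicing idea can be sketched in Java with a fixed-size thread pool: the permutation stream is cut into slices of chunkSize boards, each slice becomes one task, and the results are collected in order, much as touch is applied to each future above. This is only an analogue under stated assumptions (the names, chunk size and thread count are illustrative), and it makes no claim about speed; the same trade-off between per-task overhead and work per task applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical Java analogue of the in-slice version: one task per slice of permutations.
final class ChunkedQueens {
    static boolean checkDiagonals(int[] bd) {
        for (int r1 = 0; r1 < bd.length; r1++)
            for (int r2 = r1 + 1; r2 < bd.length; r2++)
                if (Math.abs(r1 - r2) == Math.abs(bd[r1] - bd[r2])) return true;
        return false;
    }

    // Rearranges a into the next permutation in lexicographic order; false after the last one.
    static boolean nextPermutation(int[] a) {
        int i = a.length - 2;
        while (i >= 0 && a[i] >= a[i + 1]) i--;
        if (i < 0) return false;
        int j = a.length - 1;
        while (a[j] <= a[i]) j--;
        swap(a, i, j);
        for (int l = i + 1, r = a.length - 1; l < r; l++, r--) swap(a, l, r);
        return true;
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    // Counts solutions, checking each slice of chunkSize boards on a pool of `threads` workers.
    static int countSolutions(int n, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            int[] bd = new int[n];
            for (int i = 0; i < n; i++) bd[i] = i;
            List<int[]> chunk = new ArrayList<>();
            boolean more = true;
            while (more) {
                chunk.add(bd.clone());
                more = nextPermutation(bd);
                if (chunk.size() == chunkSize || !more) {
                    List<int[]> slice = chunk; // this slice belongs to one task
                    results.add(pool.submit(() -> {
                        int found = 0;
                        for (int[] b : slice) if (!checkDiagonals(b)) found++;
                        return found;
                    }));
                    chunk = new ArrayList<>();
                }
            }
            int total = 0;
            for (Future<Integer> f : results) total += f.get(); // blocks until ready, like touch
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Like the Racket version, this sketch creates every task before collecting any results, so for large N you would probably want to bound the number of slices in flight.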
At this point, I’m not sure there’s much else I know how to tweak to improve future performance. I tried cutting down allocation by using vectors instead of lists and specialized fixnum operations, but for whatever reason, that seemed to completely destroy parallelism. I thought that maybe that was because the futures were never starting up in parallel, so I replaced future with would-be-future, but the results were confusing to me, and I didn’t really understand what they meant.
My conclusion is that Racket’s futures are simply too fragile to work with this problem, simple as it may be. I give up.
Now, as a little bonus, I decided to try to emulate the same thing in Haskell, since Haskell has a particularly robust story for fine-grained parallel evaluation. If I couldn’t get a performance boost in Haskell, I wouldn’t expect to be able to get one in Racket, either.
I’ll skip all the details about the different things I tried, but eventually, I ended up with the following program:
import Data.List (permutations)
import Data.Maybe (catMaybes)
checkDiagonals :: [Int] -> Bool
checkDiagonals bd =
  or $ flip map [0 .. length bd - 1] $ \r1 ->
    or $ flip map [r1 + 1 .. length bd - 1] $ \r2 ->
      abs (r1 - r2) == abs ((bd !! r1) - (bd !! r2))
n :: Int
n = 11
main :: IO ()
main =
  let results = flip map (permutations [0 .. n-1]) $ \brd ->
        if checkDiagonals brd then Nothing else Just brd
  in mapM_ print (catMaybes results)
I was able to easily add some parallelism using the Control.Parallel.Strategies library. I added a line to the main function that introduced some parallel evaluation:
import Control.Parallel.Strategies
import Data.List.Split (chunksOf)
main :: IO ()
main =
  let results =
        concat . withStrategy (parBuffer 10 rdeepseq) . chunksOf 100 $
        flip map (permutations [0 .. n-1]) $ \brd ->
          if checkDiagonals brd then Nothing else Just brd
  in mapM_ print (catMaybes results)
It took some time to figure out the right chunk and rolling buffer sizes, but these values gave me a consistent 30-40% speedup over the original, sequential program.
Now, obviously, Haskell’s runtime is considerably more suited for parallel programming than Racket’s, so this comparison is hardly fair. But it helped me to see for myself that, despite having 4 cores (8 with hyperthreading) at my disposal, I wasn’t able to get even a 50% speedup. Keep that in mind.
As Matthias noted in a mailing list thread I started on this subject:
A word of caution on parallelism in general. Not too long ago, someone in CS at an Ivy League school studied the use of parallelism across different uses and applications. The goal was to find out how much the ‘hype’ about parallelism affected people. My recollection is that they found close to 20 projects where professors (in CE, EE, CS, Bio, Econ, etc) told their grad students/PhDs to use parallelism to run programs faster. They checked all of them and for N - 1 or 2, the projects ran faster once the parallelism was removed. Significantly faster.
People routinely underestimate the communication cost that parallelism introduces.
Be careful not to make the same mistake.
I'm new to Clojure, and as quick practice I wrote a function that walks through the Fibonacci sequence until it exceeds 999999999, and repeats that 1 billion times (it does some extra math too, but that's not very important). I've written something that does the same in Java, and while I understand that by nature Clojure is slower than Java, the Java program took 35 seconds to complete while the Clojure one took 27 minutes, which I found very surprising (considering Node.js was able to complete it in about 8 minutes). I compiled the class with the REPL and ran it with this Java command: java -cp `clj -Spath` fib. I'm really unsure as to why this is so slow.
(defn fib
  []
  (def iter (atom (long 0)))
  (def tester (atom (long 0)))
  (dotimes [n 1000000000]
    (loop [prev (long 0)
           curr (long 1)]
      (when (<= prev 999999999)
        (swap! iter inc)
        (if (even? @iter)
          (swap! tester + prev)
          (swap! tester - prev))
        (recur curr (+ prev curr)))))
  (println (str "Done in: " @iter " Test: " @tester)))
Here is my Java method for reference
public static void main(String[] args) {
    long iteration = 0;
    int test = 0;
    for (int n = 0; n < 1000000000; n++) {
        int x = 0, y = 1;
        while (true) {
            iteration += 1;
            if (iteration % 2 == 0) {
                test += x;
            }
            else {
                test -= x;
            }
            int i = x + y;
            x = y;
            y = i;
            if (x > 999999999) { break; }
        }
    }
    System.out.println("iter: " + iteration + " " + test);
}
One thing a lot of newcomers to Clojure don't realize is that Clojure is a higher-level language by default. That means it will force you into implementations that will handle overflow on arithmetic, will treat numbers as objects you can extend, will prevent you from mutating any variable, will force you to have thread-safe code, and will push you towards functional solutions that rely on recursion for looping.
It also doesn't force you to declare types by default, which is convenient: you don't have to think about the type of everything or make sure all your types are compatible. For example, Clojure doesn't require your vector to contain only Integers; it lets you put Integers and Longs in it.
All this stuff is great for writing fast-enough correct, evolvable, and maintainable applications, but it is not so great for high-performance algorithms.
That means by default Clojure is optimized for implementing applications and not for implementing high-performance algorithms.
Unfortunately, it seems that most people who "try" a new language, newcomers to Clojure included, tend to first use it to implement high-performance algorithms. This is an obvious mismatch with what Clojure defaults to being good at, and lots of newcomers are immediately faced with the added friction Clojure causes here. Clojure assumes you are going to implement an app, not a high-performance Fibonacci-like algorithm repeated one billion times.
But don't lose hope, Clojure can also be used to implement high-performance algorithms, but it isn't the default, so you generally need to be a more experienced Clojure developer to know how to do so, as it is less obvious.
Here's your algorithm in Clojure, which performs as fast as your Java implementation. It's a recursive rewrite of your exact Java code:
(ns playground
  (:gen-class)
  (:require [fastmath.core :as fm]))
(defn -main []
  (loop [n (long 0) iteration (long 0) test (long 0)]
    (if (fm/< n 1000000000)
      (let [^longs inner
            (loop [x (long 0) y (long 1) iteration iteration test test]
              (let [iteration (fm/inc iteration)
                    test (if (fm/== (fm/mod iteration 2) 0) (fm/+ test x) (fm/- test x))
                    i (fm/+ x y)
                    x y
                    y i]
                (if (fm/> x 999999999)
                  (doto (long-array 2) (aset 0 iteration) (aset 1 test))
                  (recur x y iteration test))))]
        (recur (fm/inc n) (aget inner 0) (aget inner 1)))
      (str "iter: " iteration " " test))))
(println (time (-main)))
"Elapsed time: 47370.544514 msecs"
;;=> iter: 45000000000 0
Using deps:
:deps {generateme/fastmath {:mvn/version "2.1.8"}}
As you can see, on my laptop, it completes in ~47 seconds. I also ran your Java version on my laptop to compare on my exact hardware, and for Java I got: 46947.343671 ms.
So on my laptop, the Clojure and the Java versions are basically just as fast as each other, both clocking in at around 47 seconds.
The difference is that in Java, the style of programming is always conducive to implementing high-performance algorithms. You can directly use primitive types and primitive arithmetic, no boxing, no overflow checks, mutable variables with no synchronization or atomicity or volatility protections, etc.
A few things were thus required to get similar performance in Clojure:
Use primitive types
Use primitive math
Avoid the use of higher-level managed mutable containers like atom
And obviously, we needed to run the same algorithm too, so the implementation is similar. I wasn't trying to find out whether a faster algorithm exists for the same problem, but how to implement the same algorithm in Clojure so it runs just as fast.
In order to use primitive types in Clojure, you have to know that you are only allowed to do so inside local contexts using let and loop, and every function call will box the primitive again, unless the function's parameters are also typed as primitive long or double (the only primitive types that can cross function boundaries in Clojure).
That's the first thing I did, then: I rewrote your same loops using Clojure's loop/recur and declared the same variables as you did, but using let shadowing instead, so we don't need a managed mutable container.
Finally, I made use of Fastmath, a library that provides primitive versions of a lot of arithmetic functions so that we can do primitive math. Clojure core has some of its own, but it doesn't have a primitive mod, for example, so I needed to pull in Fastmath.
That's it.
Generally, this is what you need to know: keep to primitive types, keep to primitive math (using Fastmath), type hint to avoid reflection, leverage let shadowing, and keep to primitive arrays, and you'll get high-performance Clojure implementations.
There's a good set of info about it here: https://clojure.org/reference/java_interop#primitives
One last thing: the philosophy of Clojure is that it is meant for implementing fast-enough, correct, evolvable, and maintainable apps that can scale. That's why the language is the way it is. While you can, as I've shown, implement high-performance algorithms, Clojure's philosophy is also not to reinvent syntax for things that Java is already great at. Clojure can use Java, so for algorithms that need very imperative, mutable, primitive logic, it expects you to fall back to Java, write the algorithm as a static method, and then use it from Clojure. Or it expects you to delegate to something even more performant than Java, such as BLAS or a GPU for super-fast matrix math. That's why it doesn't bother to provide its own imperative constructs, raw memory access, and so on: it doesn't think it can do those things any better than the hosts it runs on.
Your code might seem like a "basic function", but there are two main problems:
You used atom. An atom isn't a variable as you know it from Java, but a construct for managing synchronous state, free of race conditions. So reset! and swap! are atomic operations, and they're slow. Look at this example:
(let [counter (atom 0)]
  (dotimes [x 1000]
    (-> (Thread. (fn [] (swap! counter inc)))
        .start))
  (Thread/sleep 2000)
  @counter)
=> 1000
1000 threads are started, each increments the counter once, and the result is 1000; no surprise. But compare that with volatile!, which isn't thread-safe:
(let [counter (volatile! 0)]
  (dotimes [x 1000]
    (-> (Thread. (fn [] (vswap! counter inc)))
        .start))
  (Thread/sleep 2000)
  @counter)
=> 989
See also the Clojure reference on Atoms.
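For readers coming from Java, the difference can be sketched with java.util.concurrent directly. Clojure's atoms are built on AtomicReference, and swap! retries a compare-and-set until no other thread has changed the value in between; the sketch below assumes that model and is not Clojure's actual source. Incrementing a volatile field, like vswap!, is a separate read and write, so concurrent increments can be lost. Joining the threads replaces the Thread/sleep used in the examples above.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Illustration only: a swap!-style CAS retry loop versus an unsynchronized volatile increment.
final class AtomVsVolatile {
    static final AtomicReference<Long> atom = new AtomicReference<>(0L);
    static volatile long vol = 0L;

    // Retries until no other thread changed the value between our read and our write.
    static <T> T swap(AtomicReference<T> ref, UnaryOperator<T> f) {
        while (true) {
            T old = ref.get();
            T next = f.apply(old);
            if (ref.compareAndSet(old, next)) {
                return next;
            }
        }
    }

    // Starts `threads` threads that each increment both counters once; returns {atom, volatile}.
    static long[] run(int threads) {
        atom.set(0L);
        vol = 0L;
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                swap(atom, x -> x + 1);
                vol = vol + 1; // read, then write: updates from other threads can be lost
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) {
                t.join(); // wait for every thread instead of sleeping
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return new long[] {atom.get(), vol};
    }
}
```

The atom-style counter always ends at the number of threads; the volatile one may fall short, but how often it does depends on thread scheduling.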
Unless you really need atoms and volatiles, you shouldn't use them. Using loop is also discouraged, because there is usually some better function that does exactly what you want. You tried to literally rewrite your Java function in Clojure. Clojure requires a different approach to problems, and your code definitely isn't idiomatic. I suggest you not rewrite Java code in Clojure line by line, but find some easy problems and learn how to solve them the Clojure way, without atom, volatile! and loop.
By the way, there is memoize, which can be useful in examples like yours.
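As a rough Java illustration of what memoize does (the Memo class and its memoize helper are hypothetical, not a library API): results are cached by argument, so a recursive Fibonacci function that goes through the cache computes each value only once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper in the spirit of clojure.core/memoize: caches results by argument.
// Not thread-safe (plain HashMap); fine for a single-threaded illustration.
final class Memo {
    static <A, R> Function<A, R> memoize(Function<A, R> f) {
        Map<A, R> cache = new HashMap<>();
        return a -> {
            R hit = cache.get(a);
            if (hit != null) {
                return hit;
            }
            R result = f.apply(a); // may recursively call the memoized function
            cache.put(a, result);
            return result;
        };
    }

    // Recursive Fibonacci (fib(1) = fib(2) = 1) routed through the cache.
    static Function<Integer, Long> fib;

    static {
        fib = memoize(n -> n <= 2 ? 1L : Memo.fib.apply(n - 1) + Memo.fib.apply(n - 2));
    }
}
```

Memo.fib.apply(45) returns 1134903170 after only 45 distinct computations, instead of the exponential number of calls the naive recursion would make.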
If you are a beginner at programming, I suggest you always assume your code is wrong before assuming the language/lib/framework/platform is wrong.
Take a look at various implementations of the Fibonacci sequence in Java and Clojure; you may learn something.
As others have noted, a straightforward translation of the Java code to Clojure runs rather slowly. However, if we write a Fibonacci number generator which takes advantage of Clojure's strengths we can get something which is short and does its job more idiomatically.
To start, let's say we want a function which will compute the nth number of the Fibonacci sequence (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...). To do that we could use:
(defn fib [n]
  (loop [a 1
         b 0
         cnt n]
    (if (= cnt 1)
      a
      (recur (+' a b) a (dec cnt)))))
which iteratively recomputes the "next" Fibonacci value until it gets to the one which is desired.
Given this function we can develop one which creates a collection of the Fibonacci sequence values by mapping this function across a range of index values:
(defn fib-seq [n]
  (map #(fib %) (range 1 (inc n))))
But this is of course a stunningly inefficient way of computing a sequence of Fibonacci values, since for each value we have to compute all of the preceding values and then we only save the last one. If we want a more efficient way to compute the entire sequence we can loop through the possibilities and gather the results in a collection:
(defn fib-seq [n]
  (loop [curr 1
         prev 0
         c '(1)]
    (if (= n (count c))
      (reverse c)
      (let [new-curr (+' curr prev)]
        (recur new-curr curr (cons new-curr c))))))
This gives us a reasonably efficient way to collect the values of the Fibonacci sequence. For your test of a billion loops through the first 45 terms (the 45th term of the sequence being the first one which exceeds 999,999,999), I used:
(time (dotimes [n 1000000000] (fib-seq 45)))
which completed in 17.5 seconds on my hardware and OS (Windows 10, dual-processor Intel i5 @ 2.6 GHz).
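The same one-pass idea as the loop-based fib-seq can be sketched in Java. BigInteger stands in for Clojure's +' (addition that promotes instead of overflowing); the class name is made up. It also makes it easy to check the claim that the 45th term is the first one to exceed 999,999,999.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Sketch of the loop-based fib-seq: the first n Fibonacci numbers, computed in one pass.
final class FibSeq {
    static List<BigInteger> fibSeq(int n) {
        List<BigInteger> out = new ArrayList<>();
        BigInteger prev = BigInteger.ZERO;
        BigInteger curr = BigInteger.ONE;
        for (int i = 0; i < n; i++) {
            out.add(curr);
            BigInteger next = curr.add(prev); // never overflows, like +'
            prev = curr;
            curr = next;
        }
        return out;
    }
}
```

FibSeq.fibSeq(45) ends with 701408733 and 1134903170, so the 44th term is still below 999,999,999 and the 45th is the first above it.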
What is the purpose of the Clojure reduced function (added in Clojure 1.5, https://clojure.github.io/clojure/clojure.core-api.html#clojure.core/reduced)?
I can't find any examples for it. The doc says:
Wraps x in a way such that a reduce will terminate with the value x.
There is also reduced?, which is related to it:
Returns true if x is the result of a call to reduced
When I try it out, e.g. with (reduce + (reduced 100)), I get an error instead of 100. Also, why would I reduce something when I know the result in advance? Since it was added, there is likely a reason, but googling for clojure reduced only returns results for reduce.
reduced allows you to short circuit a reduction:
(reduce (fn [acc x]
          (if (> acc 10)
            (reduced acc)
            (+ acc x)))
        0
        (range 100))
;= 15
(NB. the edge case with (reduced 0) passed in as the initial value doesn't work as of Clojure 1.6.)
This is useful, because reduce-based looping is both very elegant and very performant (so much so that reduce-based loops are not infrequently more performant than the "natural" replacements based on loop/recur), so it's good to make this pattern as broadly applicable as possible. The ability to short circuit reduce vastly increases the range of possible applications.
As for reduced?, I find it useful primarily when implementing reduce logic for new data structures; in regular code, I let reduce perform its own reduced? checks where appropriate.
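To make the mechanism concrete, here is a rough Java sketch of a fold that stops early when the step function returns a wrapped value. The Reduced class plays the role of reduced and the instanceof test plays the role of reduced?; both names are hypothetical, and this illustrates the idea rather than how clojure.core implements it.

```java
import java.util.function.BiFunction;

// Hypothetical Java analogue of reduce + reduced, for illustration only.
final class ShortCircuit {
    // Plays the role of (reduced x): a marker telling reduce to stop with x.
    static final class Reduced<T> {
        final T value;

        Reduced(T value) {
            this.value = value;
        }
    }

    // Left fold over xs; the step function returns either the next accumulator or a Reduced.
    @SuppressWarnings("unchecked")
    static <A, X> A reduce(BiFunction<A, X, Object> f, A init, Iterable<X> xs) {
        A acc = init;
        for (X x : xs) {
            Object step = f.apply(acc, x);
            if (step instanceof Reduced) { // the equivalent of a reduced? check
                return ((Reduced<A>) step).value; // unwrap and stop consuming xs
            }
            acc = (A) step;
        }
        return acc;
    }
}
```

Running the same step function as the Clojure example (stop once the accumulator exceeds 10, otherwise add) over 0..99 returns 15, and the remaining elements are never visited.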
I have this function that reproduces my problem:
(defn my-problem
  [preprocess count print-freq]
  (doseq [x (preprocess (range 0 count))]
    (when (= 0 (mod x print-freq))
      (println x))))
Everything works fine when I call it with the identity function like this:
(my-problem identity 10000000 200000)
;it prints 200000,400000 ... 9800000 just as it should
When I call it with the seque function, I get an OutOfMemoryError:
(my-problem #(seque 5 %) 10000000 200000)
;it prints numbers up to 2000000 and then it throws OutOfMemoryError
My understanding is that the seque function should just split the processing into two threads using a java.util.concurrent BlockingQueue with max size 5 (in this case). I don't understand where the memory leak is.
The way seque is implemented, if you consume elements much more quickly than you can produce them, a large number of agent tasks will pile up in the queue used internally by seque (up to one task per element in the sequence). In theory what you're doing should be fine, but in practice it doesn't really work out. You should be able to see the same effect just by running (dorun (seque (range))).
You can also use the function sequeue in flatland/useful, which makes tradeoffs that are different from the ones in clojure.core. Read the docstring carefully, but I think it would work well for your situation.
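For contrast, this is roughly the behaviour the question expected: a single producer thread filling a bounded queue, where put blocks once the queue is full, so the producer can get at most `capacity` items ahead and memory stays bounded. It is a Java sketch under that assumption (the class, method and end-of-stream sentinel are made up), not a description of how seque or the flatland/useful alternative is implemented.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustration only: a producer thread and a consumer sharing a bounded queue.
final class BoundedPipeline {
    private static final long END = -1L; // hypothetical end-of-stream sentinel

    // Produces 0..count-1 on a background thread; the caller consumes them and
    // counts multiples of printFreq (the original code prints them instead).
    static long countMultiples(long count, long printFreq, int capacity) {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (long x = 0; x < count; x++) {
                    queue.put(x); // blocks while the queue is full: this is the backpressure
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long hits = 0;
        try {
            for (long x = queue.take(); x != END; x = queue.take()) {
                if (x % printFreq == 0) {
                    hits++;
                }
            }
            producer.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return hits;
    }
}
```

Because the only pending work is the items sitting in the queue, nothing piles up no matter how fast the consumer is.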
Reduce and reductions let you accumulate state over a sequence. Each element in the sequence will modify the accumulated state until the end of the sequence is reached.
What are implications of calling reduce or reductions on an infinite list?
(def c (cycle [0]))
(reduce + c)
This will quickly throw an OutOfMemoryError. By the way, (reduce + (cycle [0])) does not throw an OutOfMemoryError (at least not for the time I waited). It never returns. Not sure why.
Is there any way to call reduce or reductions on an infinite list in a way that makes sense? The problem I see in the above example, is that eventually the evaluated part of the list becomes large enough to overflow the heap. Maybe an infinite list is not the right paradigm. Reducing over a generator, IO stream, or an event stream would make more sense. The value should not be kept after it's evaluated and used to modify the state.
It will never return because reduce takes a sequence and a function and applies the function until the input sequence is empty; only then can it know it has the final value.
Reduce on a truly infinite seq would not make a lot of sense unless it is producing a side effect like logging its progress.
In your first example you are first creating a var referencing an infinite sequence.
(def c (cycle [0]))
Then you are passing the contents of the var c to reduce which starts reading elements to update its state.
(reduce + c)
These elements can't be garbage collected because the var c holds a reference to the first of them, which in turn holds a reference to the second, and so on. Eventually it has read as many elements as fit in the heap, and then it throws an OutOfMemoryError.
In your second example, the heap doesn't blow up because you are not keeping a reference to the data you have already used, so the items in the seq returned by cycle are garbage collected as fast as they are produced. Since every element is 0, the accumulated sum simply stays at 0 and reduce never returns. With nonzero elements, the sum would keep growing until it overflowed a long and threw an exception (Clojure 1.3) or was promoted to a BigInteger that grew to fill the heap (Clojure 1.2).
(reduce + (cycle [0]))
Arthur's answer is good as far as it goes, but it looks like he doesn't address your second question about reductions. reductions returns a lazy sequence of intermediate stages of what reduce would have returned if given a list only N elements long. So it's perfectly sensible to call reductions on an infinite list:
user=> (take 10 (reductions + (range)))
(0 1 3 6 10 15 21 28 36 45)
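This works because each intermediate result is produced only when it is asked for. Here is a Java sketch of the same idea with a lazy iterator; Reductions and naturals are made-up names, and naturals stands in for (range). Nothing is computed beyond the elements actually taken.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.LongBinaryOperator;

// Hypothetical lazy analogue of (reductions f coll) over longs, for illustration only.
final class Reductions {
    // Each call to next() pulls one element from xs and returns the running result.
    static Iterator<Long> reductions(LongBinaryOperator f, Iterator<Long> xs) {
        return new Iterator<Long>() {
            private boolean started = false;
            private long acc;

            @Override
            public boolean hasNext() {
                return xs.hasNext();
            }

            @Override
            public Long next() {
                if (!xs.hasNext()) {
                    throw new NoSuchElementException();
                }
                long x = xs.next();
                acc = started ? f.applyAsLong(acc, x) : x; // the first element seeds the result
                started = true;
                return acc;
            }
        };
    }

    // An infinite source 0, 1, 2, ... standing in for (range).
    static Iterator<Long> naturals() {
        return new Iterator<Long>() {
            private long n = 0;

            @Override
            public boolean hasNext() {
                return true;
            }

            @Override
            public Long next() {
                return n++;
            }
        };
    }
}
```

Calling next() ten times on Reductions.reductions(Long::sum, Reductions.naturals()) yields 0, 1, 3, 6, 10, 15, 21, 28, 36, 45, matching the REPL output above.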
If you want to keep getting items from a list like an IO stream and keep state between runs, you cannot use doseq (without resorting to defs). Instead, a good approach would be to use loop/recur: this will allow you to avoid consuming too much stack space and will let you keep state. In your case:
(loop [c (cycle [0])]
  (if (evaluate-some-condition (first c))
    (do (do-something-with (first c))
        (recur (rest c)))
    nil))
Of course, unlike your case, there is a condition check here to make sure we don't loop indefinitely.
As others have pointed out, it doesn't make sense to run reduce directly on an infinite sequence, since reduce is non-lazy and needs to consume the full sequence.
As an alternative for this kind of situation, here's a helpful function that reduces only the first n items in a sequence, implemented using recur for reasonable efficiency:
(defn counted-reduce
  ([n f s]
   (counted-reduce (dec n) f (first s) (rest s)))
  ([n f initial s]
   (if (<= n 0)
     initial
     (recur (dec n) f (f initial (first s)) (rest s)))))
(counted-reduce 10000000 + (range))
=> 49999995000000
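In Java, the standard streams API supports the same pattern: an infinite LongStream.iterate source, cut off with limit before the reduction. This sketch assumes the reduction is over longs; like the three-argument call of counted-reduce, reduce(LongBinaryOperator) uses the first element as the initial value. The class name is made up.

```java
import java.util.OptionalLong;
import java.util.function.LongBinaryOperator;
import java.util.stream.LongStream;

// Sketch: reduce only the first n elements of the infinite sequence 0, 1, 2, ...
final class CountedReduce {
    static OptionalLong countedReduce(long n, LongBinaryOperator f) {
        return LongStream.iterate(0L, i -> i + 1) // infinite, generated lazily
                .limit(n)                        // take only the first n items
                .reduce(f);                      // first element acts as the initial value
    }
}
```

CountedReduce.countedReduce(10_000_000L, Long::sum).getAsLong() gives 49999995000000, the same result as the Clojure call; for n = 0 the result is an empty OptionalLong instead of a value.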
Short version:
I am interested in some Clojure code which will allow me to specify the transformations of x (e.g. permutations, rotations) under which the value of a function f(x) is invariant, so that I can efficiently generate a sequence of x's that satisfy r = f(x). Is there some development in computer algebra for Clojure?
For (a trivial) example
(defn #^{:domain #{3 4 7}
         :range #{0,1,2}
         :invariance-group :full}
  f [x] (- x x))
I could call (preimage f #{0}) and it would efficiently return #{3 4 7}. Naturally, it would also be able to annotate the codomain correctly. Any suggestions?
Longer version:
I have a specific problem that makes me interested in finding out about development of computer algebra for Clojure. Can anyone point me to such a project? My specific problem involves finding all the combinations of words that satisfy F(x) = r, where F is a ranking function and r a positive integer. In my particular case F can be computed as a sum
F(x) = f(x[0]) + f(x[1]) + ... + f(x[N-1])
Furthermore I have a set of disjoint sets S = {s_i}, such that f(a)=f(b) for a,b in s, s in S. So a strategy to generate all x such that F(x) = r should rely on this factorization of F and the invariance of f under each s_i. In words, I compute all permutations of sites containing elements of S that sum to r and compose them with all combinations of the elements in each s_i. This is done quite sloppily in the following:
(use 'clojure.contrib.combinatorics)
(use 'clojure.contrib.seq-utils)
(defn expand-counter [c]
  (flatten (for [m c] (let [x (m 0) y (m 1)] (repeat y x)))))
(defn partition-by-rank-sum [A N f r]
  (let [M (group-by f A)
        image-A (set (keys M))
        ;integer-partition computes restricted integer partitions,
        ;returning a multiset as key value pairs
        rank-partitions (integer-partition r (disj image-A 0))]
    (apply concat (for [part rank-partitions]
                    (let [k (- N (reduce + (vals part)))
                          rank-map (if (pos? k) (assoc part 0 k) part)
                          all-buckets (lex-permutations (expand-counter rank-map))]
                      (apply concat (for [bucket all-buckets]
                                      (let [val-bucket (map M bucket)
                                            filled-buckets (apply cartesian-product val-bucket)]
                                        (map vec filled-buckets)))))))))
This gets the job done but misses the underlying picture. For example, if the associative operation were a product instead of a sum I would have to rewrite portions.
The system below does not yet support combinatorics, though adding it would not be a huge effort: loads of good code already exists, and this could be a good platform to graft it onto, since the basics are pretty sound. I hope a short plug is not inappropriate here; this is the only serious Clojure CAS I know of, but hey, what a system...
It may be of interest to readers of this thread that Gerry Sussman's scmutils system is being ported to Clojure.
This is a very advanced CAS, offering things like automatic differentiation, literal functions, etc, much in the style of Maple.
It is used at MIT for advanced programs on dynamics and differential geometry, and a fair bit of electrical engineering stuff. It is also the system used in Sussman & Wisdom's "sequel" (LOL) to SICP, SICM (Structure and Interpretation of Classical Mechanics).
Although originally a Scheme program, this is not a direct translation, but a ground-up rewrite to take advantage of the best features of Clojure. It's been named sicmutils, both in honour of the original and of the book.
This superb effort is the work of Colin Smith and you can find it at https://github.com/littleredcomputer/sicmutils .
I believe that this could form the basis of an amazing Computer Algebra System for Clojure, competitive with anything else available. Although it is quite a huge beast, as you can imagine, and tons of stuff remains to be ported, the basics are pretty much there, the system will differentiate, and handle literals and literal functions pretty well. It is a work in progress. The system also uses the "generic" approach advocated by Sussman, whereby operations can be applied to functions, creating a great abstraction that simplifies notation no end.
Here's a taster:
> (def unity (+ (square sin) (square cos)))
> (unity 2.0) ==> 1.0
> (unity 'x) ==> 1 ;; yes we can deal with symbols
> (def zero (D unity)) ;; Let's differentiate
> (zero 2.0) ==> 0
SicmUtils introduces two new vector types “up” and “down” (called “structures”), they work pretty much as you would expect vectors to, but have some special mathematical (covariant, contravariant) properties, and also some programming properties, in that they are executable!
> (def fnvec (up sin cos tan)) => fnvec
> (fnvec 1) ==> (up 0.8414709848078965 0.5403023058681398 1.5574077246549023)
> ;; differentiated
> ((D fnvec) 1) ==> (up 0.5403023058681398 -0.8414709848078965 3.425518820814759)
> ;; derivative with symbolic argument
> ((D fnvec) 'θ) ==> (up (cos θ) (* -1 (sin θ)) (/ 1 (expt (cos θ) 2)))
Partial differentiation is fully supported
> (defn ff [x y] (* (expt x 3)(expt y 5)))
> ((D ff) 'x 'y) ==> (down (* 3 (expt x 2) (expt y 5)) (* 5 (expt x 3) (expt y 4)))
> ;; i.e. a vector of results with respect to both variables
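For readers wondering how a system can apply D to an ordinary function, one common technique is forward-mode automatic differentiation with dual numbers: every value carries its derivative along, and each primitive operation applies the chain rule. The Java sketch below shows only that general technique, under this assumption; it is not sicmutils' implementation, and the Dual class is hypothetical.

```java
import java.util.function.UnaryOperator;

// Hypothetical dual number: a value together with its derivative.
final class Dual {
    final double val, der;

    Dual(double val, double der) {
        this.val = val;
        this.der = der;
    }

    Dual plus(Dual o) {
        return new Dual(val + o.val, der + o.der);
    }

    // Product rule: (uv)' = u'v + uv'
    Dual times(Dual o) {
        return new Dual(val * o.val, der * o.val + val * o.der);
    }

    // Chain rule for sin and cos.
    static Dual sin(Dual x) {
        return new Dual(Math.sin(x.val), Math.cos(x.val) * x.der);
    }

    static Dual cos(Dual x) {
        return new Dual(Math.cos(x.val), -Math.sin(x.val) * x.der);
    }

    // Derivative of f at x: seed the input with derivative 1 and read off der.
    static double derivative(UnaryOperator<Dual> f, double x) {
        return f.apply(new Dual(x, 1.0)).der;
    }

    // sin^2 + cos^2, the "unity" function from the taster.
    static Dual unity(Dual x) {
        Dual s = sin(x);
        Dual c = cos(x);
        return s.times(s).plus(c.times(c));
    }
}
```

Dual.derivative(Dual::unity, 2.0) comes out as 0 (up to floating-point rounding), mirroring (zero 2.0) in the taster, and Dual.derivative(Dual::sin, 1.0) equals Math.cos(1.0).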
The system also supports TeX output, polynomial factorization, and a host of other goodies. Lots of stuff, however, that could be easily implemented has not been done purely out of lack of human resources. Graphic output and a "notepad/worksheet" interface (using Clojure's Gorilla) are also being worked on.
I hope this has gone some way towards whetting your appetite enough to visit the site and give it a whirl. You don't even need Clojure, you could run it off the provided jar file.
There's Clojuratica, an interface between Clojure and Mathematica:
http://clojuratica.weebly.com/
See also this mailing list post by Clojuratica's author.
While not a CAS, Incanter also has several very nice features and might be a good reference/foundation to build your own ideas on.
Regarding "For example, if the associative operation were a product instead of a sum I would have to rewrite portions.": if you structure your code accordingly, couldn't you accomplish this by using higher-order functions and passing in the associative operation? Think map-reduce.
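Here is a minimal Java sketch of that suggestion, under simplifying assumptions: ranks are longs, words have a fixed length n, and the combining operation and its identity are passed in, so switching from a sum to a product means passing a different operator rather than rewriting the search. Letters with equal rank are grouped first, so the search runs over rank classes and only expands them into concrete letters for rank sequences that hit the target. The names (RankSearch, preimage) are hypothetical, and there is no pruning, so this still enumerates (number of rank classes)^n rank sequences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongBinaryOperator;
import java.util.function.ToLongFunction;

// Hypothetical sketch: all words of length n over `alphabet` whose letter ranks,
// combined with the associative operation `op` (starting from `identity`), equal `target`.
final class RankSearch {
    static <T> List<List<T>> preimage(List<T> alphabet, ToLongFunction<T> f,
                                      LongBinaryOperator op, long identity,
                                      int n, long target) {
        // Letters with the same rank are interchangeable, so group them once.
        Map<Long, List<T>> byRank = new TreeMap<>();
        for (T letter : alphabet) {
            byRank.computeIfAbsent(f.applyAsLong(letter), k -> new ArrayList<>()).add(letter);
        }
        List<List<T>> out = new ArrayList<>();
        chooseRanks(byRank, op, identity, n, target, new ArrayList<>(), out);
        return out;
    }

    // Step 1: pick a rank for each position, folding the ranks with op as we go.
    private static <T> void chooseRanks(Map<Long, List<T>> byRank, LongBinaryOperator op,
                                        long acc, int remaining, long target,
                                        List<Long> ranks, List<List<T>> out) {
        if (remaining == 0) {
            if (acc == target) {
                expand(byRank, ranks, 0, new ArrayList<>(), out);
            }
            return;
        }
        for (long rank : byRank.keySet()) {
            ranks.add(rank);
            chooseRanks(byRank, op, op.applyAsLong(acc, rank), remaining - 1, target, ranks, out);
            ranks.remove(ranks.size() - 1);
        }
    }

    // Step 2: expand a matching rank sequence into every concrete word (a cartesian product).
    private static <T> void expand(Map<Long, List<T>> byRank, List<Long> ranks, int i,
                                   List<T> word, List<List<T>> out) {
        if (i == ranks.size()) {
            out.add(new ArrayList<>(word));
            return;
        }
        for (T letter : byRank.get(ranks.get(i))) {
            word.add(letter);
            expand(byRank, ranks, i + 1, word, out);
            word.remove(word.size() - 1);
        }
    }
}
```

With ranks a=0, b=1, c=1, d=2 and words of length 2, passing Long::sum with identity 0 and target 2 yields 6 words (ad, bb, bc, cb, cc, da), while passing multiplication with identity 1 and target 2 yields 4 (bd, cd, db, dc), without touching the search code.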
I am unaware of any computer algebra systems written in Clojure. However, for my rather simple mathematical needs I have often found it useful to use Maxima, which is written in Lisp. It is possible to interact with Maxima using s-expressions or higher-level representations, which can be really convenient. Maxima also has some rudimentary combinatorics functions which may be what you are looking for.
If you are hellbent on using Clojure, in the short term perhaps throwing your data back and forth between Maxima and Clojure would help you achieve your goals.
In the longer term, I would be interested in seeing what you come up with!