Related
I'm new to clojure, and as quick practice I wrote a function that is supposed to go through the Fibonacci sequence until it exceeds 999999999 1 billion times (does some extra math too but not very important). I've written something that does the same in Java, and while I understand that by nature Clojure is slower than Java, the java program took 35 seconds to complete while the Clojure one took 27 minutes, which I found very surprising (considering nodejs was able to complete it in about 8 minutes). I compiled the class with the repl and ran it with this Java command java -cp `clj -Spath` fib. Really unsure was to why this was so slow.
(defn fib
[]
(def iter (atom (long 0)))
(def tester (atom (long 0)))
(dotimes [n 1000000000]
(loop [prev (long 0)
curr (long 1)]
(when (<= prev 999999999)
(swap! iter inc)
(if (even? #iter)
(swap! tester + prev)
(swap! tester - prev))
(recur curr (+ prev curr)))))
(println (str "Done in: " #iter " Test: " #tester))
)
Here is my Java method for reference
public static void main(String[] args) {
long iteration = 0;
int test = 0;
for (int n = 0; n < 1000000000; n++) {
int x = 0, y = 1;
while (true) {
iteration += 1;
if (iteration % 2 == 0) {
test += x;
}
else {
test -=x;
}
int i = x + y;
x = y;
y = i;
if (x > 999999999) { break; }
}
}
System.out.println("iter: " + iteration + " " + test);
}
One thing a lot of newcomers to Clojure don't realize is that Clojure is a higher-level language by default. That means it will force you into implementations that will handle overflow on arithmetic, will treat numbers as objects you can extend, will prevent you from mutating any variable, will force you to have thread-safe code, and will push you towards functional solutions that rely on recursion for looping.
It also doesn't force you to type everything by default, which is also convenient not to have to care to think about the type of everything and making sure all your types are compatible, like that your vector contains only Integers for example, Clojure doesn't care, letting you put Integers and Longs in it.
All this stuff is great for writing fast-enough correct, evolvable, and maintainable applications, but it is not so great for high-performance algorithms.
That means by default Clojure is optimized for implementing applications and not for implementing high-performance algorithms.
Unfortunately, it seems most people that "try" a new language, and thus newcomers to Clojure will tend to first use the language to try and implement high-performance algorithms. This is an obvious mismatch in what Clojure defaults to be good at, and lots of newcomers are immediately faced with the added friction Clojure causes here. Clojure assumed you were going to implement an app, not some high-performance one billion N sized Fibonacci-like algorithm.
But don't lose hope, Clojure can also be used to implement high-performance algorithms, but it isn't the default, so you generally need to be a more experienced Clojure developer to know how to do so, as it is less obvious.
Here's your algorithm in Clojure, which performs as fast as your Java implementation, it's a recursive re-write of your exact Java code:
(ns playground
(:gen-class)
(:require [fastmath.core :as fm]))
(defn -main []
(loop [n (long 0) iteration (long 0) test (long 0)]
(if (fm/< n 1000000000)
(let [^longs inner
(loop [x (long 0) y (long 1) iteration iteration test test]
(let [iteration (fm/inc iteration)
test (if (fm/== (fm/mod iteration 2) 0) (fm/+ test x) (fm/- test x))
i (fm/+ x y)
x y
y i]
(if (fm/> x 999999999)
(doto (long-array 2) (aset 0 iteration) (aset 1 test))
(recur x y iteration test))))]
(recur (fm/inc n) (aget inner 0) (aget inner 1)))
(str "iter: " iteration " " test))))
(println (time (-main)))
"Elapsed time: 47370.544514 msecs"
;;=> iter: 45000000000 0
Using deps:
:deps {generateme/fastmath {:mvn/version "2.1.8"}}
As you can see, on my laptop, it completes in ~47 seconds. I also ran your Java version on my laptop to compare on my exact hardware, and for Java I got: 46947.343671 ms.
So on my laptop, you can see the Clojure and the Java are basically just as fast each, both clocking in at around 47 seconds.
The difference is that in Java, the style of programming is always conductive to implementing high-performance algorithms. You can directly use primitive types and primitive arithmetic, no boxing, no overflow checks, mutable variables with no synchronization or atomicity or volatility protections, etc.
Few things were thus required to get similar performance in Clojure:
Use primitive types
Use primitive math
Avoid the use of higher-level managed mutable containers like atom
And obviously, we needed to run the same algorithm too, so similar implementation. I wasn't trying to compare if another algorithm exists that can be faster for the same problem, but how to implement the same algo in Clojure so it runs just as fast.
In order to do primitive types in Clojure, you have to know that you are only allowed to do so inside local contexts using let and loop, and all function call will undo the primitive type, unless they too are typed to primitive long or double (the only supported primitive types that can cross function boundaries in Clojure).
That's the first thing I did then, just re-write your same loops using Clojure's loop/recur and declare the same variables as you did, but using let shadowing instead, so we don't need a managed mutable container.
Finally, I made use of Fastmath, a library that provides a lot of primitive versions of arithmetic functions so that we can do primitive math. Clojure core has some of its own, but it doesn't have mod for example, so I needed to pull in Fastmath.
That's it.
Generally, this is what you need to know, keep to primitive types, keep to primitive math (using fastmath), type hint to avoid reflection, leverage let shadowing, keep to primitive arrays, and you'll get Clojure high-performance implementations.
There's a good set of info about it here: https://clojure.org/reference/java_interop#primitives
One last thing, the philosophy of Clojure is that it is meant to implement fast-enough correct, evolvable and maintainable apps that can scale. That's why the language is the way it is. While you can, as I've shown, implement high-performance algos, Clojure's philosophy is also not to re-invent a syntax for things that Java already is great at. Clojure can use Java, so for algorithms that need very imperative, mutable, primitive logic, it would expect you'd just fallback to Java to write this as a static method, and then just use it from Clojure. Or it thinks you'll even delegate to something more performant than even Java, and use BLAS, or a GPU to perform super-fast matrix math, or something of that sort. That's why it doesn't bother to provide its own imperative constructs, or raw memory access and all that, since it doesn't think it do anything better than the hosts it runs over.
Your code might seem like a "basic function", but there are two main problems:
You used atom. Atom isn't variable as you know it from Java, but it's construct for managing synchronous state, free of race conditions. So reset! and swap! are atomic operations and they're slow. Look at this example:
(let [counter (atom 0)]
(dotimes [x 1000]
(-> (Thread. (fn [] (swap! counter inc)))
.start))
(Thread/sleep 2000)
#counter)
=> 1000
1000 threads is started, value of counter is 1000x increased, result is 1000, no surprise. But compare that with volatile!, which isn't thread-safe:
(let [counter (volatile! 0)]
(dotimes [x 1000]
(-> (Thread. (fn [] (vswap! counter inc)))
.start))
(Thread/sleep 2000)
#counter)
=> 989
See also Clojure Reference about Atoms.
Unless you really need atoms and volatiles, you shouldn't use them. Usage of loop is also discouraged, because there is usually some better function, which does exactly what you want. You tried to literally rewrite your Java function into Clojure. Clojure requires different approach to problems and your code definitelly isn't idiomatic. I suggest you to not rewrite Java code to Clojure line by line, but find some easy problems and learn how to solve them in Clojure way, without atom, volatile! and loop.
By the way, there is memoize, which can be useful in examples like yours.
If you are a beginner at programming, I suggest you always assume your code is wrong before assuming the language/lib/framework/platform is wrong.
Take a look at Fibonacci sequence various implementations in Java and Clojure, you may learn something.
As others have noted, a straightforward translation of the Java code to Clojure runs rather slowly. However, if we write a Fibonacci number generator which takes advantage of Clojure's strengths we can get something which is short and does its job more idiomatically.
To start, let's say we want a function which will computed the n'th number of the Fibonacci sequence (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...). To do that we could use:
(defn fib [n]
(loop [a 1
b 0
cnt n]
(if (= cnt 1)
a
(recur (+' a b) a (dec cnt)))))
which iteratively recomputes the "next" Fibonacci value until it gets to the one which is desired.
Given this function we can develop one which creates a collection of the Fibonacci sequence values by mapping this function across a range of index values:
(defn fib-seq [n]
(map #(fib %) (range 1 (inc n))))
But this is of course a stunningly inefficient way of computing a sequence of Fibonacci values, since for each value we have to compute all of the preceding values and then we only save the last one. If we want a more efficient way to compute the entire sequence we can loop through the possibilities and gather the results in a collection:
(defn fib-seq [n]
(loop [curr 1
prev 0
c '(1)]
(if (= n (count c))
(reverse c)
(let [new-curr (+' curr prev)]
(recur new-curr curr (cons new-curr c))))))
This gives us a reasonably efficient way to collect the values of the Fibonacci sequence. For your test of a billion loops through (fib 45) (the 45th term of the sequence being the first one which exceeds 999,999,999) I used:
(time (dotimes [n 1000000000](fib-seq 45)))
which completed in 17.5 seconds on my hardware and OS (Windows 10, dual-processor Intel i5 # 2.6 GHz).
I am using following simple code to solve n-queens problem:
#lang racket
; following returns true if queens are on diagonals:
(define (check-diagonals bd)
(for/or ((r1 (length bd)))
(for/or ((r2 (range (add1 r1) (length bd))))
(= (abs (- r1 r2))
(abs(- (list-ref bd r1)
(list-ref bd r2)))))))
; set board size:
(define N 8)
; start searching:
(for ((brd (in-permutations (range N))))
(when (not (check-diagonals brd))
(displayln brd)))
It is working fine but takes a long time for larger values of N. It uses in-permutations function to get a stream of permutations. I also see that it uses only 25% of cpu power (1 of 4 cores is being used). How can I modify this code so that it uses parallel testing of permutations from in-permutations stream and uses all 4 cpu cores? Thanks for your help.
Edit: Modified check-diagonals function as suggested in the comments. Older code was:
(define (check-diagonals bd)
(define res #f)
(for ((r1 (length bd))
#:break res)
(for ((r2 (range (add1 r1) (length bd)))
#:break res)
(when (= (abs (- r1 r2))
(abs(- (list-ref bd r1)
(list-ref bd r2))))
(set! res #t))))
res)
To start, before even parallelizing anything, you can improve your original program quite a bit. Here are some changes you can make:
Use for/or instead of mutating a binding and using #:break.
Use for*/or instead of nesting for/or loops inside of each other.
Use in-range where possible to ensure for loops specialize to simple loops upon expansion.
Use square brackets instead of parentheses in the relevant places to make your code more readable to Racketeers (especially since this code isn’t remotely portable Scheme, anyway).
With those changes, the code looks like this:
#lang racket
; following returns true if queens are on diagonals:
(define (check-diagonals bd)
(for*/or ([r1 (in-range (length bd))]
[r2 (in-range (add1 r1) (length bd))])
(= (abs (- r1 r2))
(abs (- (list-ref bd r1)
(list-ref bd r2))))))
; set board size:
(define N 8)
; start searching:
(for ([brd (in-permutations (range N))])
(unless (check-diagonals brd)
(displayln brd)))
Now we can turn to parallelizing. Here’s the thing: parallelizing things tends to be tricky. Scheduling work in parallel tends to have overhead, and that overhead can outweigh the benefits more often than you might think. I spent a long time trying to parallelize this code, and ultimately, I was not able to produce a program that ran faster than the original, sequential program.
Still, you are likely interested what I did. Maybe you (or someone else) will be able to come up with something better than I can. The relevant tool for the job here is Racket’s futures, designed for fine-grained parallelism. Futures are fairly restrictive due to the way Racket’s runtime is designed (which is essentially the way it is for historical reasons), so not just anything can be parallelized, and a fairly large number of operations can cause futures to block. Fortunately, Racket also ships with the Future Visualizer, a graphical tool for understanding the runtime behavior of futures.
Before we can begin, I ran the sequential version of the program with N=11 and recorded the results.
$ time racket n-queens.rkt
[program output elided]
14.44 real 13.92 user 0.32 sys
I will use these numbers as a point of comparison for the remainder of this answer.
To start, we can try replacing the primary for loop with for/asnyc, which runs all its bodies in parallel. This is an extremely simple transformation, and it leaves us with the following loop:
(for/async ([brd (in-permutations (range N))])
(unless (check-diagonals brd)
(displayln brd)))
However, making that change does not improve performance at all; in fact, it significantly reduces it. Merely running this version with N=9 takes ~6.5 seconds; with N=10, it takes ~55.
This is, however, not too surprising. Running the code with the futures visualizer (using N=7) indicates that displayln is not legal from within a future, preventing any parallel execution from ever actually taking place. Presumably, we can fix this by creating futures that just compute the results, then print them serially:
(define results-futures
(for/list ([brd (in-permutations (range N))])
(future (λ () (cons (check-diagonals brd) brd)))))
(for ([result-future (in-list results-futures)])
(let ([result (touch result-future)])
(unless (car result)
(displayln (cdr result)))))
With this change, we get a small speedup over the naïve attempt with for/async, but we’re still far slower than the sequential version. Now, with N=9, it takes ~4.58 seconds, and with N=10, it takes ~44.
Taking a look at the futures visualizer for this program (again, with N=7), there are now no blocks, but there are some syncs (on jit_on_demand and allocate memory). However, after some time spent jitting, execution seems to get going, and it actually runs a lot of futures in parallel!
After a little bit of that, however, the parallel execution seems to run out of steam, and things seem to start running relatively sequentially again.
I wasn’t really sure what was going on here, but I thought maybe it was because some of the futures are just too small. It seems possible that the overhead of scheduling thousands of futures (or, in the case of N=10, millions) was far outweighing the actual runtime of the work being done in the futures. Fortunately, this seems like something that could be solved by just grouping the work into chunks, something that’s relatively doable using in-slice:
(define results-futures
(for/list ([brds (in-slice 10000 (in-permutations (range N)))])
(future
(λ ()
(for/list ([brd (in-list brds)])
(cons (check-diagonals brd) brd))))))
(for* ([results-future (in-list results-futures)]
[result (in-list (touch results-future))])
(unless (car result)
(displayln (cdr result))))
It seems that my suspicion was correct, because that change helps a whole lot. Now, running the parallel version of the program only takes ~3.9 seconds for N=10, a speedup of more than a factor of 10 over the previous version using futures. However, unfortunately, that’s still slower than the completely sequential version, which only takes ~1.4 seconds. Increasing N to 11 makes the parallel version take ~44 seconds, and if the chunk size provided to in-slice is increased to 100,000, it takes even longer, ~55 seconds.
Taking a look at the future visualizer for that version of the program, with N=11 and a chunk size of 1,000,000, I see that there seem to be some periods of extended parallelism, but the futures get blocked a lot on memory allocation:
This makes sense, since now each future is handling much more work, but it means the futures are synchronizing constantly, likely leading to the significant performance overhead I’m seeing.
At this point, I’m not sure there’s much else I know how to tweak to improve future performance. I tried cutting down allocation by using vectors instead of lists and specialized fixnum operations, but for whatever reason, that seemed to completely destroy parallelism. I thought that maybe that was because the futures were never starting up in parallel, so I replaced future with would-be-future, but the results were confusing to me, and I didn’t really understand what they meant.
My conclusion is that Racket’s futures are simply too fragile to work with this problem, simple as it may be. I give up.
Now, as a little bonus, I decided to try and emulate the same thing in Haskell, since Haskell has a particularly robust story for fine-grained parallel evaluation. If I couldn’t get a performance boost in Haskell, I wouldn’t expect to be able to get one in Racket, either.
I’ll skip all the details about the different things I tried, but eventually, I ended up with the following program:
import Data.List (permutations)
import Data.Maybe (catMaybes)
checkDiagonals :: [Int] -> Bool
checkDiagonals bd =
or $ flip map [0 .. length bd - 1] $ \r1 ->
or $ flip map [r1 + 1 .. length bd - 1] $ \r2 ->
abs (r1 - r2) == abs ((bd !! r1) - (bd !! r2))
n :: Int
n = 11
main :: IO ()
main =
let results = flip map (permutations [0 .. n-1]) $ \brd ->
if checkDiagonals brd then Nothing else Just brd
in mapM_ print (catMaybes results)
I was able to easily add some parallelism using the Control.Parallel.Strategies library. I added a line to the main function that introduced some parallel evaluation:
import Control.Parallel.Strategies
import Data.List.Split (chunksOf)
main :: IO ()
main =
let results =
concat . withStrategy (parBuffer 10 rdeepseq) . chunksOf 100 $
flip map (permutations [0 .. n-1]) $ \brd ->
if checkDiagonals brd then Nothing else Just brd
in mapM_ print (catMaybes results)
It took some time to figure out the right chunk and rolling buffer sizes, but these values gave me a consistent 30-40% speedup over the original, sequential program.
Now, obviously, Haskell’s runtime is considerably more suited for parallel programming than Racket’s, so this comparison is hardly fair. But it helped me to see for myself that, despite having 4 cores (8 with hyperthreading) at my disposal, I wasn’t able to get even a 50% speedup. Keep that in mind.
As Matthias noted in a mailing list thread I wrote on this subject:
A word of caution on parallelism in general. Not too long ago, someone in CS at an Ivy League school studied the use of parallelism across different uses and applications. The goal was to find out how much the ‘hype’ about parallelism affected people. My recollection is that they found close to 20 projects where professors (in CE, EE, CS, Bio, Econ, etc) told their grad students/PhDs to use parallelism to run programs faster. They checked all of them and for N - 1 or 2, the projects ran faster once the parallelism was removed. Significantly faster.
People routinely underestimate the communication cost that parallelism introduces.
Be careful not to make the same mistake.
I have a simple prime number calculator in clojure (an inefficient algorithm, but I'm just trying to understand the behavior of recur for now). The code is:
(defn divisible [x,y] (= 0 (mod x y)))
(defn naive-primes [primes candidates]
(if (seq candidates)
(recur (conj primes (first candidates))
(remove (fn [x] (divisible x (first candidates))) candidates))
primes)
)
This works as long as I am not trying to find too many numbers. For example
(print (sort (naive-primes [] (range 2 2000))))
works. For anything requiring more recursion, I get an overflow error.
(print (sort (naive-primes [] (range 2 20000))))
will not work. In general, whether I use recur or call naive-primes again without the attempt at TCO doesn't appear to make any difference. Why am I getting errors for large recursions while using recur?
recur always uses tail recursion, regardless of whether you are recurring to a loop or a function head. The issue is the calls to remove. remove calls first to get the element from the underlying seq and checks to see if that element is valid. If the underlying seq was created by a call to remove, you get another call to first. If you call remove 20000 times on the same seq, calling first requires calling first 20000 times, and none of the calls can be tail recursive. Hence, the stack overflow error.
Changing (remove ...) to (doall (remove ...)) fixes the problem, since it prevents the infinite stacking of remove calls (each one gets fully applied immediately and returns a concrete seq, not a lazy seq). I think this method only ever keeps one candidates list in memory at one time, though I am not positive about this. If so, it isn't too space inefficient, and a bit of testing shows that it isn't actually much slower.
What is the correct way, in Clojure, to do parallel processing when each job of the processing can occur in utter isolation and may generate a list of additional jobs that need to be evaluated?
My actual problem is a nutritional calculation problem, but I will put this in the form of Chess which shares the same problem space traits as my calculation.
Assume, for instance, that I am trying to find all of the moves to Checkmate in a game of Chess. When searching through the board states, I would start out with 20 possible states, each representing a different possible opening move. Each of those will need to be evaluated, accepted or rejected, and then for each accepted move, a new list of jobs would be created representing all of the possible next moves. The jobs would look like this:
initial: '([] proposed-move)
accepted: '([move] proposed-response)
'([move move] proposed-response)
The number of states to evaluates grows as a result of each computation, and each state can be evaluated in complete isolation from all of the others.
A solution I am playing with goes as such:
; a list of all final solutions, each of which is a sequence of moves
(def solutions (agent []))
; a list of all jobs pending evaluation
(def jobs (agent []))
Given these definitions, I would have a java thread pool, and each thread would request a job from the jobs agent (and wait for that request to be fulfilled). It would then run the calculation, generate a list of solutions and possible solutions. Finally, it would send the solutions to the solutions agent, and the possible solutions to the jobs agent.
Is using a combination of agents and threads the most idiomatic way to go in this case? Can I even get data out of the job queue in the way I am proposing?
Or should my jobs be a java.util.concurrent.LinkedBlockingQueue, as described in Producer consumer with qualifications?
You can do this with the following approach:
Repeated applications of pmap (which provides parallel processing of all elements in collection)
The function used in pmap returns a list of elements. Could be zero, one or multiple elements, which will then be processed in the next iteration
The results get recombined with concat
You repeat the processing of the list for as many times as you like, perhaps storing the result in an atom.
Example code could be something like the following
(def jobs (atom '(1 10 100)))
(defn process-element [value]
(if (< (rand) 0.8)
[(inc value)]
[]))
(defn do-processing []
(swap! jobs
(fn [job-list] (apply concat (pmap process-element job-list)))))
(while (seq #jobs)
(prn #jobs)
(do-processing))
Whick could produce output like:
(1 10 100)
(2 11 101)
(3 12 102)
(4 13 103)
(5 14 104)
(6 15 105)
(7 106)
(107)
(108)
(109)
nil
Note that you need to be a bit careful to make sure your algorithm terminates! In the example this is guaranteed by the elements dying off over time, but if your seach space is growing then you will probably want to apply a time limit instead of just using a (while ... ) loop.
Your approach with agents and threads seems quite close to (what I see as) idiomatic clojure.
the only thing I would change to make it more "clojure like" would be to use pmap to iterate over queue that is stored in an agent. using pmap instead of your own thread pool will save you the effort of managing the thread pool because pmap already uses clojure's thread pool which is initialized properly for the current number of processors. it also helps you take advantage of sequence chunking (which perhaps could help).
You could also use channels. Maybe something like this:
(def jobs (chan))
(def solutions (chan))
(def accepted-solutions (atom (vector)))
(go (loop [job (<! jobs)]
(when job
(go (doseq [solution (process-job-into-solutions job)]
(>! solutions)))
(recur (<! jobs)))))
(go (loop [solution (<! solutions)]
(when (acceptable? solution)
(swap! accepted-solutions conj solution)
(doseq [new-job (generate-new-jobs solution)]
(>! jobs))
(recur (<! solutions)))))
(>!! jobs initial-job)
Reduce and reductions let you accumulate state over a sequence.
Each element in the sequence will modify the accumulated state until
the end of the sequence is reached.
What are implications of calling reduce or reductions on an infinite list?
(def c (cycle [0]))
(reduce + c)
This will quickly throw an OutOfMemoryError. By the way, (reduce + (cycle [0])) does not throw an OutOfMemoryError (at least not for the time I waited). It never returns. Not sure why.
Is there any way to call reduce or reductions on an infinite list in a way that makes sense? The problem I see in the above example, is that eventually the evaluated part of the list becomes large enough to overflow the heap. Maybe an infinite list is not the right paradigm. Reducing over a generator, IO stream, or an event stream would make more sense. The value should not be kept after it's evaluated and used to modify the state.
It will never return because reduce takes a sequence and a function and applies the function until the input sequence is empty, only then can it know it has the final value.
Reduce on a truly infinite seq would not make a lot of sense unless it is producing a side effect like logging its progress.
In your first example you are first creating a var referencing an infinite sequence.
(def c (cycle [0]))
Then you are passing the contents of the var c to reduce which starts reading elements to update its state.
(reduce + c)
These elements can't be garbage collected because the var c holds a reference to the first of them, which in turn holds a reference to the second and so on. Eventually it reads as many as there is space in the heap and then OOM.
To keep from blowing the heap in your second example you are not keeping a reference to the data you have already used so the items on the seq returned by cycle are GCd as fast as they are produced and the accumulated result continues to get bigger. Eventually it would overflow a long and crash (clojure 1.3) or promote itself to a BigInteger and grow to the size of all the heap (clojure 1.2)
(reduce + (cycle [0]))
Arthur's answer is good as far as it goes, but it looks like he doesn't address your second question about reductions. reductions returns a lazy sequence of intermediate stages of what reduce would have returned if given a list only N elements long. So it's perfectly sensible to call reductions on an infinite list:
user=> (take 10 (reductions + (range)))
(0 1 3 6 10 15 21 28 36 45)
If you want to keep getting items from a list like an IO stream and keep state between runs, you cannot use doseq (without resorting to def's). Instead a good approach would be to use loop/recur this will allow you to avoid consuming too much stack space and will let you keep state, in your case:
(loop [c (cycle [0])]
(if (evaluate-some-condition (first c))
(do-something-with (first c) (recur (rest c)))
nil))
Of course compared to your case there is here a condition check to make sure we don't loop indefinitely.
As others have pointed out, it doesn't make sense to run reduce directly on an infinite sequence, since reduce is non-lazy and needs to consume the full sequence.
As an alternative for this kind of situation, here's a helpful function that reduces only the first n items in a sequence, implemented using recur for reasonable efficiency:
(defn counted-reduce
([n f s]
(counted-reduce (dec n) f (first s) (rest s) ))
([n f initial s]
(if (<= n 0)
initial
(recur (dec n) f (f initial (first s)) (rest s)))))
(counted-reduce 10000000 + (range))
=> 49999995000000