My application evaluates quoted expressions received from remote clients. Over time, my system's memory usage increases and eventually it crashes. What I've found out is that:
When I execute the following code from Clojure's nrepl in a docker container:
(dotimes [x 1000000] ; or some arbitrary large number
(eval '(+ 1 1)))
the container's memory usage keeps rising until it hits the limit, at which point the system will crash.
How do I get around this problem?
There's another thread mentioning this behavior. One of the answers mentions the use of tools.reader, which still uses eval if I need code execution, leading to the same problem.
There's no easy way to get around this, as each call to eval creates a new class even though the form you're evaluating is exactly the same. By itself, the JVM will not get rid of these new classes.
There are two ways to circumvent this:
Stop using eval altogether (e.g. by creating your own DSL or your own version of eval with limited functionality), or at least use it less frequently, e.g. by batching the forms you need to evaluate (see the sketch after this list)
Unload already loaded classes - I haven't done it myself and it probably requires a lot of work, but you can follow answers in this topic: Unloading classes in java?
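To make the batching idea concrete, here is a minimal sketch; the helper name eval-batch and the assumption that the client forms arrive as a collection are mine, not the answer's:

(defn eval-batch [forms]
  ;; compile many forms as the body of a single fn: roughly one generated
  ;; class per batch instead of one per eval call
  ((eval (list* 'fn [] forms))))

;; keep batches modest -- a single method body is limited to 64 KB of bytecode
(eval-batch (repeat 1000 '(+ 1 1)))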
I don't know exactly how eval works internally,
but based on my observations I don't think your conclusions are correct, and Eugene's remark "By itself, the JVM will not get rid of new classes" also seems to be false.
I ran your sample with -Xmx256m and it went fine.
(time (dotimes [x 1000000] ; or some arbitrary large number
(eval '(+ 1 1))))
;; "Elapsed time: 529079.032449 msecs"
I checked the question you linked, and they say it's Metaspace that is growing, not the heap.
So I observed Metaspace, and its usage grows but also shrinks.
You can find my experiment here together with some graphs from JMC console: https://github.com/jumarko/clojure-experiments/commit/824f3a69019840940eaa88c3427515bcba33c4d2
Note: To run this experiment, I've used JDK 17.0.2 on macOS
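If you want to watch Metaspace from the REPL itself rather than from JMC, a small sketch using the standard java.lang.management API works too (the helper name and the pool name "Metaspace", which is HotSpot-specific, are my assumptions):

(import '(java.lang.management ManagementFactory))

(defn metaspace-used-bytes []
  ;; find the memory pool named "Metaspace" and report its current usage
  (->> (ManagementFactory/getMemoryPoolMXBeans)
       (filter #(= "Metaspace" (.getName %)))
       (map #(.getUsed (.getUsage %)))
       first))

;; sample it before, during, and after running the eval loop
(metaspace-used-bytes)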
Related
I have recently watched Rich Hickey's talk at Clojure/conj 2016 and, although it was very interesting, I didn't really understand the point of clojure.spec or when you'd use it. It seemed like most of the ideas, such as conform, valid, etc., had similar functions in Clojure already.
I have only been learning clojure for around 3 months now so maybe this is due to lack of programming/Clojure experience.
Do clojure.spec and cljs.spec work in similar ways to Clojure and ClojureScript in that, although they are not 100% the same, they are based on the same underlying principles?
Are you tired of documenting your programs?
Does the prospect of making up yet more tests cause procrastination?
When the boss says "test coverage", do you cower with fear?
Do you forget what your data names mean?
For smooth expression of hard specifications, you need clojure.spec!
clojure.spec gives you a uniform method of documenting, specifying, and automatically testing your programs, and of validating your live data.
It steals virtually every one of its ideas. And it does nothing you can't do for yourself.
But in my - barely informed - opinion, it changes the economics of specification, making it worthwhile to do properly. A game-changer? Quite possibly.
At the Clojure/conj conference last week, probably half of the presentations featured spec in some way, and it's not even out of alpha yet. spec is a major feature of Clojure; it is here to stay, and it is powerful.
As an example of its power, take static type checking, hailed as a kind of safety net by so many and a defining characteristic of so many programming languages. It is quite limited: it only works at compile time, and it only checks types. spec, on the other hand, can validate and conform any predicate (not just types) for the arguments and the return value, and can also validate relationships between the two. All of this lives outside the function's code, so the function's logic is not commingled with validation and documentation.
One archetypal example of the benefit of relationship checking, versus only type checking, is a function that computes the substring of a string. Type checking ensures that in (subs s start end) the s is a string and start and end are integers. However, additional checking must be done within the function to ensure that start and end are non-negative integers, that end is no smaller than start, and that the resulting substring is no larger than the original string. All of these things can be spec'd out, for example (forgive me if some of this is a bit redundant or maybe even inaccurate):
(s/fdef clojure.core/subs
:args (s/and (s/cat :s string? :start nat-int? :end (s/? nat-int?))
(fn [{:keys [s start end]}]
(if end
(<= 0 start end (count s))
(<= 0 start (count s)))))
:ret string?
:fn (fn [{{:keys [s start end]} :args, substring :ret}]
(and (if end
(= (- end start) (count substring))
(= (- (count s) start) (count substring)))
(<= (count substring) (count s)))))
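These snippets assume the spec namespaces are aliased roughly as follows (at the time of writing they were clojure.spec and clojure.spec.test; later releases renamed them to the .alpha variants):

(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])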
Call the function with sample data meeting the above args spec:
(s/exercise-fn `subs)
Or run 1000 tests (this may fail a few times, but keep running and it will work--this is due to the built-in generator not being able to satisfy the second part of the :args predicate; a custom generator can be written if needed):
(stest/check `subs)
Or, want to see if your app makes calls to subs that are invalid while it's running in real time? Just run this, and you'll get a spec exception if the function is called and the specs are not met:
(stest/instrument `subs)
Regarding workflow:
We have not integrated this into our workflow yet, and we can't in production since it's still alpha, but the first goal is to write specs. For now I'm putting them in the same namespace as the code, but in separate files.
I foresee our workflow being to run the tests for spec'd functions using this (found in the clojure.spec guide):
(-> (stest/enumerate-namespace 'user) stest/check)
Then, it would be advantageous to turn on instrumenting for all functions, and run the app under load as we normally would test it, and ensure that "real world" data works.
You can also use s/conform to destructure complex data in functions themselves, or use s/valid? in pre- and post-conditions of functions. I'm not too keen on this, as it adds overhead to a production system, but it is a possibility.
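A small hedged sketch of those two ideas (the ::command spec and the function name are invented for illustration):

(s/def ::command (s/cat :op #{:add :remove} :ids (s/* pos-int?)))

(defn apply-command [cmd]
  ;; s/valid? as a precondition ...
  {:pre [(s/valid? ::command cmd)]}
  ;; ... and s/conform to destructure the flat vector into labelled parts
  (let [{:keys [op ids]} (s/conform ::command cmd)]
    [op (count ids)]))

(apply-command [:add 1 2 3]) ;=> [:add 3]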
The sky's the limit, and we've just scratched the surface! Cool things coming in the next months and years with spec!
I have a 700 MB XML file that I process from a tree of records into an EDN file.
After doing all the processing, I finally have a lazy sequence of hashmaps that are not particularly big (at most 10 values each).
To finish, I want to write it to a file with
(defn write-catalog [catalog-edn]
(with-open [wrtr (io/writer "catalog-fr.edn")]
(doseq [x catalog-edn]
(.write wrtr (prn-str x)))))
I do not understand the problem, because doseq is not supposed to retain the head of the sequence in memory.
My final output catalog is of type clojure.lang.LazySeq.
I then do
(write-catalog catalog)
Then memory usage climbs and I get a GC overhead limit exceeded error at around 80 MB of file written, with an -Xmx of 3g.
I also tried doseq + spit without prn-str; the same thing happens.
Is this normal behaviour?
Thanks
Possibly the memory leak is due to realization of the catalog values (google "head retention"). When your write-catalog realizes items one by one, they are all kept in memory (evidently you're def'ing catalog somewhere). To fix this, try to avoid keeping your catalog in a var; instead pass it to write-catalog directly. If you parse it from somewhere (which I guess is true, considering your previous question), you would do something like:
(write-catalog (transform-catalog (get-catalog "mycatalog.xml")))
so huge intermediate sequences won't eat all your memory
Hope it helps.
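To make the head-retention point concrete, here is a hedged before/after sketch (get-catalog and transform-catalog stand in for the asker's parsing pipeline):

;; head retained: the var keeps pointing at the first cell of the lazy seq,
;; so every element realized while writing stays reachable
(def catalog (transform-catalog (get-catalog "mycatalog.xml")))
(write-catalog catalog)

;; no head retained: the sequence is produced, written, and GC'd as it streams
(write-catalog (transform-catalog (get-catalog "mycatalog.xml")))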
I have this function that reproduces my problem:
(defn my-problem
[preprocess count print-freq]
(doseq [x (preprocess (range 0 count))]
(when (= 0 (mod x print-freq))
(println x))))
Everything works fine when I call it with the identity function like this:
(my-problem identity 10000000 200000)
;it prints 200000,400000 ... 9800000 just as it should
When I call it with the seque function I get an OutOfMemoryError:
(my-problem #(seque 5 %) 10000000 200000)
;it prints numbers up to 2000000 and then throws an OutOfMemoryError
My understanding is that seque should just split the processing into two threads, internally using a LinkedBlockingQueue with max size 5 (in this case). I don't understand where the memory leak is.
The way seque is implemented, if you consume elements much more quickly than you can produce them, a large number of agent tasks will pile up in the queue used internally by seque (up to one task per element in the sequence). In theory what you're doing should be fine, but in practice it doesn't really work out. You should be able to see the same effect just by running (dorun (seque (range))).
You can also use the function sequeue in flatland/useful, which makes tradeoffs that are different from the ones in clojure.core. Read the docstring carefully, but I think it would work well for your situation.
I know that the time taken to evaluate a function can be printed to the screen/stdout using the time macro.
The time macro returns the value of the evaluated expression, which makes it great to use inline. However, I want to automatically measure the runtime under specific circumstances.
Is there a function in some library that returns the elapsed time, to help with this benchmarking?
If it's just a matter of wanting to capture the string programmatically, you can bind *out* to something else before using time.
user=> (def x (with-out-str (time (+ 2 2))))
#'user/x
user=> x
"\"Elapsed time: 0.119 msecs\"\n"
If you want more control over the format, then you can create your own version of time using Java's System time methods, which is what the time macro uses under the hood anyway:
user=> (macroexpand '(time (+ 2 2)))
(let* [start__4197__auto__ (. java.lang.System (clojure.core/nanoTime))
ret__4198__auto__ (+ 2 2)]
(clojure.core/prn (clojure.core/str "Elapsed time: " (clojure.core//
(clojure.core/double
(clojure.core/- (. java.lang.System (clojure.core/nanoTime))
start__4197__auto__)) 1000000.0) " msecs"))
ret__4198__auto__)
Take that basic structure and replace the call to prn with whatever reporting mechanism you would prefer.
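For example, here is a minimal sketch of such a variant (the name timed is mine, not part of any library) that returns the elapsed milliseconds alongside the value instead of printing:

(defmacro timed
  "Evaluates expr and returns [value elapsed-ms] instead of printing."
  [expr]
  `(let [start# (System/nanoTime)
         ret# ~expr]
     [ret# (/ (double (- (System/nanoTime) start#)) 1000000.0)]))

;; usage:
(let [[v ms] (timed (reduce + (range 1000000)))]
  (println "took" ms "msecs, result" v))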
You might want to look into Hugo Duncan's benchmarking library for Clojure -- Criterium.
From the README:
Criterium measures the computation time of an expression. It is designed to address some of the pitfalls of benchmarking, and benchmarking on the JVM in particular.
This includes:
statistical processing of multiple evaluations
inclusion of a warm-up period, designed to allow the JIT compiler to optimise its code
purging of gc before testing, to isolate timings from GC state prior to testing
a final forced GC after testing to estimate impact of cleanup on the timing results
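A typical (hedged) usage sketch, after adding Criterium to your project's dependencies:

(require '[criterium.core :as crit])

;; quick-bench runs fewer iterations than bench but still does warm-up,
;; GC purging, and statistical analysis before printing a report
(crit/quick-bench (reduce + (range 1000)))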
If you generate a Java binding for your functions (and with some Clojure Maven plugin reconfiguration), you can even use JMH (http://openjdk.java.net/projects/code-tools/jmh/), which is probably the best JVM microbenchmarking harness. Have a look at https://github.com/circlespainter/pulsar-auto-instrument-bench
I'm having some trouble understanding how the delay macro works in Clojure. It doesn't seem to do what I expect it to do (that is: delay evaluation). As you can see in this code sample:
; returns the current time
(defn get-timestamp [] (System/currentTimeMillis))
; var should contain the current timestamp after calling "force"
(def current-time (delay (get-timestamp)))
However, calling current-time in the REPL appears to immediately evaluate the expression, even without having used force:
user=> current-time
#<Delay@19b5217: 1276376485859>
user=> (force current-time)
1276376485859
Why was the evaluation of get-timestamp not delayed until the first force call?
The printed representation of various objects which appears at the REPL is the product of a multimethod called print-method. It resides in the file core_print.clj in Clojure's sources, which constitutes part of what goes in the clojure.core namespace.
The problem here is that for objects implementing clojure.lang.IDeref -- the Java interface for things deref / @ can operate on -- print-method includes the value behind the object in the printed representation. To this end, it needs to deref the object, and although special provisions are made for printing failed Agents and pending Futures, Delays are always forced.
Actually, I'm inclined to consider this a bug, or at best a situation in need of improvement. As a workaround for now, take extra care not to print unforced delays.
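As a small hedged sketch of that workaround: avoid evaluating the delay itself at the REPL, and on Clojure 1.3+ (released after this question was asked) you can use realized? to check whether it has been forced without forcing it:

(def current-time (delay (System/currentTimeMillis)))

(realized? current-time) ;=> false -- nothing has been evaluated yet
(force current-time)     ;=> the timestamp, computed now
(realized? current-time) ;=> true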