NullPointerException in Storm when running topology - clojure

I am getting NullPointerExceptions in backtype.storm.utils.DisruptorQueue.consumeBatchToCursor method when running my topology, specifically in a bolt. Spouts are duly executed.
Storm's troubleshooting page says that it might be due to multiple threads issuing methods on the OutputCollector. However, i cannot see where does it relate to my case.
Here's the code for the spout:
(defspout stub-spout ["stub-spout"]
[conf context collector]
(spout
(nextTuple []
(let [channel-value (<!! storm-async-channel)]
(emit-spout! collector [channel-value])))
(ack [id]
))))
and for the bolt:
(defbolt stub-bolt ["stub-bolt"] [tuple collector]
(println "Invocation!")
(let [obj (get tuple "object")
do-some-calculations (resolve 'calclib/do-some-calculations)
new-obj (do-some-calculations obj)]
(emit-bolt! collector new-obj)))
After some investigation it turned out that the call to resolve returns null (i need to resolve during runtime as some calculation occurs in a macro located in calclib).
The code runs properly in local cluster though. Why is this happening?
Will be grateful for any suggestions.
Thanks!

I think i've found a solution. The bolt definition is changed to a prepared bolt:
(defbolt stub-bolt ["stub-bolt"]
{:prepare true}
[conf context collector]
(let [f (load "/calclib/core")
do-some-calculations (resolve 'calclib/do-some-calculations)]
(bolt
(execute [tuple]
(let [obj (get tuple "object")
new-obj (do-some-calculations obj)]
(emit-bolt! collector new-obj))))))
Key is the call to load. I wonder if there's a more elegant approach though.

Related

Defining an agent lazily

I am creating an agent for writing changes back to a database (as discussed in the agent-based write-behind log in Clojure Programming).
This is working fine, but I am struggling to create the agent late. I don't want to create it as a def, as I don't want it created when my tests are running (I see the pool starting up when the tests load the forms even though I use with-redefs to set a test value).
The code I started with is (using c3p0 pooling):
(def dba (agent (pool/make-datasource-spec (u/load-config "db.edn"))))
I tried making the agent nil, and investigated how I could set it in the main of my application, when it's really needed. But there doesn't seem to be an equivalent reset! function as there is with an atom. And the following code also failed saying the agent wasn't in error so didn't need restarting:
(when (not #dba)
(restart-agent dba (create-db-pool)))
So at the moment, I have an atom containing the agent, where I then do:
(def dba (atom nil))
;; ...
(defn init-db! []
(when (not #dba)
(log/info "Creating agent for pooled connection")
(reset! dba (agent (create-db-pool))))
But the very fact I'm having to do ##dba to reference the contents of the agent (i.e. the pool) makes me think this is insane.
Is there a more obvious way of creating the pool agent lazily?
delay is useful for cases like this. It causes the item to be created the first time it is read. so if your tests don't read it it will not be created.
user=> (def my-agent (delay (do (println "im making the agent now") (agent 0))))
#'user/my-agent
user=>
user=> #my-agent
im making the agent now
#object[clojure.lang.Agent 0x2cd73ca1 {:status :ready, :val 0}]

How does the binding function work with core.async?

Is it ok to use binding with core.async? I'm using ClojureScript so core.async is very different.
(def ^:dynamic token "no-token")
(defn call
[path body]
(http-post (str host path) (merge {:headers {"X-token" token}} body)))) ; returns a core.async channel
(defn user-settings
[req]
(call "/api/user/settings" req))
; elsewhere after I've logged in
(let [token (async/<! (api/login {:user "me" :pass "pass"}))]
(binding
[token token]
(user-settings {:all-settings true})))
In ClojureScript1, binding is basically with-redefs plus an extra check that the Vars involved are marked :dynamic. On the other hand, gos get scheduled for execution1 in chunks (that is, they may be "parked" and later resumed, and interleaving between go blocks is arbitrary). These models don't mesh very well at all.
In short, no, please use explicitly-passed arguments instead.
1 The details are different in Clojure, but the conclusion remains the same.
2 Using the fastest mechanism possible, setTimeout with a time of 0 if nothing better is available.

Update clojure code running in a thread

Now, I need some kind of code reloading in the clojure/java mixing system.
I know tools.namespace lib can be used to refresh clj code. But when I try to reload code which is used in a thread, it is not updated.
This is the example clojure code I need to update: (just switch between upper case and lower case.)
(ns com.test.test-util)
(defn get-word [sentence]
(let [a-word (rand-nth (clojure.string/split sentence #"\s"))
;;a-word (clojure.string/upper-case a-word)
]
(println a-word)
a-word))
I used a future to start a thread and call the code above.
In main method, in my first test, I just create a future and refresh the code after a while. But it is not updated. So, I also tested to create another thread after refresh. But the code is not updated.
(defn start-job []
(future (
(println "start worker job.")
(while #job-cond
(do (get-word (rand-nth sentences))
(Thread/sleep 2000)) )
(println "return!")
)))
(defn -main []
(let [work-job (start-job)]
(println "modify clj file...")
(Thread/sleep 10000)
(swap! job-cond not)
(future-cancel work-job)
(swap! job-cond not)
(refresh)
(Thread/sleep 3000)
(let [new-work-job (start-job)]
(Thread/sleep 10000)
(swap! job-cond not)
(future-cancel new-work-job))))
Thanks.
In general you don't want code-reloading to affect runtime. Rather you should try to seek for clean ways to boot and shutdown the running application code using e. g. Stuart Sierras library component or juxt/jig by Malcolm Sparks. Then reload namespaces only when your application is not up and running (or at least not the parts of running code that depend on namespaces being reloaded).
That being said, here is why your example does not work:
start-job can not automatically refer to a reloaded version of get-word in its lifetime because the version of get-work it uses is resolved when it is compiled. So it will still use the old version.
Explicitly resolve the var of get-word using
(resolve `get-word)
and invoke the returned var as a function instead of using get-word directly in the future.
What will happen then?
After you made the changes and refresh has been invoked, -main will still refer to the old-version of start-job (so if you change that, it will not show effect in runtime). However within that get-word will be resolved as described above and the new version of get-word will be used.
Short explanation: You can't expect a running compiled function to replace itself with reloaded code on the fly (no matter whether it runs on a separate thread or not). That would (unless you are using resolve in it) be a necessity though for it to invoke reloaded functions.

Setting Clojure "constants" at runtime

I have a Clojure program that I build as a JAR file using Maven. Embedded in the JAR Manifest is a build-version number, including the build timestamp.
I can easily read this at runtime from the JAR Manifest using the following code:
(defn set-version
"Set the version variable to the build number."
[]
(def version
(-> (str "jar:" (-> my.ns.name (.getProtectionDomain)
(.getCodeSource)
(.getLocation))
"!/META-INF/MANIFEST.MF")
(URL.)
(.openStream)
(Manifest.)
(.. getMainAttributes)
(.getValue "Build-number"))))
but I've been told that it is bad karma to use def inside defn.
What is the Clojure-idiomatic way to set a constant at runtime? I obviously do not have the build-version information to embed in my code as a def, but I would like it set once (and for all) from the main function when the program starts. It should then be available as a def to the rest of the running code.
UPDATE: BTW, Clojure has to be one of the coolest languages I have come across in quite a while. Kudos to Rich Hickey!
I still think the cleanest way is to use alter-var-root in the main method of your application.
(declare version)
(defn -main
[& args]
(alter-var-root #'version (constantly (-> ...)))
(do-stuff))
It declares the Var at compile time, sets its root value at runtime once, doesn't require deref and is not bound to the main thread. You didn't respond to this suggestion in your previous question. Did you try this approach?
You could use dynamic binding.
(declare *version*)
(defn start-my-program []
(binding [*version* (read-version-from-file)]
(main))
Now main and every function it calls will see the value of *version*.
While kotarak's solution works very well, here is an alternative approach: turn your code into a memoized function that returns the version. Like so:
(def get-version
(memoize
(fn []
(-> (str "jar:" (-> my.ns.name (.getProtectionDomain)
(.getCodeSource)
(.getLocation))
"!/META-INF/MANIFEST.MF")
(URL.)
(.openStream)
(Manifest.)
(.. getMainAttributes)
(.getValue "Build-number")))))
I hope i dont miss something this time.
If version is a constant, it's going to be defined one time and is not going to be changed you can simple remove the defn and keep the (def version ... ) alone. I suppose you dont want this for some reason.
If you want to change global variables in a fn i think the more idiomatic way is to use some of concurrency constructions to store the data and access and change it in a secure way
For example:
(def *version* (atom ""))
(defn set-version! [] (swap! *version* ...))

Transform this Clojure call into a lazy sequence

I'm working with a messaging toolkit (it happens to be Spread but I don't know that the details matter). Receiving messages from this toolkit requires some boilerplate:
Create a connection to the daemon.
Join a group.
Receive one or more messages.
Leave the group.
Disconnect from the daemon.
Following some idioms that I've seen used elsewhere, I was able to cook up some working functions using Spread's Java API and Clojure's interop forms:
(defn connect-to-daemon
"Open a connection"
[daemon-spec]
(let [connection (SpreadConnection.)
{:keys [host port user]} daemon-spec]
(doto connection
(.connect (InetAddress/getByName host) port user false false))))
(defn join-group
"Join a group on a connection"
[cxn group-name]
(doto (SpreadGroup.)
(.join cxn group-name)))
(defn with-daemon*
"Execute a function with a connection to the specified daemon"
[daemon-spec func]
(let [daemon (merge *spread-daemon* daemon-spec)
cxn (connect-to-daemon daemon-spec)]
(try
(binding [*spread-daemon* (assoc daemon :connection cxn)]
(func))
(finally
(.disconnect cxn)))))
(defn with-group*
"Execute a function while joined to a group"
[group-name func]
(let [cxn (:connection *spread-daemon*)
grp (join-group cxn group-name)]
(try
(binding [*spread-group* grp]
(func))
(finally
(.leave grp)))))
(defn receive-message
"Receive a single message. If none are available, this will block indefinitely."
[]
(let [cxn (:connection *spread-daemon*)]
(.receive cxn)))
(Basically the same idiom as with-open, just that the SpreadConnection class uses disconnect instead of close. Grr. Also, I left out some macros that aren't relevant to the structural question here.)
This works well enough. I can call receive-message from inside of a structure like:
(with-daemon {:host "localhost" :port 4803}
(with-group "aGroup"
(... looping ...
(let [msg (receive-message)]
...))))
It occurs to me that receive-message would be cleaner to use if it were an infinite lazy sequence that produces messages. So, if I wanted to join a group and get messages, the calling code should look something like:
(def message-seq (messages-from {:host "localhost" :port 4803} "aGroup"))
(take 5 message-seq)
I've seen plenty of examples of lazy sequences without cleanup, that's not too hard. The catch is steps #4 and 5 from above: leaving the group and disconnecting from the daemon. How can I bind the state of the connection and group into the sequence and run the necessary cleanup code when the sequence is no longer needed?
This article describes how to do exactly that using clojure-contrib fill-queue. Regarding cleanup - the neat thing about fill-queue is that you can supply a blocking function that cleans itself up if there is an error or some condition reached. You can also hold a reference to the resource to control it externally. The sequence will just terminate. So depending on your semantic requirement you'll have to choose the strategy that fits.
Try this:
(ns your-namespace
(:use clojure.contrib.seq-utils))
(defn messages-from [daemon-spec group-name]
(let [cnx (connect-to-deamon daemon-spec))
group (connect-to-group cnx group-name)]
(fill-queue (fn [fill]
(if done?
(do
(.leave group)
(.disconnect cnx)
(throw (RuntimeException. "Finished messages"))
(fill (.receive cnx))))))
Set done? to true when you want to terminate the list. Also, any exceptions thrown in (.receive cnx) will also terminate the list.