Onyx: Can't pick up trigger/emit results in the next task - clojure

I'm trying to get started with Onyx, the distributed computing platform in Clojure. In particular, I'm trying to understand how to aggregate data. If I understand the documentation correctly, a combination of a window and a :trigger/emit function should allow me to do this.
So, I modified the aggregation example (Onyx 0.13.0) in three ways (cf. gist with complete code):
In -main I println any segments put on the output channel; this works as expected with the original code, in that it picks up all segments and prints them to stdout.
I add an emit function like this:
(defn make-ds
  [event window trigger {:keys [lower-bound upper-bound event-type] :as state-event} extent-state]
  (println "make-ds called")
  {:ds window})
I add a trigger configuration (original dump-words trigger omitted for brevity):
(def triggers
  [{:trigger/window-id :word-counter
    :trigger/id :make-ds
    :trigger/on :onyx.triggers/segment
    :trigger/fire-all-extents? true
    :trigger/threshold [5 :elements]
    :trigger/emit ::make-ds}])
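For reference, :trigger/window-id must match a window definition. In the aggregation example it looks roughly like the following (a hedged reconstruction from memory; the actual definition is in the gist):
(def windows
  [{:window/id :word-counter
    :window/task :count-words
    :window/type :global
    :window/aggregation :onyx.windowing.aggregation/count}])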
I change the :count-words task from calling the identity function to the reduce type, so that it doesn't hand all input segments over to the output (and add config options so that Onyx treats this as a batch):
{:onyx/name :count-words
 ;:onyx/fn :clojure.core/identity
 :onyx/type :reduce ; :function
 :onyx/group-by-key :word
 :onyx/flux-policy :kill
 :onyx/min-peers 1
 :onyx/max-peers 1
 :onyx/batch-size 1000
 :onyx/batch-fn? true}
When I run this now, I can see in the output that the emit function (i.e. make-ds) gets called for each input segment (the first lines of output come from the dump-words trigger of the original code):
> lein run
[....]
Om -> 1
name -> 1
My -> 2
a -> 1
gone -> 1
Coffee -> 1
to -> 1
get -> 1
Time -> 1
make-ds called
make-ds called
make-ds called
make-ds called
[....]
However, the segments built by make-ds don't make it through to the output channel; they are never printed. If I revert the :count-words task to the identity function, this works just fine. Also, it looks as if the emit function is called for each input segment, whereas I would expect it to be called only when the threshold condition is met (i.e. whenever 5 elements have been aggregated in the window).
As the test for this functionality within the Onyx code base (onyx.windowing.emit-aggregate-test) is passing just fine, I guess I'm making a stupid mistake somewhere, but I'm at a loss figuring out what.

I finally saw that there was a warning in the log file onyx.log like this:
[clojure.lang.ExceptionInfo: Windows cannot be checkpointed with ZooKeeper unless
:onyx.peer/storage.zk.insanely-allow-windowing? is set to true in the peer config.
This should only be turned on as a development convenience.
[clojure.lang.ExceptionInfo: Handling uncaught exception thrown inside task
lifecycle :lifecycle/checkpoint-state. Killing the job. -> Exception type:
clojure.lang.ExceptionInfo. Exception message: Windows cannot be checkpointed with
ZooKeeper unless :onyx.peer/storage.zk.insanely-allow-windowing? is set to true in
the peer config. This should only be turned on as a development convenience.
As soon as I set this, I finally got some segments handed over to the next task. I.e., I had to change the peer config to:
(def peer-config
  {:zookeeper/address "127.0.0.1:2189"
   :onyx/tenancy-id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.peer/storage.zk.insanely-allow-windowing? true
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})
Now, :onyx.peer/storage.zk.insanely-allow-windowing? doesn't sound like a good thing to do. Lucas Bradstreet recommended on the Clojurians Slack channel switching to S3 checkpointing.
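For completeness, here is a sketch of what the S3 variant of the peer config might look like. I haven't verified the exact option names against 0.13.0, so treat :onyx.peer/storage and the storage.s3.* keys (and the bucket/region values) as assumptions to check against the peer-config documentation:
(def peer-config
  {:zookeeper/address "127.0.0.1:2189"
   :onyx/tenancy-id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.peer/storage :s3                         ; checkpoint to S3 instead of ZooKeeper
   :onyx.peer/storage.s3.bucket "my-checkpoints"  ; hypothetical bucket
   :onyx.peer/storage.s3.region "us-east-1"       ; hypothetical region
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})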

Related

Connection Pooling in Clojure

I am unable to understand the use of the pool-db and connection functions in this connection pooling guide.
(defn- get-pool
  "Creates Database connection pool to be used in queries"
  [{:keys [host-port db-name username password]}]
  (let [pool (doto (ComboPooledDataSource.)
               (.setDriverClass "com.mysql.cj.jdbc.Driver")
               (.setJdbcUrl (str "jdbc:mysql://"
                                 host-port
                                 "/" db-name))
               (.setUser username)
               (.setPassword password)
               ;; expire excess connections after 30 minutes of inactivity:
               (.setMaxIdleTimeExcessConnections (* 30 60))
               ;; expire connections after 3 hours of inactivity:
               (.setMaxIdleTime (* 3 60 60)))]
    {:datasource pool}))
(def pool-db (delay (get-pool db-spec)))
(defn connection [] @pool-db)
; usage in code
(jdbc/query (connection) ["Select SUM(1, 2, 3)"])
Why can't we simply do this?
(def connection (get-pool db-spec))
; usage in code
(jdbc/query connection ["SELECT SUM(1, 2, 3)"])
The delay ensures that you create the connection pool the first time you try to use it, rather than when the namespace is loaded.
This is a good idea because your connection pool may fail to be created for any one of a number of reasons, and if it fails during namespace load you will get some odd behaviour - any defs after your failing connection pool creation will not be evaluated, for example.
In general, top level var definitions should be constructed so they cannot fail at runtime.
Bear in mind they may also be evaluated during the AOT compile process, as amalloy notes below.
In your application, you want to create the pool just once and reuse it. For this reason, delay is used to wrap the (get-pool db-spec) call, so that it is invoked only the first time it is forced with deref/@; the pool is cached and returned on subsequent forces.
The difference is that in the delay version the pool will be created only if it is actually used (which might not happen, e.g. if everything was cached), whereas the non-delay version will instantiate a pool no matter what, even if a database connection is never used.
delay runs its body only when deref is called, and does nothing otherwise.
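A minimal REPL sketch of this behavior:
(def pool (delay (do (println "creating pool...") :the-pool)))
;; nothing is printed yet: the body has not run
@pool ; prints "creating pool..." and returns :the-pool
@pool ; returns the cached :the-pool without re-running the body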
I would suggest you use an existing library to handle connection pooling, something like hikari-cp, which is highly configurable and works across many SQL implementations.
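For example, a hedged sketch using hikari-cp; the option names follow its README and should be verified against the version you use, and the connection details are hypothetical:
(require '[hikari-cp.core :as hikari])

(def datasource-options
  {:adapter       "mysql"
   :server-name   "localhost"
   :port-number   3306
   :database-name "mydb"
   :username      "user"
   :password      "secret"})

;; delay again ensures the pool is only built on first use
(def pool-db (delay {:datasource (hikari/make-datasource datasource-options)}))
(defn connection [] @pool-db)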

Persisting State from a DRPC Spout in Trident

I'm experimenting with Storm and Trident for this project, and I'm using Clojure and Marceline to do so. I'm trying to expand the wordcount example given on the Marceline page, such that the sentence spout comes from a DRPC call rather than from a local spout. I'm having problems which I think stem from the fact that the DRPC stream needs to have a result to return to the client, but I would like the DRPC call to effectively return null, and simply update the persisted data.
(defn build-topology
  ([]
   (let [trident-topology (TridentTopology.)]
     (let [;; ### Two alternatives here ###
           ;collect-stream (t/new-stream trident-topology "words" (mk-fixed-batch-spout 3))
           collect-stream (t/drpc-stream trident-topology "words")]
       (-> collect-stream
           (t/group-by ["args"])
           (t/persistent-aggregate (MemoryMapState$Factory.)
                                   ["args"]
                                   count-words
                                   ["count"]))
       (.build trident-topology)))))
There are two alternatives in the code - the one using a fixed batch spout loads with no problem, but when I try to load the code using a DRPC stream instead, I get this error:
InvalidTopologyException(msg:Component: [b-2] subscribes from non-existent component [$mastercoord-bg0])
I believe this error comes from the fact that the DRPC stream must be trying to subscribe to an output in order to have something to return to the client - but persistent-aggregate doesn't offer any such outputs to subscribe to.
So how can I set up my topology so that a DRPC stream leads to my persisted data being updated?
Minor update: Looks like this might not be possible :( https://issues.apache.org/jira/browse/STORM-38

Interleaving Watch Multi/exec on a single Redis connection. Expected or weird behavior?

Consider a front-facing app where every request shares the same Redis Connection, which I believe is the recommended way (?).
In this situation I believe I'm seeing some weird watch multi/exec behavior. Specifically, I would expect one of two transactions to fail because of optimistic locking failure (i.e.: the watch guard) but both seem to go through without throwing a tantrum, but result in the wrong final value.
To illustrate, see the contrived scenario below. It's in Node, but I believe it's a general thing. It runs 2 processes in parallel which both update a counter. (It basically implements the canonical example of WATCH as seen in the Redis docs.)
The expected result is that the first process increments the counter by 1 while the second fails to update and returns null. Instead, both processes update the counter; however, one is based on a stale read, so in the end the counter is incremented by 1 instead of 2.
//NOTE: db is a promisified version of node-redis, but that really doesn't matter
var db = Source.app.repos.redis._raw;
Promise.all(_.reduce([1, 2], function(arr, val) {
  db.watch("incr");
  var p = Promise.resolve()
    .then(function() {
      return db.get("incr");
    })
    .then(function(val) { //say 'val' returns '4' for both processes.
      console.log(val);
      val++;
      db.multi();
      db.set("incr", val);
      return db.exec();
    })
    .then(function(resultShouldBeNullAtLeastOnce) {
      console.log(resultShouldBeNullAtLeastOnce);
      return; //explicit end
    });
  arr.push(p);
  return arr;
}, [])).then(function() {
  console.log("done all");
  next(undefined);
});
The resulting interleaving is seen when tailing Redis' MONITOR command:
1414491001.635833 [0 127.0.0.1:60979] "watch" "incr"
1414491001.635936 [0 127.0.0.1:60979] "watch" "incr"
1414491001.636225 [0 127.0.0.1:60979] "get" "incr"
1414491001.636242 [0 127.0.0.1:60979] "get" "incr"
1414491001.636533 [0 127.0.0.1:60979] "multi"
1414491001.636723 [0 127.0.0.1:60979] "set" "incr" "5"
1414491001.636737 [0 127.0.0.1:60979] "exec"
1414491001.639660 [0 127.0.0.1:60979] "multi"
1414491001.639691 [0 127.0.0.1:60979] "set" "incr" "5"
1414491001.639704 [0 127.0.0.1:60979] "exec"
Is this expected behavior? Would using multiple redis connections circumvent this issue?
To answer my own question:
This is expected behavior: the first EXEC unwatches all watched keys, so the second MULTI/EXEC goes through without a watch guard.
It's in the docs, but it's fairly hidden.
Solution: use multiple connections, in spite of some answers on SO explicitly warning against this, since it (quote) 'shouldn't be needed'. In this situation IT IS needed.
Too late but for anyone reading this in the future, the solution suggested by Geert is not advised by Redis.
One request per connection
Many databases use the concept of REST as a primary interface—send a plain old HTTP request to an endpoint with arguments encoded as POST. The database grabs the information and returns it as a response with a status code and closes the connection. Redis should be used differently—the connection should be persistent and you should make requests as needed to a long-lived connection. However, well-meaning developers sometimes create a connection, run a command, and close the connection. While opening and closing connections per command will technically work, it’s far from optimal and needlessly cuts into the performance of Redis as a whole.
Using the OSS Cluster API, the connection to the nodes are maintained by the client as needed, so you’ll have multiple connections open to different nodes at any given time. With Redis Enterprise, the connection is actually to a proxy, which takes care of the complexity of connections at the cluster level.
TL;DR: Redis connections are designed to stay open across countless operations.
Best-practice alternative: Keep your connections open over multiple commands.
A better way to tackle this problem is to use Lua scripts, which make your set of operations atomic.
Use EVAL to run Redis scripts.
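As a sketch of that approach, here is the same counter update done via EVAL, using the Clojure carmine client (an assumption on my part; the snippet above is Node, but any client with EVAL works). The whole read-modify-write runs atomically inside Redis, so no WATCH is needed:
(require '[taoensso.carmine :as car])

;; hypothetical connection spec
(def conn {:pool {} :spec {:uri "redis://127.0.0.1:6379"}})

(def incr-script
  "local v = tonumber(redis.call('GET', KEYS[1]) or '0')
   return redis.call('SET', KEYS[1], v + 1)")

;; run the script against the "incr" key (1 = number of KEYS)
(car/wcar conn
  (car/eval incr-script 1 "incr"))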

Clojure: architecture advice needed

I'm writing a little clojure pub/sub interface. It's very barebones, only two methods that will actually be used: do-pub and sub-listen. sub-listen takes a string (a sub name) and do-pub takes two strings (a sub name and a value).
I'm still fairly new at clojure and am having some trouble coming up with a workable way to do this. My first thought (and indeed my first implementation) uses a single agent which holds a hash:
{ subname (promise1 promise2 etc) }
When a thread wants to sub, it conj's a promise object onto the list associated with the sub it wants, then immediately tries to dereference that promise (therefore blocking).
When a pub happens, it goes through every item in the list for that sub and delivers the value to each item (a promise). It then dissoc's that subname from the map and returns the updated map to the agent.
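For concreteness, a rough sketch of this first implementation (my reconstruction, with illustrative names):
(def subscriptions (agent {}))

(defn sub-listen [sub-name]
  (let [prom (promise)]
    (send subscriptions update sub-name (fnil conj []) prom)
    @prom)) ; blocks until do-pub delivers

(defn do-pub [sub-name value]
  (send subscriptions
        (fn [m]
          (doseq [p (get m sub-name)]
            (deliver p value))
          (dissoc m sub-name))))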
In this way I got a simple pub/sub implementation working. However, the problem comes when someone subs, doesn't receive a pub for a certain amount of time, and then gets killed due to a timeout. In this scenario a stale promise is left sitting in the agent, and moreover this is a source of a memory leak if that sub never gets pub'd.
Does anyone have any thoughts on how to solve this? Or if there is a better way to do what I'm trying to do overall (I'm trying to avoid using any external pre-cooked pubsub libraries, this is a pet project not a work one)?
You can do something like this:
Create an atom
The publish function updates the atom's value with the value passed to it.
Subscribers can use add-watch on the atom to be notified when the atom's value changes, i.e. due to a call to the publish function.
Use remove-watch to remove a subscription.
This way you will have a very basic pub-sub system; a minimal sketch follows.
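A minimal sketch of that recipe (names are illustrative):
(def bus (atom nil))

(defn do-pub [sub-name value]
  ;; publishing is just resetting the atom; every watch fires
  (reset! bus [sub-name value]))

(defn sub-listen [sub-name]
  (let [prom (promise)
        k (gensym "sub")]
    (add-watch bus k
               (fn [k ref _old [s v]]
                 (when (= s sub-name)
                   (remove-watch ref k)
                   (deliver prom v))))
    @prom)) ; blocks until a matching pub arrives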
I have marked Ankur's answer as the solution but I wanted to expand on it a bit. What I ended up doing is having a central atom that all client threads do an add-watch on. When a pub is done the atom's value is changed to a vector containing the name of the sub and the value being pub'd.
The function the clients pass to add-watch is a partial function which looks like
(partial (fn [prom sub key ref _old new] ...) prom sub)
where prom is a promise previously generated. The client then blocks while waiting on that promise. The partial function checks whether the sub in new is the same as sub; if so, it removes the watch and delivers the promise with the value from new.

How does HOpenGL behave with regards to other threads and TChans in Haskell?

I'm doing some proof-of-concept work for a fairly complex video game I'd like to write in Haskell using the HOpenGL library. I started by writing a module that implements client-server event based communication. My problem appears when I try to hook it up to a simple program to draw clicks on the screen.
The event library uses a list of TChans made into a priority queue for communication. It returns an "out" queue and an "in" queue corresponding to server-bound and client-bound messages. Sending and receiving events are done in separate threads using forkIO. Testing the event library without the OpenGL part shows it communicating successfully. Here's the code I used to test it:
-- Client connects to server at localhost with 3 priorities in the priority queue
do { (outQueue, inQueue) <- client Nothing 3
   -- send 'Click' events until terminated, the server responds with the coords negated
   ; mapM_ (\x -> atomically $ writeThing outQueue (lookupPriority x) x)
           (repeat (Click (fromIntegral 2) (fromIntegral 4)))
   }
This produces the expected output, namely a whole lot of send and receive events. I don't think the problem lies with the Event handling library.
The OpenGL part of the code checks the incoming queue for new events in the displayCallback and then calls the event's associated handler. I can get one event (the Init event, which simply clears the screen) to be caught by the displayCallback, but after that nothing is caught. Here's the relevant code:
atomically $ PQ.writeThing inqueue (Events.lookupPriority Events.Init) Events.Init
GLUT.mainLoop

render pqueue =
  do event <- atomically $
       do e <- PQ.getThing pqueue
          case e of
            Nothing -> retry
            Just event -> return event
     putStrLn $ "Got event"
     (Events.lookupHandler event Events.Client) event
     GL.flush
     GLUT.swapBuffers
So my theories as to why this is happening are:
The display callback is blocking all of the sending and receiving threads on the retry.
The queues are not being returned properly, so that the queues that the client reads are different than the ones that the OpenGL part reads.
Are there any other reasons why this could be happening?
The complete code is too long to post here, although not that long overall (5 files under 100 lines each); it is all on GitHub.
Edit 1:
The client is run from within the main function in the HOpenGL code like so:
main =
  do args <- getArgs
     let ip = args !! 0
     let priorities = args !! 1
     (progname, _) <- GLUT.getArgsAndInitialize
     -- Run the client here and bind the queues to use for communication
     (outqueue, inqueue) <- Client.client (Just ip) priorities
     GLUT.createWindow "Hello World"
     GLUT.initialDisplayMode $= [GLUT.DoubleBuffered, GLUT.RGBAMode]
     GLUT.keyboardMouseCallback $= Just (keyboardMouse outqueue)
     GLUT.displayCallback $= render inqueue
     PQ.writeThing inqueue (Events.lookupPriority Events.Init) Events.Init
     GLUT.mainLoop
The only flag I pass to GHC when I compile the code is -package GLUT.
Edit 2:
I cleaned up the code on GitHub a bit. I removed acceptInput since it wasn't really doing anything, and the Client code isn't supposed to be listening for events of its own anyway; that's why it returns the queues.
Edit 3:
I'm clarifying my question a little bit. I took what I learned from @Shang and @Laar and kind of ran with it. I changed the threads in Client.hs to use forkOS instead of forkIO (and used -threaded at ghc), and it looks like the events are being communicated successfully; however, they are not being received in the display callback. I also tried calling postRedisplay at the end of the display callback, but I don't think it ever gets called (because I think the retry is blocking the entire OpenGL thread).
Would the retry in the display callback block the entire OpenGL thread? If it does, would it be safe to fork the display callback into a new thread? I don't imagine it would, since the possibility exists that multiple things could be trying to draw to the screen at the same time, but I might be able to handle that with a lock. Another solution would be to convert the lookupHandler function to return a function wrapped in a Maybe, and just do nothing if there aren't any events. I feel like that would be less than ideal as I'd then essentially have a busy loop which was something I was trying to avoid.
Edit 4:
Forgot to mention I used -threaded at ghc when I did the forkOS.
Edit 5:
I went and tested my theory that the retry in the render function (display callback) was blocking all of OpenGL. I rewrote the render function so it no longer blocks, and it worked as I wanted: one click on the screen gives two points, one from the server and one from the original click. Here's the code for the new render function (note: it's not on GitHub):
render pqueue =
  do event <- atomically $ PQ.getThing pqueue
     case (Events.lookupHandler event Events.Client) of
       Nothing -> return ()
       Just handler ->
         do let e = case event of {Just e' -> e'}
            handler e
            return ()
     GL.flush
     GLUT.swapBuffers
     GLUT.postRedisplay Nothing
I tried it with and without the postRedisplay, and it only works with it. The problem now becomes that this pegs the CPU at 100% because it's a busy loop. In Edit 4 I proposed threading off the display callback. I'm still thinking of a way to do that.
A note since I haven't mentioned it yet. Anybody looking to build/run the code should do it like this:
$ ghc -threaded -package GLUT helloworldOGL.hs -o helloworldOGL
$ ghc server.hs -o server
-- one or the other, I usually do 0.0.0.0
$ ./server "localhost" 3
$ ./server "0.0.0.0" 3
$ ./helloworldOGL "localhost" 3
Edit 6: Solution
A solution! Going along with the threads, I decided to make a thread in the OpenGL code that checks for events, blocks if there aren't any, and then calls the handler followed by postRedisplay. Here it is:
checkEvents pqueue = forever $
  do event <- atomically $
       do e <- PQ.getThing pqueue
          case e of
            Nothing -> retry
            Just event -> return event
     putStrLn $ "Got event"
     (Events.lookupHandler event Events.Client) event
     GLUT.postRedisplay Nothing
The display callback is simply:
render = GLUT.swapBuffers
And it works: it doesn't peg the CPU at 100% and events are handled promptly. I'm posting this here because I couldn't have done it without the other answers, and I feel bad taking the rep when the answers were both very helpful, so I'm accepting @Laar's answer since he has the lower rep.
One possible cause could be the use of threading.
OpenGL uses thread-local storage for its context. Therefore all calls using OpenGL should be made from the same OS thread. HOpenGL (and OpenGLRaw too) is a relatively simple binding around the OpenGL library and does not provide any protection or workaround for this 'problem'.
On the other hand, you are using forkIO to create a lightweight Haskell thread. This thread is not guaranteed to stay on the same OS thread, so the RTS might switch it to another OS thread where the thread-local OpenGL context is not available. To resolve this problem there is the forkOS function, which creates a bound Haskell thread. A bound Haskell thread always runs on the same OS thread and thus has its thread-local state available. The documentation about this can be found in the 'Bound Threads' section of Control.Concurrent, where forkOS is also documented.
edits:
With the current testing code this problem is not present, as you're not using -threaded. (removed incorrect reasoning)
Your render function ends up being called only once, because the display callback is only invoked when there is something new to draw. To request a redraw, you need to call
GLUT.postRedisplay Nothing
It takes an optional window parameter, or signals a redraw for the "current" window when you pass Nothing. You usually call postRedisplay from an idleCallback or a timerCallback but you can also call it at the end of render to request an immediate redraw.