Connection Pooling in Clojure

I am unable to understand the use of the pool-db and connection functions
in this connection pooling guide.
(defn- get-pool
  "Creates a database connection pool to be used in queries."
  [{:keys [host-port db-name username password]}]
  ;; ComboPooledDataSource comes from the c3p0 library (com.mchange.v2.c3p0)
  (let [pool (doto (ComboPooledDataSource.)
               (.setDriverClass "com.mysql.cj.jdbc.Driver")
               (.setJdbcUrl (str "jdbc:mysql://" host-port "/" db-name))
               (.setUser username)
               (.setPassword password)
               ;; expire excess connections after 30 minutes of inactivity:
               (.setMaxIdleTimeExcessConnections (* 30 60))
               ;; expire connections after 3 hours of inactivity:
               (.setMaxIdleTime (* 3 60 60)))]
    {:datasource pool}))
(def pool-db (delay (get-pool db-spec)))

(defn connection [] @pool-db)

;; usage in code
(jdbc/query (connection) ["SELECT SUM(1, 2, 3)"])
Why can't we simply do this?
(def connection (get-pool db-spec))
; usage in code
(jdbc/query connection ["SELECT SUM(1, 2, 3)"])

The delay ensures that you create the connection pool the first time you try to use it, rather than when the namespace is loaded.
This is a good idea because your connection pool may fail to be created for any one of a number of reasons, and if it fails during namespace load you will get some odd behaviour - any defs after your failing connection pool creation will not be evaluated, for example.
In general, top level var definitions should be constructed so they cannot fail at runtime.
Bear in mind they may also be evaluated during the AOT compile process, as amalloy notes below.

In your application you want to create the pool just once and reuse it. For this reason delay is used to wrap the (get-pool db-spec) call: it will be invoked only the first time it is forced with deref/@, and the pool will be cached and returned on subsequent force calls.

The difference is that with the delay version a pool is created only if it is actually used (which might never happen, for example if everything is served from a cache), whereas the non-delay version instantiates a pool no matter what, even if a database connection is never needed.
delay runs its body only when deref is called and does nothing otherwise.
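A quick REPL sketch of those semantics (the printing body here just stands in for the real pool construction):

(def pool (delay (do (println "creating pool...") :the-pool)))

;; nothing has printed yet: the body has not run
@pool  ;; prints "creating pool..." and returns :the-pool
@pool  ;; returns the cached :the-pool without re-running the body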

I would suggest you use an existing library to handle connection pooling, something like hikari-cp, which is highly configurable and works across many SQL implementations.
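For comparison, the same pool wrapped in the delay pattern with hikari-cp might look like this (a sketch based on hikari-cp's documented option map; host, port, and credentials are placeholders):

(require '[hikari-cp.core :as hikari])

(def datasource-options
  {:adapter       "mysql"
   :server-name   "localhost"
   :port-number   3306
   :database-name "my-db"
   :username      "username"
   :password      "password"
   ;; close idle connections after 30 minutes (milliseconds):
   :idle-timeout  (* 30 60 1000)})

(def pool-db (delay (hikari/make-datasource datasource-options)))

(defn connection [] {:datasource @pool-db})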

Related

Onyx: Can't pick up trigger/emit results in the next task

I'm trying to get started with Onyx, the distributed computing platform in Clojure. In particular, I try to understand how to aggregate data. If I understand the documentation correctly, a combination of a window and a :trigger/emit function should allow me to do this.
So, I modified the aggregation example (Onyx 0.13.0) in three ways (cf. gist with complete code):
in -main I println any segments put on the output channel; this works as expected with the original code in that it picks up all segments and prints them to stdout.
I add an emit function like this:
(defn make-ds
  [event window trigger
   {:keys [lower-bound upper-bound event-type] :as state-event}
   extent-state]
  (println "make-ds called")
  {:ds window})
I add a trigger configuration (original dump-words trigger omitted for brevity):
(def triggers
  [{:trigger/window-id :word-counter
    :trigger/id :make-ds
    :trigger/on :onyx.triggers/segment
    :trigger/fire-all-extents? true
    :trigger/threshold [5 :elements]
    :trigger/emit ::make-ds}])
I change the :count-words task from calling the identity function to the reduce type, so that it doesn't hand all input segments over to the output (and I added config options so that Onyx treats this as a batch):
{:onyx/name :count-words
 ;:onyx/fn :clojure.core/identity
 :onyx/type :reduce ; was :function
 :onyx/group-by-key :word
 :onyx/flux-policy :kill
 :onyx/min-peers 1
 :onyx/max-peers 1
 :onyx/batch-size 1000
 :onyx/batch-fn? true}
When I run this now, I can see in the output that the emit function (i.e. make-ds) gets called for each input segment (first output coming from the dump-words trigger of the original code):
> lein run
[....]
Om -> 1
name -> 1
My -> 2
a -> 1
gone -> 1
Coffee -> 1
to -> 1
get -> 1
Time -> 1
make-ds called
make-ds called
make-ds called
make-ds called
[....]
However, the segments built by make-ds don't make it through to the output channel; they are never printed. If I revert the :count-words task to the identity function, this works just fine. Also, it looks as if the emit function is called for each input segment, whereas I would expect it to be called only when the threshold condition is met (i.e. whenever 5 elements have been aggregated in the window).
As the test for this functionality within the Onyx code base (onyx.windowing.emit-aggregate-test) is passing just fine, I guess I'm making a stupid mistake somewhere, but I'm at a loss figuring out what.
I finally saw that there was a warning in the log file onyx.log like this:
[clojure.lang.ExceptionInfo: Windows cannot be checkpointed with ZooKeeper unless
:onyx.peer/storage.zk.insanely-allow-windowing? is set to true in the peer config.
This should only be turned on as a development convenience.
[clojure.lang.ExceptionInfo: Handling uncaught exception thrown inside task
lifecycle :lifecycle/checkpoint-state. Killing the job. -> Exception type:
clojure.lang.ExceptionInfo. Exception message: Windows cannot be checkpointed with
ZooKeeper unless :onyx.peer/storage.zk.insanely-allow-windowing? is set to true in
the peer config. This should only be turned on as a development convenience.
As soon as I set this, I finally got some segments handed over to the next task. I.e., I had to change the peer config to:
(def peer-config
  {:zookeeper/address "127.0.0.1:2189"
   :onyx/tenancy-id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.peer/storage.zk.insanely-allow-windowing? true
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})
Now, :onyx.peer/storage.zk.insanely-allow-windowing? doesn't sound like a good thing to rely on. On the Clojurians Slack channel, Lucas Bradstreet recommended switching to S3 checkpointing.
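For reference, selecting S3 checkpoint storage happens in the same peer-config map; the keys below are my reading of the Onyx peer-config documentation, so treat them as assumptions to verify against your Onyx version (bucket and region are placeholders):

(def peer-config
  {;; ...same settings as above, minus the insanely-allow-windowing? flag...
   :onyx.peer/storage :s3
   :onyx.peer/storage.s3.bucket "my-checkpoint-bucket"
   :onyx.peer/storage.s3.region "us-east-1"})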

Persisting State from a DRPC Spout in Trident

I'm experimenting with Storm and Trident for this project, and I'm using Clojure and Marceline to do so. I'm trying to expand the wordcount example given on the Marceline page, such that the sentence spout comes from a DRPC call rather than from a local spout. I'm having problems which I think stem from the fact that the DRPC stream needs to have a result to return to the client, but I would like the DRPC call to effectively return null, and simply update the persisted data.
(defn build-topology
  []
  (let [trident-topology (TridentTopology.)
        ;; ### Two alternatives here ###
        ;collect-stream (t/new-stream trident-topology "words" (mk-fixed-batch-spout 3))
        collect-stream (t/drpc-stream trident-topology "words")]
    (-> collect-stream
        (t/group-by ["args"])
        (t/persistent-aggregate (MemoryMapState$Factory.)
                                ["args"]
                                count-words
                                ["count"]))
    (.build trident-topology)))
There are two alternatives in the code - the one using a fixed batch spout loads with no problem, but when I try to load the code using a DRPC stream instead, I get this error:
InvalidTopologyException(msg:Component: [b-2] subscribes from non-existent component [$mastercoord-bg0])
I believe this error comes from the fact that the DRPC stream must be trying to subscribe to an output in order to have something to return to the client - but persistent-aggregate doesn't offer any such outputs to subscribe to.
So how can I set up my topology so that a DRPC stream leads to my persisted data being updated?
Minor update: Looks like this might not be possible :( https://issues.apache.org/jira/browse/STORM-38

Interleaving WATCH/MULTI/EXEC on a single Redis connection: expected or weird behavior?

Consider a front-facing app where every request shares the same Redis connection, which I believe is the recommended way (?).
In this situation I believe I'm seeing some weird WATCH MULTI/EXEC behavior. Specifically, I would expect one of two transactions to fail because of an optimistic locking failure (i.e. the WATCH guard), but both seem to go through without throwing a tantrum, yet result in the wrong final value.
To illustrate, see the contrived scenario below. It's in Node, but I believe it's a general thing. This runs two processes in parallel which both update a counter. (It basically implements the canonical WATCH example from the Redis docs.)
The expected result is that the first process increments the counter by 1 while the second fails to update and returns null. Instead, both processes update the counter. However, one of them works from a stale read, so in the end the counter is incremented by 1 instead of 2.
//NOTE: db is a promisified version of node-redis, but that really doesn't matter
var db = Source.app.repos.redis._raw;

Promise.all(_.reduce([1, 2], function(arr, val) {
  db.watch("incr");
  var p = Promise.resolve()
    .then(function() {
      return db.get("incr");
    })
    .then(function(val) { // say 'val' returns '4' for both processes
      console.log(val);
      val++;
      db.multi();
      db.set("incr", val);
      return db.exec();
    })
    .then(function(resultShouldBeNullAtLeastOnce) {
      console.log(resultShouldBeNullAtLeastOnce);
      return; // explicit end
    });
  arr.push(p);
  return arr;
}, [])).then(function() {
  console.log("done all");
  next(undefined);
});
The resulting interleaving is seen when tailing Redis' MONITOR command:
1414491001.635833 [0 127.0.0.1:60979] "watch" "incr"
1414491001.635936 [0 127.0.0.1:60979] "watch" "incr"
1414491001.636225 [0 127.0.0.1:60979] "get" "incr"
1414491001.636242 [0 127.0.0.1:60979] "get" "incr"
1414491001.636533 [0 127.0.0.1:60979] "multi"
1414491001.636723 [0 127.0.0.1:60979] "set" "incr" "5"
1414491001.636737 [0 127.0.0.1:60979] "exec"
1414491001.639660 [0 127.0.0.1:60979] "multi"
1414491001.639691 [0 127.0.0.1:60979] "set" "incr" "5"
1414491001.639704 [0 127.0.0.1:60979] "exec"
Is this expected behavior? Would using multiple redis connections circumvent this issue?
To answer my own question:
This is expected behavior. The first EXEC unwatches all keys, so the second MULTI/EXEC goes through without the watch guard.
It's in the docs, but it's fairly hidden.
Solution: use multiple connections, in spite of some answers on SO explicitly warning against this because it (quote) "shouldn't be needed". In this situation it IS needed.
Too late but for anyone reading this in the future, the solution suggested by Geert is not advised by Redis.
One request per connection
Many databases use the concept of REST as a primary interface: send a plain old HTTP request to an endpoint with arguments encoded as POST, and the database grabs the information, returns it as a response with a status code, and closes the connection. Redis should be used differently: the connection should be persistent, and you should make requests as needed to a long-lived connection. However, well-meaning developers sometimes create a connection, run a command, and close the connection. While opening and closing a connection per command will technically work, it's far from optimal and needlessly cuts into the performance of Redis as a whole.
Using the OSS Cluster API, connections to the nodes are maintained by the client as needed, so you'll have multiple connections open to different nodes at any given time. With Redis Enterprise, the connection is actually to a proxy, which takes care of the complexity of connections at the cluster level.
TL;DR: Redis connections are designed to stay open across countless operations.
Best-practice alternative: Keep your connections open over multiple commands.
A better way to tackle this problem is to use Lua scripts, which make your set of operations atomic (the server runs nothing else while a script executes).
Use EVAL to run Redis scripts.
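For instance, the read-increment-write sequence from the question collapses into a single server-side script, which Redis executes atomically. A minimal sketch using the Clojure carmine client (any client's EVAL would do; the connection map and key name are illustrative):

(require '[taoensso.carmine :as car :refer [wcar]])

(def conn {:pool {} :spec {:host "127.0.0.1" :port 6379}})

;; GET, increment, and SET run as one atomic unit server-side,
;; so no WATCH/MULTI/EXEC dance is needed.
(def incr-script
  "local v = tonumber(redis.call('GET', KEYS[1]) or '0') + 1
   redis.call('SET', KEYS[1], v)
   return v")

(wcar conn (car/eval incr-script 1 "incr"))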

How to maintain two connections to different ElasticSearch hosts using Elastisch?

I'm using Elastisch, and the rest/connect function returns an endpoint, but I can't see how to reuse this endpoint when calling other functions. I need to transfer some documents from one index to another on different hosts, using a scroll on the first one and bulk indexing on the second one.

Elastisch also offers connect (without the !), which returns the connection to you instead of storing it in the client's dynamic var. You can call it twice and then use binding to select the appropriate connection for each call.
(let [client1 (connect ...)
      client2 (connect ...)
      data    (binding [clojurewerkz.elastisch.native/*client* client1]
                ...)]
  (binding [clojurewerkz.elastisch.native/*client* client2]
    ... put stuff))
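With the REST client the same task needs no dynamic binding, because every Elastisch REST function takes the connection as its first argument. A rough sketch of the scroll-and-copy (hosts, index names, and the 2.x-era API shape are assumptions; check against your Elastisch version):

(require '[clojurewerkz.elastisch.rest :as esr]
         '[clojurewerkz.elastisch.rest.document :as esd]
         '[clojurewerkz.elastisch.query :as q])

(let [src  (esr/connect "http://host-a:9200")
      dest (esr/connect "http://host-b:9200")
      ;; open a scroll on the source cluster
      resp (esd/search src "src-index" "doc"
                       :query (q/match-all) :scroll "1m" :size 500)]
  (doseq [hit (esd/scroll-seq src resp)]
    ;; re-index each hit on the destination cluster under its original id
    (esd/put dest "dest-index" (:_type hit) (:_id hit) (:_source hit))))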

Why is my Clojure implementation of an LDAP paged results function not working?

TLDR; this is not a well-phrased question, so you should probably not bother with it. I'll delete it in the near future unless people think it has some redeeming feature other than being a good example of how not to ask a question on Stack Overflow.
I am using the UnboundID LDAP SDK for one of my projects. I'm currently stuck on implementing a paged results search (described in RFC2696), for which I have a working Java implementation. I have tested the Java code, and know it works correctly against my test LDAP directory. The main part of the Java implementation is the following do..while loop:
do
{
    /*
     * Set the simple paged results control (if the cookie is null
     * this indicates the first time through the loop).
     */
    final SimplePagedResultsControl simplePagedResultsRequestControl =
        new SimplePagedResultsControl(pageSize, cookie);
    searchRequest.setControls(simplePagedResultsRequestControl);

    /*
     * Issue the search request:
     */
    SearchResult searchResult = ldapConnection.search(searchRequest);
    final String msg = String.format(
        "searchRequest transmitted, pageSize: %d, entries returned: %d",
        Integer.valueOf(pageSize),
        Integer.valueOf(searchResult.getEntryCount()));
    final LogRecord record = new LogRecord(Level.INFO, msg);
    ldapCommandLineTool.out(new MinimalLogFormatter().format(record));

    total += searchResult.getEntryCount();

    /*
     * Get the cookie from the paged results control.
     */
    cookie = null;
    final SimplePagedResultsControl c =
        SimplePagedResultsControl.get(searchResult);
    if (c != null)
    {
        cookie = c.getCookie();
    }
}
while (cookie != null && cookie.getValueLength() > 0);
A request "control" is added to the search request indicating to the server that it should send back a subset of matching entries. Assuming the initial request is valid, the LDAP server returns pageSize entries and a response control containing a special "cookie". To get the next "page" of results the client re-sends the request, with the cookie included in the request control, and the server includes a new cookie with the subsequent response. This cycle continues until there are no more entries to return, in which case no cookie is returned to the client and the search request is complete.
I have attempted to port the above code to Clojure, but so far I have been unable to get it to work. Here's the code:
(defn fetch-all
  [& attrs]
  (with-connection
    (let [attrs   (into-array (if attrs (map name attrs) ["*"]))
          scope   SearchScope/SUB
          request (SearchRequest. searchbase scope account-filter attrs)]
      (loop [results [] cookie nil]
        (let [control [(SimplePagedResultsControl. page-size cookie)]]
          (doto request
            (.setSizeLimit 12345)
            (.setTimeLimitSeconds 60)
            (.setControls control))
          (let [result  (.search *conn* request)
                results (concat result results)
                cookie  (.. SimplePagedResultsControl (get result) getCookie)]
            (println "entries returned:" (.getEntryCount result))
            (when-not (> 0 (.getValueLength cookie))
              results
              (recur results cookie))))))))
The Java code retrieves 1720 entries with 18 requests, but mine fails with a "size limit exceeded" LDAPSearchException after five requests.
My question to you folks is why do the two implementations behave differently? I know that I'm sending the received cookie with each new request as an exception is thrown if the same cookie is used twice. I also think I know that I am getting subsequent pages of results, because the set of entries returned with each page is different.
I'm stumped, and not relishing the thought of dragging out Ettercap to analyse the traffic. Surely there's something very obviously wrong in my code which is causing the different behaviour.
(let [control [(SimplePagedResultsControl. page-size cookie)]]
binds control to a vector containing a single results control object. This vector is then passed to
(.setControls control)
which seems to take a single results control object, not a vector, as per your Java code.
Okay, two bugs, and a gotcha in the RFC:
(bug) the concatenation of the result to results should have been (concat (.getSearchEntries result) results),
(bug) the 'when-not' test at the bottom of the loop should have been an 'if' or 'if-not'.
(gotcha) under section 6 of the RFC, "Security Considerations", it is suggested that "Server implementations may enforce an overriding sizelimit", which is to say that as a non-privileged user I will still hit the limit, paged results or not. The difference in behaviour between implementations was simply due to me running the Java code with administrative privileges and the Clojure code anonymously (doh!).
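Putting both fixes together, the loop would look something like this (a sketch assuming the same surrounding vars *conn*, searchbase, account-filter, and page-size, plus an import of com.unboundid.ldap.sdk.Control for the controls array):

(defn fetch-all
  [& attrs]
  (with-connection
    (let [attrs   (into-array (if attrs (map name attrs) ["*"]))
          request (SearchRequest. searchbase SearchScope/SUB account-filter attrs)]
      (loop [results [] cookie nil]
        (.setControls request
                      (into-array Control
                                  [(SimplePagedResultsControl. page-size cookie)]))
        (let [result  (.search *conn* request)
              ;; bug 1 fixed: accumulate the entries, not the SearchResult itself
              results (concat (.getSearchEntries result) results)
              cookie  (some-> (SimplePagedResultsControl/get result) .getCookie)]
          ;; bug 2 fixed: branch with if so both arms are reachable
          (if (and cookie (pos? (.getValueLength cookie)))
            (recur results cookie)
            results))))))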