Extending a library-provided protocol without impacting other users - clojure

I'm using a 3rd-party library (clj-msgpack), and wish to extend a protocol for a type which the library also provides a handler for.
On its own, this is simple enough -- but is there any way to do this which wouldn't impact other users of this library running inside the same JVM? Something similar to a dynamic var binding (only taking effect under a given point on the stack) would be ideal.
At present, I'm doing an unconditional override but using a dynamic var to enable my modified behavior; however, this feels far too much like monkey-patching for my comfort.
For the curious, the (admittedly abominable) code I'm putting into place follows:
(in-ns 'clj-msgpack.core)
(def ^:dynamic *keywordize-strings*
  "Assume that any string starting with a colon should be unpacked to a keyword"
  false)
(extend-protocol Unwrapable
  RawValue
  (unwrap [o]
    (let [v (.getString o)]
      (if (and *keywordize-strings* (.startsWith v ":"))
        (keyword (.substring v 1))
        v))))
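Here is a self-contained illustration of the scoping the dynamic var gives you. It is a sketch: unwrap-str is a hypothetical stand-in for the patched unwrap method, not clj-msgpack code. Only code running under the binding form, on the same thread, sees the changed value. Other callers in the JVM keep seeing the root value. One caveat is that future and send convey the current bindings to the threads they use.

```clojure
;; Hypothetical stand-in for the patched protocol method; not clj-msgpack code.
(def ^:dynamic *keywordize-strings* false)

(defn unwrap-str [^String v]
  (if (and *keywordize-strings* (.startsWith v ":"))
    (keyword (subs v 1))
    v))

;; Outside any binding form, every caller sees the root value (false).
(unwrap-str ":a")                    ;; => ":a"

;; Inside binding, only code called from this point down (on this thread)
;; is affected; a nested binding can switch the behaviour back again.
(binding [*keywordize-strings* true]
  [(unwrap-str ":a")
   (binding [*keywordize-strings* false]
     (unwrap-str ":a"))])            ;; => [:a ":a"]
```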

After some thought, I see two basic approaches (one of which I get from you):
Dynamic binding (as you are doing it now):
Some complain that dynamic binding follows the principle of most surprise: "What? It behaves this way only when called from there?" While I don't personally hold this to be a bad thing(tm), some people do. In this case it exactly matches your desire, and as long as you have one point where you decide whether you want keywordized strings, this should work. If you add a second point that changes them back, and a code path that crosses the two, well... you're on your own. But hey, working code has its merits.
Inheritance:
Using good ol' Java-style subclassing, or Clojure's ad-hoc hierarchies, you could extend the type of object you are passing around into a keywordized-string-widgewhatzit that extends widgewhatzit, and add a new handler for your specific subclass. This only works in some cases and forces a different object style on the rest of the design. Some smart people will also argue that it still follows the principle of most surprise, because the type of the objects will be different when called via another code path.
Personally, I would go with your existing solution unless you can change your whole program to use keywords instead of strings (which would, of course, be my first, potentially controversial, choice).
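Here is a minimal sketch of the inheritance-style approach. It assumes you control how the values being unwrapped are constructed. The Unwrapable protocol below is a hypothetical local stand-in for the library's, and KeywordizingString is a made-up wrapper type. Only values wrapped in it get the new behaviour, so plain strings (and other users of the protocol) are unaffected. With the real clj-msgpack, values coming out of the unpacker are created by the library rather than by you, which is why this approach only works in some cases.

```clojure
;; Hypothetical local stand-in for the library's protocol.
(defprotocol Unwrapable
  (unwrap [o]))

;; The "library" behaviour: strings unwrap to themselves.
(extend-protocol Unwrapable
  String
  (unwrap [s] s))

;; Hypothetical wrapper type; only values wrapped in it are keywordized.
(defrecord KeywordizingString [s])

(extend-protocol Unwrapable
  KeywordizingString
  (unwrap [{:keys [s]}]
    (if (.startsWith ^String s ":")
      (keyword (subs s 1))
      s)))

(unwrap ":foo")                          ;; => ":foo"
(unwrap (->KeywordizingString ":foo"))   ;; => :foo
```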

Related

Advising protocol methods in Clojure

I'm trying to advise a number of methods in one library with utility functions from another library, where some of the methods to be advised are defined with (defn) and some are defined with (defprotocol).
Right now I'm using this library, which uses (alter-var-root). I don't care which library I use (or whether I hand-roll my own).
The problem I'm running into right now is that protocol methods sometimes can be advised, and sometimes cannot, depending on factors that are not perfectly clear to me.
If I define a protocol, then define a type and implement that protocol in-line, then advising never seems to work. I am assuming this is because the type extends the JVM interface directly and skips the vars.
If, in a single namespace, I define a protocol, then advise its methods, and then extend the protocol to a type, the advising will not work.
If, in a single namespace, I define a protocol, then extend the protocol to a type, then advise the protocol's methods, the advising will work.
What I would like to do is find a method of advising that works reliably and does not rely on undefined implementation details. Is this possible?
Clojure itself doesn't provide any way to advise functions reliably, not even those defined via def/defn. Consider the following example:
(require '[richelieu.core :as advice])
(advice/defadvice add-one [f x] (inc (f x)))
(defn func-1 [x] x)
(def func-2 func-1)
(advice/advise-var #'func-1 add-one)
> (func-1 0)
1
> (func-2 0)
0
After the form (def func-2 func-1) is evaluated, var func-2 holds the current value of var func-1 (the function object itself, not the var), so advise-var, which changes what #'func-1 points to, won't affect it.
Even though definitions like func-2 are rare, you may have seen or used the following:
(defn generic-function [generic-parameter x y z]
...)
(def specific-function-1 (partial generic-function <specific-arg-1>))
(def specific-function-2 (partial generic-function <specific-arg-2>))
...
If you advise generic-function, none of the specific functions will pick up the advice, for the reason described above: partial captured the function value, not the var.
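The value-versus-var distinction can be reproduced without any advice library. This sketch uses plain alter-var-root as a hand-rolled stand-in for advise-var. The names func-3, specific-by-value and specific-by-var are illustrative. Capturing the var (#'func-1) instead of its value makes later root changes visible, because invoking a var dereferences it on every call. This only helps for code you control. Callers compiled with direct linking (an option added in Clojure 1.8) skip the var entirely, so they would not see the change either.

```clojure
(require '[clojure.string :as str])

(defn func-1 [x] x)
(def func-2 func-1)    ; holds the function object func-1 had at this point
(def func-3 #'func-1)  ; holds the var itself; invoking a var derefs it per call

(defn generic-function [prefix x] (str prefix x))
(def specific-by-value (partial generic-function "a-"))   ; captures the value
(def specific-by-var   (partial #'generic-function "a-")) ; goes through the var

;; Hand-rolled "advice": replace each var's root with a wrapper around the old fn.
(alter-var-root #'func-1 (fn [f] (fn [x] (inc (f x)))))
(alter-var-root #'generic-function
                (fn [f] (fn [& args] (str/upper-case (apply f args)))))

(func-1 0)               ;; => 1
(func-2 0)               ;; => 0
(func-3 0)               ;; => 1
(specific-by-value "x")  ;; => "a-x"
(specific-by-var "x")    ;; => "A-X"
```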
If advising is critical for you, one possible solution is the following: since Clojure functions are compiled to Java classes, you could try to replace the Java method invoke with another method that has the desired behaviour. However, things get more complicated for protocol/interface methods: it seems you would have to replace the method in every class that implements the particular protocol/interface.
Otherwise, you'll need an explicit wrapper for every function that you want to advise. Macros may help reduce the boilerplate in this case.
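One way to cut that boilerplate is a small macro. defwrapped below is a hypothetical helper, not part of Clojure or richelieu. It defines a new function that calls the original through its var and passes it to an advice function. Because it relies only on calling the var, it behaves the same whether a protocol is implemented inline or via extend-protocol. The trade-off is that only callers that use the wrapper's name see the advice.

```clojure
(defmacro defwrapped
  "Hypothetical helper: defines `name` as a function that calls
  (advice-fn target-var & args), always going through the var of `target`."
  [name target advice-fn]
  `(defn ~name [& args#]
     (apply ~advice-fn (var ~target) args#)))

;; Works for an ordinary function...
(defn base [x] x)
(defwrapped base+1 base (fn [f & args] (inc (apply f args))))

;; ...and for a protocol method implemented inline in a record.
(defprotocol Greeter (greet [this]))
(defrecord English [] Greeter (greet [_] "hello"))
(defwrapped loud-greet greet (fn [f & args] (str (apply f args) "!")))

(base+1 0)                ;; => 1
(loud-greet (->English))  ;; => "hello!"
```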

Clojure best practice for nested let

Is it good practice to shadow a binding with a nested let in the following way, or is it confusing?
(defn a-fun [config]
  (let [config (-> config (parse) (supply-defaults))]
    ;; do something with config
    ))
I noticed I use this pattern of parsing/checking/validating things quite often in my input functions that talk to the external world (in this case a ClojureScript library that exposes public functions, but I had the same feeling with Compojure routes).
Is it confusing because one has to understand the rules for binding visibility (not sure what the exact wording is)?
What would be the idiomatic way to do it? Change the config name to parsed-config, put it in another function, or something else completely?
I would reach for this idiom when the rebinding is the same kind of thing, and you want to make clear that the local binding supersedes the global one.
For example:
(defn fact [n]
  (loop [n n, answer 1]
    (if (pos? n)
      (recur (dec n) (* answer n))
      answer)))
This also stops you using the global binding by accident, as I was prone to do.
@Thumbnail's answer is good, but I personally would almost never shadow an outer binding with an inner one in this way. Even if you understand the binding rules and want to shadow an outer variable for a good reason, it's confusing for someone reading the code, which could very well be you later, after you've forgotten how the code works.
Suppose I have a complex function, and I see the variable foo used somewhere in the middle of it. I look up and see a binding for it--perhaps as a function parameter, which would be obvious and easy to notice. If I don't notice that somewhere below that, the name was rebound, then I will misunderstand what's in the variable.
So I usually make up new, related names that correspond to the role of the different variables in the code. Sometimes the name differences are somewhat arbitrary.
I think these are good reasons not to shadow variables, and I think @Thumbnail gives good reasons to go ahead and shadow them. There are tradeoffs, and you have to decide what's best for your situation.
Short functions are probably better contexts for shadowing. Personally, I'd add a very noticeable comment if I did this sort of thing, or, if I were doing it over and over again, maybe a very noticeable comment near the top of the file.
EDIT: As nha's comment made me realize, it can be more reasonable to shadow variables when the new binding occurs immediately after the previous binding; that makes it hard to miss the fact that the name is being redefined.
Another option would be to slightly rename the argument, keeping the general name for the "final" version of the data:
(defn a-fun [config-in]
  (let [config (-> config-in (parse) (supply-defaults))]
    ;; do something with config
    ))
I also sometimes use the suffixes -arg, -orig, etc to differentiate various stages of processing.
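To make the renaming option concrete, here is a runnable version. parse and supply-defaults are hypothetical stand-ins: the first parses a "key=value,key=value" string, and the second adds a default port. Only the naming pattern (raw config-in, processed config) is the point.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in: "k=v,k=v" -> {:k "v", ...}
(defn parse [config-in]
  (into {} (for [pair (str/split config-in #",")
                 :let [[k v] (str/split pair #"=")]]
             [(keyword k) v])))

;; Hypothetical stand-in: values in config win over the defaults.
(defn supply-defaults [config]
  (merge {:port "8080"} config))

(defn a-fun [config-in]
  ;; config-in is the raw argument; config is the processed, "final" value.
  (let [config (-> config-in parse supply-defaults)]
    (:port config)))

(a-fun "host=localhost")         ;; => "8080"
(a-fun "host=localhost,port=9")  ;; => "9"
```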

How to have clojure support related values in binding?

I am using binding as a means to make it easier to pass around state within a call. I currently have something like the following
(binding [*private-key-path* "/Users/dcapwell/.ssh/id_dsa"]
(binding [*session* (session "localhost")]
...
The reason I need to do this is that the session function requires *private-key-path* to be bound. Since binding doesn't allow related values, is there a simpler way to do the above without the nesting?
EDIT:
Currently prototyping using clj-ssh.ssh. The plan is to turn most of my current usage of binding into static config (most are static values already, so they can be read once on boot). I was using binding to make prototyping easier, so I didn't have to keep passing things around while seeing how the API worked.
I was just curious how I can make the bindings depend on each other. When I use let, the second binding has access to the first one, but with binding the second binding doesn't seem to have access to the first. I would assume there is another function or macro that acts like binding but lets the second binding see the first. I can also see this not existing by default, since it's more about state than anything else.
Edit: some experiments in the REPL:
(def ^:dynamic *a* "a not bound")
(def ^:dynamic *b* "b not bound")
(defn show-a! []
  *a*)
(binding [*a* 1 *b* (show-a!)] *b*) ;; => "a not bound"
(binding [*a* 1]
  (binding [*b* (show-a!)]
    *b*)) ;; => 1
I was surprised to learn that the given code is NOT the same as:
(binding [*private-key-path* "/Users/dcapwell/.ssh/id_dsa"
          *session* (session "localhost")]
  ...)
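So Clojure saves you parentheses compared with other Lisps in let forms, whose bindings are sequential (like let* elsewhere) and need no nesting, but not in binding forms: binding evaluates all of its init expressions before any of the new bindings take effect, so dependent bindings still have to be nested.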
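If nesting is the only complaint, a small macro can generate it. binding* is a hypothetical name, not a clojure.core macro. It expands each pair into its own nested binding form, so every init expression sees the bindings established before it, just like the manual nesting above.

```clojure
(defmacro binding*
  "Hypothetical helper (not in clojure.core): like binding, but expands into
  nested binding forms so each init expression sees the earlier bindings."
  [bindings & body]
  (assert (even? (count bindings)) "binding* needs an even number of forms")
  (if (empty? bindings)
    `(do ~@body)
    (let [[sym init & more] bindings]
      `(binding [~sym ~init]
         (binding* [~@more] ~@body)))))

(def ^:dynamic *a* "a not bound")
(def ^:dynamic *b* "b not bound")
(defn show-a! [] *a*)

(binding  [*a* 1 *b* (show-a!)] *b*)  ;; => "a not bound"
(binding* [*a* 1 *b* (show-a!)] *b*)  ;; => 1
```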
Please think about your overall code (and/or API design?) again and again. Dynamic bindings are bad. You will suffer soon. Tip: make more and more code unit-testable.
I'm not sure what you are trying to do, so I recommend this:
A feature called "Graph" for structural computation. Never worry about dependencies between functions again!
https://github.com/Prismatic/plumbing

How to deal with a variable in a library that needs to be set outside of it?

I'm using Datomic in several projects and it's time to move all the common code to a small utilities library.
One challenge is to deal with a shared database uri, on which most operations depend, but must be set by the project using the library. I wonder if there is a well-established way to do this. Here are some alternatives I've thought about:
Dropping the uri symbol in the library and adding the uri as an argument to every function that accesses the database
Altering it via alter-var-root, or similar mechanism, in an init function
Keeping it in the library as a dynamic var *uri* and overriding the value in a hopefully small adapter layer like
(def my-uri ...bla...)
(defn my-fun [& args]
  (with-datomic-uri my-uri
    (apply library/my-fun args)))
Keeping uri as an atom in the library
There was a presentation from Stuart Sierra at the last Clojure/West, called Clojure in the Large, dealing with design patterns for larger Clojure applications.
One of those was the problem you describe.
To summarize tips regarding the problem at hand:
1. Clear constructor
So you have a well-defined initial state.
(defn make-connection [uri]
  {:uri uri
   ...})
2. Make dependencies clear
(defn update-db [connection]
  ...)
3. It's easier to test
(deftest t-update
  (let [conn (make-connection "datomic:mem://test")] ; hypothetical test URI
    (is (= ... (update-db conn)))))
4. Safer to reload
(require ... :reload)
Keeping the uri in a var to be bound later is pretty common, but it introduces hidden dependencies and also assumes the body starts and ends on a single thread.
Watch the talk, many more tips on design.
My feeling is to keep most datomic code as free of implicit state as possible.
Have query functions take a database value. Have write functions (transact) take a database connection. That maximizes potential reuse and avoids implicit assumptions, like only ever talking to one database connection, or inadvertently hardcoding query functions to work only on the current database value as opposed to past (as-of) or "future" (with) database values.
Coordinating a single common connection for the standard use case of the library then becomes the job of a small additional namespace. Using an atom makes sense there to hold the uri or connection. A few convenience macros, perhaps called with-connection and with-current-db, could then wrap the main library functions if manually coding for and passing connection and database values is a nuisance.
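Here is a rough sketch of that split. It uses a plain map in place of a real Datomic connection so it runs without Datomic. describe, default-conn, set-default-connection! and this version of with-connection are all hypothetical names. Library functions take the connection explicitly, and only the small convenience layer holds a default in an atom.

```clojure
;; Library side: functions take their dependency explicitly.
;; (A plain map stands in for a real Datomic connection here.)
(defn describe [conn]
  (str "connected to " (:uri conn)))

;; Convenience layer: one shared default, held in an atom.
(defonce default-conn (atom nil))

(defn set-default-connection! [conn]
  (reset! default-conn conn))

(defmacro with-connection
  "Hypothetical convenience macro: binds sym to the default connection."
  [[sym] & body]
  `(let [~sym (or @default-conn
                  (throw (ex-info "no default connection set" {})))]
     ~@body))

(set-default-connection! {:uri "datomic:mem://example"})
(with-connection [conn]
  (describe conn))  ;; => "connected to datomic:mem://example"
```

Code that needs a different database can still call describe with its own connection, so the atom never becomes a hidden requirement of the library itself.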

What's the point of defining something as dynamic when you don't need to define something as dynamic to with-redefs it?

It seems to me that with-redefs can do everything that binding to a dynamic symbol can do, only it doesn't have the limitation of needing the ^:dynamic metadata. So when should I use one over the other?
Aside from requiring the ^:dynamic metadata, binding also creates bindings that are only visible in the current thread, whereas the bindings made by with-redefs are visible in all threads. So, with-redefs is a very blunt tool and has the potential to affect other code running in the same VM. I've never seen with-redefs used outside of test code, nor should it be (at least in my opinion).
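The thread-visibility difference can be checked directly. In this sketch, *greeting*, greeting and read-in-new-thread are illustrative names. read-in-new-thread deliberately uses a raw java.lang.Thread, because future would convey the caller's dynamic bindings and hide the difference.

```clojure
(def ^:dynamic *greeting* "hello")
(defn greeting [] "hello")

(defn read-in-new-thread
  "Runs f on a fresh java.lang.Thread (which, unlike future, does not
  convey dynamic bindings) and returns its result."
  [f]
  (let [p (promise)
        t (Thread. (fn [] (deliver p (f))))]
    (.start t)
    (.join t)
    @p))

;; binding: the other thread still sees the root value.
(binding [*greeting* "hi"]
  [*greeting* (read-in-new-thread (fn [] *greeting*))])
;; => ["hi" "hello"]

;; with-redefs: the root itself is changed, so every thread sees it.
(with-redefs [greeting (fn [] "hi")]
  [(greeting) (read-in-new-thread (fn [] (greeting)))])
;; => ["hi" "hi"]
```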
I would summarize the difference between the two as follows:
binding with ^:dynamic allows you to introduce a little bit of dynamic behavior in a controlled fashion. It's a good way of defining extension points in an API that let callers far up the call chain change the behavior of your code without having to explicitly pass parameters all the way through the call stack (some of which might not even be their code).
with-redefs is a free-for-all. It's useful in testing, e.g. for mocking out entire sub-systems when the function under test has lots of dependencies.
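Here is a sketch of the extension-point idea, assuming a hypothetical *log-fn* var and process function. The library calls whatever function *log-fn* holds, and a caller far up the stack can capture the output with binding without touching the library code.

```clojure
;; Hypothetical extension point, advertised by ^:dynamic and the earmuffs.
(def ^:dynamic *log-fn*
  "Callers may rebind this to redirect or capture log output."
  println)

(defn process [x]
  (*log-fn* "processing" x)
  (* 2 x))

;; A caller captures the log calls instead of printing them.
(let [logged (atom [])]
  (binding [*log-fn* (fn [& args] (swap! logged conj (vec args)))]
    [(process 21) @logged]))
;; => [42 [["processing" 21]]]
```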
Declaring a var as ^:dynamic, together with the convention of using earmuffs to name dynamic vars (e.g. *my-dynamic-var*), has the added bonus that it's a self-documenting way of advertising to callers that that part of your code can be modified dynamically.
In summary: prefer ^:dynamic and binding when writing APIs and production code. Use with-redefs in testing, and as a last resort to dynamically alter the behavior of vars beyond your control that weren't declared ^:dynamic (and then, use with caution).