Clojure: Higher-order functions vs protocols vs multimethods

There are plenty of protocols vs multimethods comparisons, but why not use higher-order functions?
Let's take an example:
We have some data (a record, for example), and we have the methods serialize and deserialize.
Say that we want to save it to a file, to JSON, and to a database.
Should we create a protocol called SerializationMethod and records called database, json, and file that implement it? It looks like a hack to create records only to use a protocol. The second solution, a multimethod, could take a string parameter naming the serialization output and decide how to do it. But I am not sure that is the right way to go...
The third way is to write a function serialize and pass it the data and a serializing function. But now I cannot give the serializing and deserializing functions the same name (json, for example):
(defn serialize [method data]
  (method data))
(defn json [data]
  (...))
The question is how I can (or should) do this. Is there a more generic way using higher-order functions? Or maybe I am misunderstanding something?
These are my first steps with Clojure, so please be tolerant.

Converting to JSON is different from writing to a database or a file: the latter are IO operations, while the former is a pure transformation of data. With that in mind, I wouldn't recommend implementing them under the same interface.
Now, assuming you had various serialization implementations, let's say JSON and Fressian, it would certainly not be a good idea to implement them on every data structure that you want to (de-)serialize. Your observation that this would be a hack is correct. More precisely, it would limit a record to being (de-)serializable with only one implementation.
Instead, it would be more effective to have different serializers, each implementing the same interface:
(defrecord JSONSerializer []
  SerializationMethod
  (serialize [this data] ...)
  (deserialize [this data] ...))
(defrecord FressianSerializer []
  SerializationMethod
  ...)
This way we end up having several serializer objects that can be passed to functions that require one. Those functions don't need to be concerned with the implementation.
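A minimal runnable sketch of this idea. The protocol definition is not shown above, so its shape is assumed here, and EdnSerializer is a hypothetical stand-in built on clojure.edn (which ships with Clojure) in place of the JSON and Fressian serializers:

```clojure
(require '[clojure.edn :as edn])

;; Assumed shape of the protocol that the records above implement.
(defprotocol SerializationMethod
  (serialize [this data] "Turns data into its serialized form.")
  (deserialize [this s] "Turns a serialized form back into data."))

;; Hypothetical serializer: EDN text via pr-str / edn/read-string.
(defrecord EdnSerializer []
  SerializationMethod
  (serialize [_ data] (pr-str data))
  (deserialize [_ s] (edn/read-string s)))

;; A function that needs "some serializer" and does not care which one.
(defn round-trip [serializer data]
  (deserialize serializer (serialize serializer data)))

(round-trip (->EdnSerializer) {:id 1 :tags ["a" "b"]})
;; returns a map equal to the input
```

Any other record implementing SerializationMethod could be passed to round-trip without changing it.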
Could higher order functions be passed instead?
(defn do-something
  [params serialize deserialize]
  ...)
It would work, too. Notice, however, that this style can quickly get out of hand. E.g., consider a function that deserializes data from one format and serializes it to another: it needs one function from each format, and callers have to keep the matching pairs straight themselves.
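A rough sketch of that transcoding scenario, comparing loose functions with one value per format. The two formats (EDN text and a naive comma-separated line) and all names here are assumptions chosen only so the example runs without extra libraries:

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])

;; Each format bundles its two functions in one plain map.
(def edn-format
  {:serialize   pr-str
   :deserialize edn/read-string})

(def csv-line-format
  {:serialize   (fn [fields] (str/join "," fields))
   :deserialize (fn [line] (str/split line #","))})

;; Function-passing style: the caller must pick the right function
;; from each format and pass them in the right order.
(defn transcode-fns [deserialize-in serialize-out s]
  (serialize-out (deserialize-in s)))

;; Bundled style: one argument per format.
(defn transcode [from to s]
  ((:serialize to) ((:deserialize from) s)))

(transcode edn-format csv-line-format "[\"a\" \"b\"]")
;; => "a,b"
```

With bundled formats (maps here, or records implementing a protocol), the argument list stays the same size no matter how many operations a format gains.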

Related

Is gathering namespace functions into a map via a macro idiomatic Clojure?

I'm learning Clojure via a pet project. The project would consist of several workers that would be called from other functions.
Each worker is defined in their own namespace as a set of functions (currently two: get-data for gathering data and write-data for writing the gathered data into a file).
In order to make the code a bit DRYer, I decided to write a macro that would gather functions from namespace into a map that can be passed around:
(ns clojure-bgproc.workers)
(defmacro gen-worker-info []
  (let [get-data (ns-resolve *ns* 'get-data)
        write-data (ns-resolve *ns* 'write-data)]
    `(def ~(quote worker-info)
       {:get-data ~get-data
        :write-data ~write-data})))
In my worker code, I use my macro (code abridged for clarity):
(ns clojure-bgproc.workers.summary
  (:require [clojure-bgproc.workers :refer [gen-worker-info]]))
(defn get-data [params]
  ;; <...>
  )
(defn write-data [data file]
  ;; <...>
  )
(gen-worker-info)
While it does work (I get my get-data and write-data functions in clojure-bgproc.workers.summary/worker-info), I find it a bit icky, especially since it doesn't work if I move the macro call to the top of the file.
My question is, is there a more idiomatic way to do so? Is this idiomatic Clojure at all?
Thank you.
I think you're in a weird spot because you've structured your program wrong:
Each worker is defined in their own namespace as a set of functions
This is the real problem. Namespaces are a good place to put functions and values that you will refer to in hand-written code. For stuff you want to access programmatically, they are not a good storage space. Instead, make the data you want to access first-class by putting it into an ordinary proper data structure, and then it's easy to manipulate.
For example, this worker-info map you're thinking of deriving from the namespace is great! In fact, that should be the only way workers are represented: as a map with keys for the worker's functions. Then you just define somewhere a list (or vector, or map) of such worker maps, and that's your list of workers. No messing about with namespaces needed.
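A minimal sketch of workers as plain maps, with no macro and no namespace lookup. The worker functions, their bodies, and the run-worker helper are hypothetical and only illustrate the shape:

```clojure
;; Hypothetical worker functions; names and bodies are illustrative only.
(defn summary-get-data [params]
  {:total (reduce + (:numbers params))})

(defn summary-write-data [data file]
  (spit file (pr-str data)))

;; A worker is just a map of its functions.
(def summary-worker
  {:get-data   summary-get-data
   :write-data summary-write-data})

;; The registry of workers is ordinary data as well.
(def workers
  {:summary summary-worker})

(defn run-worker
  "Looks up a worker by id, gathers its data and writes it to file."
  [workers id params file]
  (let [{:keys [get-data write-data]} (get workers id)]
    (write-data (get-data params) file)))

;; Usage (writes summary.edn in the working directory):
;; (run-worker workers :summary {:numbers [1 2 3]} "summary.edn")
```

Because the registry is a value, adding or listing workers is a matter of assoc and keys rather than namespace introspection.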
My go-to solution for defining the workers would be Protocols. I would also apply some of the well-tried frameworks for system lifecycle management.
Protocols provide a way of defining a set of methods and their signatures. You may think of them as similar to, but more flexible than, interfaces in object-oriented programming.
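As a rough sketch, the two worker functions from the question could become a protocol like this. The Worker and SummaryWorker names and the method bodies are assumptions; writing to a java.io.Writer instead of a file name is a choice made only to keep the example self-contained:

```clojure
(defprotocol Worker
  (get-data [this params] "Gathers the data for this worker.")
  (write-data [this data out] "Writes the gathered data to the writer out."))

;; Hypothetical worker implementation.
(defrecord SummaryWorker []
  Worker
  (get-data [_ params]
    {:total (reduce + (:numbers params))})
  (write-data [_ data out]
    (.write ^java.io.Writer out ^String (pr-str data))))

;; Code that drives a worker only depends on the protocol.
(defn run-worker [worker params out]
  (write-data worker (get-data worker params) out))

(let [out (java.io.StringWriter.)]
  (run-worker (->SummaryWorker) {:numbers [1 2 3]} out)
  (str out))
;; => "{:total 6}"
```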
Your workers will probably have some state and life-cycle, e.g., the workers may be running or stopped, acquiring and releasing a resource, and so on. I suggest you take a look at Integrant for managing a system with stateful components (i.e., workers).
I would argue for avoiding macros in this case. The adage data over functions over macros seems to apply here. Macros are not available at runtime, make debugging harder, and force all other programmers who look at your code to learn a new Domain-Specific Language, i.e., the one you defined with your macros.

How to deal with a variable in a library that needs to be set outside of it?

I'm using Datomic in several projects and it's time to move all the common code to a small utilities library.
One challenge is to deal with a shared database uri, on which most operations depend, but must be set by the project using the library. I wonder if there is a well-established way to do this. Here are some alternatives I've thought about:
Dropping the uri symbol in the library and adding the uri as an argument to every function that accesses the database
Altering it via alter-var-root, or similar mechanism, in an init function
Keeping it in the library as a dynamic var *uri* and overriding the value in a hopefully small adapter layer like
(def my-uri ...bla...)
(defn my-fun [args]
  (with-datomic-uri my-uri
    (apply library/my-fun args)))
Keeping uri as an atom in the library
There was a presentation from Stuart Sierra last Clojure/West, called Clojure in the Large, dealing with design patterns for larger Clojure applications.
One of those was the problem you describe.
To summarize tips regarding the problem at hand:
1 Clear constructor
So you have a well-defined initial state.
(defn make-connection [uri]
  {:uri uri
   ...})
2 Make dependencies clear
(defn update-db [connection]
  ...)
3 It's easier to test
(deftest t-update
  (let [conn (make-connection test-uri)] ; test-uri: placeholder for a test database URI
    (is (= ... (update-db conn)))))
4 Safer to reload
(require ... :reload)
Keeping the uri in a var to be bound later is pretty common, but it introduces hidden dependencies and also assumes that the body starts and ends on a single thread.
Watch the talk, many more tips on design.
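One way such a hidden dependency can bite, sketched with a hypothetical *uri* var and describe function: work that runs after the binding form has exited (here, a lazy sequence realized later) no longer sees the bound value. This illustrates dynamic scope in general, not anything specific to Datomic:

```clojure
(def ^:dynamic *uri* nil)

(defn describe [id]
  ;; Hidden dependency: the result depends on *uri*, not only on id.
  {:uri *uri* :id id})

;; map is lazy, so nothing is computed while the binding is active.
(def lazy-result
  (binding [*uri* "example-uri"]
    (map describe [1 2])))

;; doall forces realization inside the binding.
(def eager-result
  (binding [*uri* "example-uri"]
    (doall (map describe [1 2]))))

(:uri (first lazy-result))  ; => nil, the binding was already gone
(:uri (first eager-result)) ; => "example-uri"
```

Passing the connection as an explicit argument, as in tips 1 and 2, makes this class of bug impossible.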
My feeling is to keep most datomic code as free of implicit state as possible.
Have query functions take a database value. Have write functions (transact) take a database connection. That maximizes potential reuse and avoids implicit assumptions like only ever talking to one database connection or inadvertently implicitly hardcoding query functions to only work on the current database value - as opposed to past (as-of) or "future" (with) database values.
Coordinating a single common connection for the standard use case of the library then becomes the job of a small additional namespace. Using an atom makes sense here to hold the uri or connection. A few convenience macros, perhaps called with-connection and with-current-db, could then wrap the main library functions if manually passing connection and database values is a nuisance.
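A sketch of that layering. To keep it runnable without Datomic, the "database" below is a stand-in (an atom whose dereferenced value plays the role of an immutable database value); none of these functions are Datomic API, and every name is hypothetical:

```clojure
;; --- core library: no implicit state ---
;; Stand-in for a real database: the connection wraps an atom, and a
;; "database value" is an immutable snapshot of that atom.
(defn make-connection [uri]
  {:uri uri :state (atom {})})

(defn db [conn]
  @(:state conn))

(defn find-user [db-value id]   ; query: takes a database value
  (get db-value id))

(defn add-user! [conn id user]  ; write: takes a connection
  (swap! (:state conn) assoc id user)
  conn)

;; --- small convenience layer for the common single-connection case ---
(defonce default-conn (atom nil))

(defn init! [uri]
  (reset! default-conn (make-connection uri)))

(defn current-db []
  (db @default-conn))

;; Usage: the project using the library sets the uri once.
(init! "example-uri")
(def before (current-db))
(add-user! @default-conn 1 {:name "Ann"})
(find-user before 1)        ; => nil, the old value is unchanged
(find-user (current-db) 1)  ; => {:name "Ann"}
```

Because find-user takes a value rather than reading the atom itself, it works equally well on an older snapshot, which mirrors the as-of point made above.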

Clojure/LISP REST client design

Coming from an OOP background, I have a question about the recommended way to design an API in Clojure. For example, in an OOP language (Python here), I would use some API like this:
api = someWebService()
api.setWriteApiKey(WRITE_API_KEY)
api.sampleloadSong('file.mp3')
In the above example, I set the API key once and call the associated methods again and again without ever passing the API key again. What is the recommended way of doing this in Clojure or any other language in the Lisp family?
Do I need to pass the key in each and every function call like this:
(sampleloadSong "WRITE_API_KEY" "file.mp3")
Or is there a better approach?
To avoid the repetition you describe, you can write a function that returns an api function that remembers the key (closes over it):
(defn make-client [key] (partial api key))
Then later in your program:
(let [api-fn (make-client WRITE_API_KEY)]
  (api-fn :sample-song "song.mp3")
  ...
  (api-fn :delete-song "other-song.mp3"))
Though many people consider it preferable to pass a config map as the first argument to each api call.
(api {:key WRITE_API_KEY} ....)
There is another common approach where people define the key as a dynamically bindable var and require the callers to bind it appropriately (the var must be declared ^:dynamic, otherwise binding throws):
(def ^:dynamic *api-key* :unset)
(defn api .... use *api-key* )
from caller's namespace:
(binding [*api-key* WRITE_API_KEY]
  (api :add-song "foo.mp3"))
This approach may be less popular than it used to be, and my personal preference is to pass a config map, though that is just my opinion.
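A small sketch combining the closure and the config-map styles. The api function here is hypothetical and only builds a description of the request instead of making an HTTP call, so the example runs on its own:

```clojure
;; Hypothetical api function: returns a description of the request
;; rather than performing any network call.
(defn api [config action & args]
  {:key (:key config) :action action :args (vec args)})

;; The client closes over the config map, so callers pass it only once.
(defn make-client [config]
  (partial api config))

(def client (make-client {:key "WRITE_API_KEY"}))

(client :sample-song "song.mp3")
;; => {:key "WRITE_API_KEY", :action :sample-song, :args ["song.mp3"]}

;; The explicit form remains available for one-off calls:
(api {:key "WRITE_API_KEY"} :delete-song "other-song.mp3")
```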

Extending a library-provided protocol without impacting other users

I'm using a 3rd-party library (clj-msgpack), and wish to extend a protocol for a type which the library also provides a handler for.
On its own, this is simple enough -- but is there any way to do this which wouldn't impact other users of this library running inside the same JVM? Something similar to a dynamic var binding (only taking effect under a given point on the stack) would be ideal.
At present, I'm doing an unconditional override but using a dynamic var to enable my modified behavior; however, this feels far too much like monkey-patching for my comfort.
For the curious, the (admitted abomination) I'm putting into place follows:
(in-ns 'clj-msgpack.core)
(def ^:dynamic *keywordize-strings*
  "Assume that any string starting with a colon should be unpacked to a keyword"
  false)
(extend-protocol Unwrapable
  RawValue
  (unwrap [o]
    (let [v (.getString o)]
      (if (and *keywordize-strings* (.startsWith v ":"))
        (keyword (.substring v 1))
        v))))
After some thought I see two basic approaches (one of which I get from you):
Dynamic binding (as you are doing it now):
Some complain that dynamic binding follows the principle of most surprise: "What? It behaves this way only when called from there?" While I don't personally hold this to be a bad-thing(tm), some people do. In this case it exactly matches your desire, and so long as you have one point where you decide whether you want keywordized strings, this should work. If you add a second point that changes them back and a code path that crosses the two, well... you're on your own. But hey, working code has its merits.
Inheritance:
Good ol' Java style, or using Clojure's ad-hoc hierarchies: you could extend the type of object you are passing around to be keywordized-string-widgewhatzit that extends widgewhatzit, and add a new handler for your specific subclass. This only works in some cases and forces a different object style on the rest of the design. Some smart people will also argue that it still follows the principle of most surprise, because the type of the objects will be different when called via another code path.
Personally I would go with your existing solution, unless you can change your whole program to use keywords instead of strings (which would of course be my first, potentially controversial, choice).
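A self-contained imitation of the dynamic-binding approach, so its scoping can be tried at a REPL. It uses a locally defined, hypothetical Unwrapable protocol extended to java.lang.String rather than clj-msgpack's own protocol and RawValue type:

```clojure
(require '[clojure.string :as str])

(def ^:dynamic *keywordize-strings*
  "When true, strings starting with a colon are unwrapped to keywords."
  false)

;; Local stand-in for the library's protocol.
(defprotocol Unwrapable
  (unwrap [o]))

(extend-protocol Unwrapable
  String
  (unwrap [s]
    (if (and *keywordize-strings* (str/starts-with? s ":"))
      (keyword (subs s 1))
      s)))

(unwrap ":name")                                        ; => ":name"
(binding [*keywordize-strings* true] (unwrap ":name"))  ; => :name
(binding [*keywordize-strings* true] (unwrap "name"))   ; => "name"
```

Callers that never bind the var keep the default behavior; only code running inside the binding form sees keywords.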

Clojure: Perlis vs Protocols/Records [soft, philosophical]

Context:
(A) "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." —Alan Perlis
(B) Clojure has defprotocol, defrecord, deftype
Question:
Is there some style of programming Clojure that gets the benefits of both?
(B) has the advantage of avoiding type errors.
(A) has the advantage of avoiding duplicate code.
Thanks
PS: I would love to hear constructive criticism on why I'm being downvoted + how to restructure the question to make it productive.
I am not sure how you can correlate (A) and (B).
(A) is about consistency, i.e. if you use the same data structure to represent your data (for example, user info stored in a map) across the various layers of your application, things stay consistent. If you use many data structures to represent the same info, you will have to write code to transform the structure from one form to another, and the various functions that work on different structures will not be composable, as they expect different data structures.
(B) This is about the various constructs in Clojure.
defprotocol: This is not about data structures; rather, it is about a contract/interface, i.e. a particular type implements a contract, and the type can be used in any context where the consumer function requires the passed type to implement that contract. Ex: any type that can be printed to the console (or other writable string) will implement the print contract/protocol.
defrecord: Creates maps, but with some additional interfaces implemented in a default way.
deftype: A low-level construct for creating types, so you will have to write a lot of code yourself. 99% of the time you won't need to use this.
The way to reconcile this is to think "abstractions" rather than "data types". Or to paraphrase Alan Perlis:
"It is better to have 100 functions operate on one abstraction than 10 functions on 10 abstractions."
So the Clojure way is to:
Define your abstractions in a simple, minimal way (using defprotocol)
Write functions against this abstraction
Define concrete types that implement the abstraction using defrecord, deftype etc. (or use extend-protocol to extend the protocol to existing Java classes if you like)
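The three steps might look like this in a small sketch. The Shape protocol and the Rect and Square records are invented for illustration:

```clojure
;; 1. A minimal abstraction.
(defprotocol Shape
  (area [this]))

;; 2. Functions written against the abstraction only.
(defn total-area [shapes]
  (reduce + (map area shapes)))

(defn largest [shapes]
  (apply max-key area shapes))

;; 3. Concrete types implementing it.
(defrecord Rect [w h]
  Shape
  (area [_] (* w h)))

(defrecord Square [side]
  Shape
  (area [_] (* side side)))

(def shapes [(->Rect 2 3) (->Square 4)])

(total-area shapes)       ; => 22
(:side (largest shapes))  ; => 4

;; Records are still maps, so generic map functions keep working.
(:w (assoc (->Rect 2 3) :w 10))  ; => 10
```

The last line hints at how the two ideas combine: records take part in the protocol and still work with the many functions written for maps.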