In Clojure, functions that modify their input conventionally end with an ! to warn the user. I even extended this: two !! means that the function has a side effect that modifies something that was not put in (i.e. global states that must persist across user-GUI interactions).
What is the convention for a function that is impure in that it uses an external state, such as file loading, etc?
There is no such convention. Impurity should be apparent from the function's name, docstring, or purpose.
Examples:
get-settings-from-file
(get-settings :source :file)
load-configuration
import-data
Metadata is a useful way to annotate functions that are impure. By convention, I've been labeling functions with an ^:impure marker, like so:
(defn ^:impure blech []
  (mq/publish ...)
  (spit ...)
  (time/now ...)
  (http/request ...)
  (db/write-something ...))
I tend to only do this to bottom-layer functions. So if bar calls blech, it can be inferred that bar is also impure, and I don't have to pollute every impure function this way.
With this metadata in place, other tools (editors, reports, etc) can assist by indicating that a given function appears to have no impurities all the way down, and you can then see that it's a strong candidate for a unit test (vs those that will need mocking). You can also perform analytics to indicate how much of your code base is trending toward pure.
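To make the "impure all the way down" idea concrete, here is a minimal sketch of what such a report might compute. It is written in Python purely for illustration; the call graph, the function names other than blech and bar, and the helper impure_functions are all hypothetical. A real tool would have to extract the call graph and the ^:impure tags from the code base itself.

```python
# Hypothetical call graph: each function maps to the functions it calls.
# A real tool would derive this from the code base; it is hard-coded here.
CALLS = {
    "blech": [],
    "bar": ["blech", "format-report"],
    "format-report": [],
    "baz": ["format-report"],
}

# Only bottom-layer functions carry the explicit impure marker.
MARKED_IMPURE = {"blech"}


def impure_functions(calls, marked):
    """Return every function that is marked impure or transitively calls one."""
    impure = set(marked)
    changed = True
    while changed:  # repeat until no new impure function is discovered
        changed = False
        for fn, callees in calls.items():
            if fn not in impure and any(c in impure for c in callees):
                impure.add(fn)
                changed = True
    return impure


inferred = impure_functions(CALLS, MARKED_IMPURE)
pure_candidates = sorted(set(CALLS) - inferred)

print(sorted(inferred))   # ['bar', 'blech']
print(pure_candidates)    # ['baz', 'format-report']
```

Functions that end up in pure_candidates are the ones that look unit-testable without mocking, under the assumption that every bottom-layer impure function really has been tagged.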
I'm trying to advise a number of methods in one library with utility functions from another library, where some of the methods to be advised are defined with (defn) and some are defined with (defprotocol).
Right now I'm using this library, which uses (alter-var-root). I don't care which library I use (or whether I hand-roll my own).
The problem I'm running into right now is that protocol methods sometimes can be advised, and sometimes cannot, depending on factors that are not perfectly clear to me.
If I define a protocol, then define a type and implement that protocol in-line, then advising never seems to work. I am assuming this is because the type extends the JVM interface directly and skips the vars.
If, in a single namespace, I define a protocol, then advise its methods, and then extend the protocol to a type, the advising will not work.
If, in a single namespace, I define a protocol, then extend the protocol to a type, then advise the protocol's methods, the advising will work.
What I would like to do is find a method of advising that works reliably and does not rely on undefined implementation details. Is this possible?
Clojure itself doesn't provide any way to advise functions reliably, even those defined via def/defn. Consider the following example:
(require '[richelieu.core :as advice])
(advice/defadvice add-one [f x] (inc (f x)))
(defn func-1 [x] x)
(def func-2 func-1)
(advice/advise-var #'func-1 add-one)
> (func-1 0)
1
> (func-2 0)
0
After evaluation of the form (def func-2 func-1), the var func-2 holds the value that func-1 had at that moment (the function itself, not the var), so advise-var on #'func-1 won't affect it.
Even though definitions like func-2 are rare, you may have noticed or used the following:
(defn generic-function [generic-parameter x y z]
  ...)
(def specific-function-1 (partial generic-function <specific-arg-1>))
(def specific-function-2 (partial generic-function <specific-arg-2>))
...
If you advise generic-function, none of the specific functions will work as expected, due to the peculiarity described above.
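The value-capture effect is not specific to Clojure. As a rough analogy only (this is Python, not Clojure, and advise is a hypothetical stand-in for the advice library), rebinding a name does not reach anything that already copied the old function object, while code that looks the name up again on every call does see the change:

```python
from functools import partial


def advise(f):
    """Hypothetical advice: wrap f so that its result is incremented."""
    def advised(x):
        return f(x) + 1
    return advised


def func_1(x):
    return x


func_2 = func_1                 # copies the current function object
func_3 = partial(func_1)        # also captures the current function object
func_4 = lambda x: func_1(x)    # looks the name up again on every call

func_1 = advise(func_1)         # rebind the name, loosely like alter-var-root

print(func_1(0))  # 1
print(func_2(0))  # 0
print(func_3(0))  # 0
print(func_4(0))  # 1
```

Only the definitions that go back through the name on each call pick up the advice; anything that grabbed the function value earlier keeps the unadvised version.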
If advising is critical for you, one approach that may work is this: since Clojure functions are compiled to Java classes, you could try to replace the Java method invoke with another method that has the desired behaviour. Things become more complicated for protocol/interface methods, though: it seems you would have to replace the method in every class that implements the particular protocol/interface.
Otherwise, you'll need an explicit wrapper for every function that you want to advise. Macros may help to reduce the boilerplate in this case.
In Erlang, when dealing with processes, you have to export the function used in the spawn function.
-module(echo).
-export([start/0, loop/0]).
start() ->
    spawn(echo, loop, []).
The reason given in the book "Programming Erlang, 2nd Edition" (page 188) is:
"Note that we also have to export the argument of spawn from the module. This is a good practice because we will be able to change the internal details of the server without changing the client code.".
And in the book "Erlang Programming", page 121:
-module(frequency).
-export([start/0, stop/0, allocate/0, deallocate/1]).
-export([init/0]).
%% These are the start functions used to create and
%% initialize the server.
start() ->
    register(frequency, spawn(frequency, init, [])).
init() ->
    Frequencies = {get_frequencies(), []},
    loop(Frequencies).
Remember that when spawning a process, you have to export the init/0 function as it is used by the spawn/3 BIF. We have put this function in a separate export clause to distinguish it from the client functions, which are supposed to be called from other modules.
Would you please explain to me the logic behind that reason?
The short answer is: spawn is not a language construct, it's a library function.
That means spawn lives in another module, which has no access to any functions in your module except the exported ones.
You have to give spawn some way to start your code. That can be a function value (i.e. spawn(fun() -> (any code you want, including calls to local functions) end)) or a module/exported function name/arguments triple, which is visible from other modules.
The logic is quite straightforward. Yet confusion can easily arise because:
export does not exactly match object-oriented encapsulation, and public methods in particular;
several common patterns require exporting functions that are not meant to be called by regular clients.
What export really does
Export has a very strict meaning: exported functions are the only functions that can be referred to by their fully qualified name, i.e. by module, function name and arity.
For example:
-module(m).
-export([f/0]).
f() -> foo.
f(_Arg) -> bar.
g() -> foobar.
You can call the first function with an expression such as m:f() but this wouldn't work for the other two functions. m:f(ok) and m:g() will fail with an error.
For this reason, the compiler will warn in the example above that f/1 and g/0 are not called and cannot be called (they are unused).
Functions can always be called from outside a module: functions are values and you can refer to a local function (within a module), and pass this value outside. For example, you can spawn a new process by using a non-exported function, using spawn/1. You could rewrite your example as follows:
start() ->
    spawn(fun loop/0).
This doesn't require exporting loop. Joe Armstrong, in other editions of Programming Erlang, explicitly suggests transforming the code as above to avoid exporting loop/0.
Common patterns requiring an export
Because exports are the only way to refer to a function by name from outside a module, there are two common patterns that require exported functions even if those functions are not part of a public API.
The example you mention arises whenever you want to call a library function that takes an MFA, i.e. a module, a function name and a list of arguments. These library functions will refer to the function by its fully qualified name. In addition to spawn/3, you might encounter timer:apply_after/4.
Likewise, you can write functions that take MFA arguments, and call the function using apply/3.
Sometimes, there are variants of these library functions that directly take a 0-arity function value. This is the case with spawn, as mentioned above. apply/1 doesn't make sense as you would simply write F().
The other common case is behavior callbacks, and especially OTP behaviors. In this case, you will need to export the callback functions which are of course referred to by name.
Good practice is to use separate export attributes for these functions to make it clear these functions are not part of the regular interface of the module.
Exports and code change
There is a third common case for using exports beyond a public API: code changes.
Imagine you are writing a loop (e.g. a server loop). You would typically implement this as follows:
-module(m).
-export([start/0]).
start() -> spawn(fun() -> loop(state) end).
loop(State) ->
    NewState = receive ...
        ...
    end,
    loop(NewState). % not updatable!
This code cannot be updated, as the loop will never exit the module. The proper way would be to export loop/1 and perform a fully qualified call:
-module(m).
-export([start/0]).
-export([loop/1]).
start() -> spawn(fun() -> loop(state) end).
loop(State) ->
    NewState = receive ...
        ...
    end,
    ?MODULE:loop(NewState).
Indeed, when you refer to an exported function using its fully qualified name, the lookup is always performed against the latest version of the module. So this trick lets the process jump to the newer version of the code at every iteration of the loop. Code updates are actually quite complex, and OTP, with its behaviors, handles them for you. It typically uses the same construct.
Conversely, when you call a function passed as a value, this is always from the version of the module that created this value. Joe Armstrong argues this is an advantage of spawn/3 over spawn/1 in a dedicated section of his book (8.10, Spawning with MFAs). He writes:
Most programs we write use spawn(Fun) to create a new process. This is fine provided we don’t want to dynamically upgrade our code. Sometimes we want to write code that can be upgraded as we run it. If we want to make sure that our code can be dynamically upgraded, then we have to use a different form of spawn.
This is far-fetched as when you spawn a new process, it starts immediately, and an update is unlikely to occur between the start of the new process and the moment the function value is created. Besides, Armstrong's statement is partly untrue: to make sure the code can dynamically be upgraded, spawn/1 will work as well (cf example above), the trick is not to use spawn/3, but to perform a fully qualified call (Joe Armstrong describes this in another section). spawn/3 has other advantages over spawn/1.
Still, the difference between passing a function by value and by name explains why there is no version of timer:apply_after/4 that takes a function by value: there is a delay, and the function value might be stale by the time the timer fires. Such a variant would actually be dangerous because at most two versions of a module can be loaded at once: the current one and the old one. If you reload a module more than once, processes trying to call even older versions of the code will be killed. For this reason, you would often prefer MFAs and their exports to function values.
When you do a spawn you create a completely new process with its own environment and thread of execution. This means that you are no longer executing "inside" the module where the spawn is called, so you must make an "outside" call into the module. The only functions in a module which can be called from the "outside" are exported functions, hence the spawned function must be exported.
It might seem a little strange, seeing as you are spawning a function in the same module, but this is why.
I think it is important to remember that a module is just code and does not contain any deeper meaning than that, for example like a class in an OO language. So even if you have functions from the same module being executed in different processes, a very common occurrence, then there is no implicit connection between them. You still have to send messages between processes even if it is from/to functions in the same module.
EDIT:
About the last part of your question, with the quote about putting the export of init/0 in a separate export declaration: there is no need to do this and it has no semantic significance. You can use as many or as few export declarations as you wish. So you could put all the functions in one export declaration or have a separate one for each function; it makes no difference.
The reason to split them is purely visual and for documentation purposes. You typically group functions which go together into separate export declarations to make it easier to see that they are a group. You also typically put "internal" exported functions, functions which aren't meant for the user to call directly, in a separate export declaration. In this case init/0 has to be exported for the spawn but is not meant to be called directly outside the spawn.
Having the user call the start/0 function to start the server, rather than explicitly spawn the init/0 function, allows you to change the internals as you wish later on. The user only sees the start/0 function, which is what the first quote is trying to say.
If you're wondering why you have to export anything and not have everything visible by default, it's because it's clearer to the user which functions they should call if you hide all the ones they shouldn't. That way, if you change your mind on the implementation, people using your code won't notice. Otherwise, there may be someone who is using a function that you want to change or eliminate.
For example, say you have a module:
-module(somemod).
useful() ->
    helper().
helper() ->
    i_am_helping.
And you want to change it to:
-module(somemod).
useful() ->
    betterhelper().
betterhelper() ->
    i_am_helping_more.
If people should only be calling useful, you should be able to make this change. However, if everything was exported, people might be depending on helper when they shouldn't be. This change will break their code when it shouldn't.
When I define a private function in Clojure, I usually use a - prefix as a visual indicator that the function cannot be used outside of my namespace, e.g.
(defn- -name []
  (let [formatter (formatter "yyyy-MM-dd-HH-mm-ss-SSSS")]
    (format "fixjure-%s" (unparse formatter (now)))))
But the - prefix seems to also be a convention for public methods when using gen-class.
Is there any generally accepted convention for defn-'d functions in the Clojure community, or should I simply use non-prefixed names?
It seems that lots of code in clojure.contrib (may it rest in peace) uses normal names for private functions, so maybe that is best, but I really like the visual indicator--maybe my C / Perl background is just too strong! ;)
There's not a convention; the visual indicator is prevalent in languages with no built-in notion of private functions. Since Clojure's functions defined with defn- are not visible outside their namespace, there is no need to prefix functions with an uglifier ;)
So do what you like, but you probably want to do as the rest of the community does and just name them normally! It'll make your life easier.
I am unaware of any naming conventions, but you can attach the ^:private metadata tag to define private functions. This is exactly equivalent to defn-, but is a little clearer, IMHO.
(defn ^:private foo [])
I am researching programming language design, and I am interested in the question of how to replace the popular single-dispatch message-passing OO paradigm with the multimethods generic-function paradigm. For the most part, it seems very straightforward, but I have recently become stuck and would appreciate some help.
Message-passing OO, in my mind, is one solution that solves two different problems. I explain what I mean in detail in the following pseudocode.
(1) It solves the dispatch problem:
=== in file animal.code ===
- Animals can "bark"
- Dogs "bark" by printing "woof" to the screen.
- Cats "bark" by printing "meow" to the screen.
=== in file myprogram.code ===
import animal.code
for each animal a in list-of-animals :
a.bark()
In this problem, "bark" is one method with multiple "branches" which operate differently depending upon the argument types. We implement "bark" once for each argument type we are interested in (Dogs and Cats). At runtime we are able to iterate through a list of animals and dynamically select the appropriate branch to take.
(2) It solves the namespace problem:
=== in file animal.code ===
- Animals can "bark"
=== in file tree.code ===
- Trees have "bark"
=== in file myprogram.code ===
import animal.code
import tree.code
a = new-dog()
a.bark() //Make the dog bark
…
t = new-tree()
b = t.bark() //Retrieve the bark from the tree
In this problem, "bark" is actually two conceptually different functions which just happen to have the same name. The type of the argument (whether dog or tree) determines which function we actually mean.
Multimethods elegantly solve problem number 1. But I don't understand how they solve problem number 2. For example, the first of the above two examples can be translated in a straightforward fashion to multimethods:
(1) Dogs and Cats using multimethods
=== in file animal.code ===
- define generic function bark(Animal a)
- define method bark(Dog d) : print("woof")
- define method bark(Cat c) : print("meow")
=== in file myprogram.code ===
import animal.code
for each animal a in list-of-animals :
bark(a)
The key point is that the method bark(Dog) is conceptually related to bark(Cat). The second example does not have this attribute, which is why I don't understand how multimethods solve the namespace issue.
(2) Why multimethods don't work for Animals and Trees
=== in file animal.code ===
- define generic function bark(Animal a)
=== in file tree.code ===
- define generic function bark(Tree t)
=== in file myprogram.code ===
import animal.code
import tree.code
a = new-dog()
bark(a) /// Which bark function are we calling?
t = new-tree()
bark(t) /// Which bark function are we calling?
In this case, where should the generic function be defined? Should it be defined at the top-level, above both animal and tree? It doesn't make sense to think of bark for animal and tree as two methods of the same generic function because the two functions are conceptually different.
I haven't found any past work that solves this problem. I have looked at Clojure multimethods and CLOS multimethods, and they have the same problem. I am crossing my fingers and hoping for either an elegant solution to the problem, or a persuasive argument for why it's actually not a problem in real programming.
Please let me know if the question needs clarification. This is a fairly subtle (but important) point I think.
Thanks for the replies sanity, Rainer, Marcin, and Matthias. I understand your replies and completely agree that dynamic dispatch and namespace resolution are two different things. CLOS does not conflate the two ideas, whereas traditional message-passing OO does. This also allows for a straightforward extension of multimethods to multiple inheritance.
My question specifically is in the situation when the conflation is desirable.
The following is an example of what I mean.
=== file: XYZ.code ===
define class XYZ :
define get-x ()
define get-y ()
define get-z ()
=== file: POINT.code ===
define class POINT :
define get-x ()
define get-y ()
=== file: GENE.code ===
define class GENE :
define get-x ()
define get-xx ()
define get-y ()
define get-xy ()
=== file: my_program.code ===
import XYZ.code
import POINT.code
import GENE.code
obj = new-xyz()
obj.get-x()
pt = new-point()
pt.get-x()
gene = new-gene()
gene.get-x()
Because of the conflation of namespace resolution with dispatch, the programmer can naively call get-x() on all three objects. This is also perfectly unambiguous. Each object "owns" its own set of methods, so there is no confusion as to what the programmer meant.
Contrast this to the multimethod version:
=== file: XYZ.code ===
define generic function get-x (XYZ)
define generic function get-y (XYZ)
define generic function get-z (XYZ)
=== file: POINT.code ===
define generic function get-x (POINT)
define generic function get-y (POINT)
=== file: GENE.code ===
define generic function get-x (GENE)
define generic function get-xx (GENE)
define generic function get-y (GENE)
define generic function get-xy (GENE)
=== file: my_program.code ===
import XYZ.code
import POINT.code
import GENE.code
obj = new-xyz()
XYZ:get-x(obj)
pt = new-point()
POINT:get-x(pt)
gene = new-gene()
GENE:get-x(gene)
Because get-x() of XYZ has no conceptual relation to get-x() of GENE, they are implemented as separate generic functions. Hence, the end programmer (in my_program.code) must explicitly qualify get-x() and tell the system which get-x() he actually means to call.
It is true that this explicit approach is clearer and easily generalizable to multiple dispatch and multiple inheritance. But using (abusing) dispatch to solve namespace issues is an extremely convenient feature of message-passing OO.
I personally feel that 98% of my own code is adequately expressed using single-dispatch and single-inheritance. I use this convenience of using dispatch for namespace resolution much more so than I use multiple-dispatch, so I am reluctant to give it up.
Is there a way to get me the best of both worlds? How do I avoid the need to explicitly qualify my function calls in a multi-method setting?
It seems that the consensus is that
multimethods solve the dispatch problem but do not attack the namespace problem.
functions that are conceptually different should have different names, and users should be expected to manually qualify them.
I then believe that, in cases where single-inheritance single-dispatch is sufficient, message-passing OO is more convenient than generic functions.
This sounds like it is open research then. If a language were to provide a mechanism for multimethods that may also be used for namespace resolution, would that be a desired feature?
I like the concept of generic functions, but currently feel they are optimized for making "very hard things not so hard" at the expense of making "trivial things slightly annoying". Since the majority of code is trivial, I still believe this is a worthwhile problem to solve.
Dynamic dispatch and namespace resolution are two different things. In many object systems classes are also used for namespaces. Also note that often both the class and the namespace are tied to a file. So these object systems conflate at least three things:
class definitions with their slots and methods
the namespace for identifiers
the storage unit of source code
Common Lisp and its object system (CLOS) work differently:
classes don't form a namespace
generic functions and methods don't belong to classes and thus are not defined inside classes
generic functions are defined as top-level functions and thus are not nested or local
identifiers for generic functions are symbols
symbols have their own namespace mechanism called packages
generic functions are 'open'. One can add or delete methods at any time
generic functions are first-class objects
methods are first-class objects
Classes and generic functions are also not conflated with files. You can define multiple classes and multiple generic functions in one file or in as many files as you want. You can also define classes and methods from running code (thus not tied to files) or from something like a REPL (read eval print loop).
Style in CLOS:
if a functionality needs dynamic dispatch and the functionality is closely related, then use one generic function with different methods
if there are many different functionalities, but with a common name, don't put them in the same generic function. Create different generic functions.
generic functions with the same name, but where the name is in different packages are different generic functions.
Example:
(defpackage "ANIMAL" (:use "CL") (:export "BARK"))
(in-package "ANIMAL")
(defclass animal () ())
(defclass dog (animal) ())
(defclass cat (animal) ())
(defmethod bark ((an-animal dog)) (print 'woof))
(defmethod bark ((an-animal cat)) (print 'meow))
(bark (make-instance 'dog))
(bark (make-instance 'cat))
Note that the class ANIMAL and the package ANIMAL have the same name. But that is not necessarily so. The names are not connected in any way.
The DEFMETHOD implicitly creates a corresponding generic function.
If you add another package (for example GAME-ANIMALS), then the BARK generic function will be different, unless these packages are related (for example one package uses the other).
From a different package (symbol namespace in Common Lisp), one can call these:
(animal:bark some-animal)
(game-animals:bark some-game-animal)
A symbol has the syntax
PACKAGE-NAME::SYMBOL-NAME
If the package is the same as the current package, then it can be omitted.
ANIMAL::BARK refers to the symbol named BARK in the package ANIMAL. Note that there are two colons.
ANIMAL:BARK refers to the exported symbol BARK in the package ANIMAL. Note that there is only one colon. Exporting, importing and using are mechanisms defined for packages and their symbols. Thus they are independent of classes and generic functions, but they can be used to structure the namespace for the symbols naming those.
The more interesting case is when multimethods are actually used in generic functions:
(defmethod bite ((some-animal cat) (some-human human))
...)
(defmethod bite ((some-animal dog) (some-food bone))
...)
The above uses the classes CAT, HUMAN, DOG and BONE. Which class should the generic function belong to? What would the special namespace look like?
Since generic functions dispatch over all arguments, it does not make direct sense to conflate the generic function with a special namespace and make it a definition in a single class.
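To illustrate why such a function has no natural "owning" class, here is a minimal sketch of dispatch over all arguments, written in Python only for illustration. The registry _BITE_METHODS and the decorator defmethod_bite are hypothetical names, and the sketch matches exact types only; it implements none of CLOS's inheritance or method combination.

```python
class Cat: pass
class Dog: pass
class Human: pass
class Bone: pass


# Hypothetical method table, keyed by the types of ALL arguments.
_BITE_METHODS = {}


def defmethod_bite(*types):
    """Register a method of the generic function bite for a tuple of types."""
    def register(fn):
        _BITE_METHODS[types] = fn
        return fn
    return register


@defmethod_bite(Cat, Human)
def _cat_bites_human(some_animal, some_human):
    return "cat bites human"


@defmethod_bite(Dog, Bone)
def _dog_bites_bone(some_animal, some_food):
    return "dog bites bone"


def bite(a, b):
    """The generic function: select a method from the types of both arguments."""
    method = _BITE_METHODS.get((type(a), type(b)))
    if method is None:
        raise TypeError(
            f"no applicable method for bite({type(a).__name__}, {type(b).__name__})"
        )
    return method(a, b)


print(bite(Cat(), Human()))  # cat bites human
print(bite(Dog(), Bone()))   # dog bites bone
```

The method table lives next to bite itself rather than inside Cat, Dog, Human or Bone, which is the point: no single class is a sensible home for it.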
Motivation:
Generic functions were added in the 80s to Lisp by developers at Xerox PARC (for Common LOOPS) and at Symbolics for New Flavors. One wanted to get rid of an additional calling mechanism (message passing) and bring dispatch to ordinary (top-level) functions. New Flavors had single dispatch, but generic functions with multiple arguments. The research into Common LOOPS then brought multiple dispatch. New Flavors and Common LOOPS were then replaced by the standardized CLOS. These ideas then were brought to other languages like Dylan.
Since the example code in the question does not use anything generic functions have to offer, it looks like one has to give up something.
When single dispatch, message passing and single inheritance are sufficient, then generic functions may look like a step back. The reason for this is, as mentioned, that one does not want to put all kinds of similarly named functionality into one generic function.
While
(defmethod bark ((some-animal dog)) ...)
(defmethod bark ((some-tree oak)) ...)
look similar, they are two conceptually different actions.
But more:
(defmethod bark ((some-animal dog) tone loudness duration)
...)
(defmethod bark ((some-tree oak)) ...)
Now suddenly the parameter lists for the same-named generic function look different. Should that be allowed to be one generic function? If not, how do we call BARK on various objects in a list of things with the right parameters?
In real Lisp code generic functions usually look much more complicated with several required and optional arguments.
In Common Lisp, generic functions also do not have just a single method type. There are different types of methods and various ways to combine them. It only makes sense to combine them when they really belong to a certain generic function.
Since generic functions are also first class objects, they can be passed around, returned from functions and stored in data structures. At this point the generic function object itself is important, not its name anymore.
For the simple case where I have an object which has x and y coordinates and can act as a point, I would have the object's class inherit from a POINT class (maybe as some mixin). Then I would import the GET-X and GET-Y symbols into some namespace, where necessary.
There are other languages which are more different from Lisp/CLOS and which attempt(ed) to support multimethods:
MultiJava
Runabout
C#
Fortress
There seem to have been many attempts to add it to Java.
Your example for "Why multimethods won't work" presumes that you can define two identically-named generic functions in the same language namespace. This is generally not the case; for example, Clojure multimethods belong explicitly to a namespace, so if you have two such generic functions with the same name, you would need to make clear which you are using.
In short, functions that are "conceptually different" will always either have different names, or live in different namespaces.
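As a sketch of that point (in Python for illustration only, not Clojure): two generic functions can share the name bark as long as each lives in its own namespace. Here functools.singledispatch plays the role of a single-dispatch generic function, and types.SimpleNamespace stands in for the two modules; the class names and return strings are made up for the example.

```python
from functools import singledispatch
from types import SimpleNamespace


class Dog: pass
class Cat: pass
class Oak: pass


# Generic function belonging to the (stand-in) animal namespace.
@singledispatch
def _animal_bark(animal):
    raise TypeError(f"animal.bark is not defined for {type(animal).__name__}")


@_animal_bark.register(Dog)
def _(animal):
    return "woof"


@_animal_bark.register(Cat)
def _(animal):
    return "meow"


# A separate generic function belonging to the (stand-in) tree namespace.
@singledispatch
def _tree_bark(tree):
    raise TypeError(f"tree.bark is not defined for {type(tree).__name__}")


@_tree_bark.register(Oak)
def _(tree):
    return "rough outer layer"


# Stand-ins for two modules that each export a function named bark.
animal = SimpleNamespace(bark=_animal_bark)
tree = SimpleNamespace(bark=_tree_bark)

print(animal.bark(Dog()))  # woof
print(animal.bark(Cat()))  # meow
print(tree.bark(Oak()))    # rough outer layer
```

The caller has to say animal.bark or tree.bark explicitly, which is exactly the qualification the question hopes to avoid; dispatch happens within each generic function, not across them.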
A generic function should perform the same "verb" for all classes its methods are implemented for.
In the animals/tree "bark" case, the animal-verb is "perform a sound action" and in the tree case, well, I guess it's make-environment-shield.
That English happens to call them both "bark" is just a linguistic coincidence.
If you have a case where multiple different GFs (generic functions) really should have the same name, using namespaces to separate them is (probably) the right thing.
Message-passing OO does not, in general, solve the namespacing problem that you talk about. OO languages with structural type systems don't differentiate between a method bark in an Animal or a Tree as long as they have the same type. It's only because popular OO languages use nominal type systems (e.g., Java) that it seems like that.
Because get-x() of XYZ has no conceptual relation to get-x() of GENE,
they are implemented as separate generic functions
Sure. But since their arglist is the same (just passing the object to the method), then you 'could' implement them as different methods on the same generic function.
The only constraint when adding a method to a generic function, is that the arglist of the method matches the arglist of the generic function.
More generally, methods must have the same number of required and
optional parameters and must be capable of accepting any arguments
corresponding to any &rest or &key parameters specified by the generic
function.
There's no constraint that the functions must be conceptually related. Most of the time they are (overriding a superclass, etc.), but they certainly don't have to be.
Although even this constraint (need the same arglist) seems limiting at times. If you look at Erlang, functions have arity, and you can define multiple functions with the same name that have different arity (functions with same name and different arglists). And then a sort of dispatch takes care of calling the right function. I like this. And in lisp, I think this would map to having a generic function accept methods that have varying arglists. Maybe this is something that is configurable in the MOP?
Although reading a bit more here, it seems that keyword arguments might allow the programmer to achieve having a generic function encapsulate methods with completely different arity, by using different keys in different methods to vary their number of arguments:
A method can "accept" &key and &rest arguments defined in its generic
function by having a &rest parameter, by having the same &key
parameters, or by specifying &allow-other-keys along with &key. A
method can also specify &key parameters not found in the generic
function's parameter list--when the generic function is called, any
&key parameter specified by the generic function or any applicable
method will be accepted.
Also note that this sort of blurring, where different methods stored in the generic function do conceptually different things, happens behind the scenes in your 'tree has bark', 'dogs bark' example. When defining the tree class, you'd set an automatic getter and setter method for the bark slot. When defining the dog class, you'd define a bark method on the dog type that actually does the barking. And both of these methods get stored in a #'bark generic function.
Since they are both enclosed in the same generic function, you'd call them in exactly the same way:
(bark tree-obj) -> Returns a noun (the bark of the tree)
(bark dog-obj) -> Produces a verb (the dog barks)
As code:
CL-USER> (defclass tree ()
           ((bark :accessor bark :initarg :bark :initform 'cracked)))
#<STANDARD-CLASS TREE>
CL-USER> (symbol-function 'bark)
#<STANDARD-GENERIC-FUNCTION BARK (1)>
CL-USER> (defclass dog ()
           ())
#<STANDARD-CLASS DOG>
CL-USER> (defmethod bark ((obj dog))
           'rough)
#<STANDARD-METHOD BARK (DOG) {1005494691}>
CL-USER> (symbol-function 'bark)
#<STANDARD-GENERIC-FUNCTION BARK (2)>
CL-USER> (bark (make-instance 'tree))
CRACKED
CL-USER> (bark (make-instance 'dog))
ROUGH
CL-USER>
I tend to favor this sort of 'duality of syntax', or blurring of features, etc. And I do not think that all methods on a generic function have to be conceptually similar. That's just a guideline IMO. If a linguistic interaction in the English language happens (bark as noun and verb), it's nice to have a programming language that handles the case gracefully.
You are working with several concepts and mixing them: namespaces, global generic functions, local generic functions (methods), method invocation, message passing, etc.
In some circumstances those concepts overlap syntactically, which makes them difficult to implement. It seems to me that you are also mixing a lot of these concepts in your mind.
Functional languages are not my strength, though I have done some work with Lisp.
But some of these concepts are used in other paradigms, such as procedural and object (class) oriented programming. You may want to check how these concepts are implemented there, and later return to your own programming language.
For example, something I consider very important is the use of namespaces ("modules") as a separate concept from procedural programming, to avoid identifier clashes such as the ones you mention. A programming language with namespaces, like yours, would look like this:
=== in file animals.code ===
define module animals
  define class animal
    // methods don't take an explicit receiver such as "bark(animal AANIMAL)"
    define method bark()
      ...
    end define method
  end define class
  define class dog
    // methods don't take an explicit receiver such as "bark(dog ADOG)"
    define method bark()
      ...
    end define method
  end define class
end define module
=== in file myprogram.code ===
define module myprogram
  import animals.code
  import trees.code
  define function main
    a = new-dog()
    a.bark() // make the dog bark
    …
    t = new-tree()
    b = t.bark() // retrieve the bark from the tree
  end define function main
end define module
Cheers.
This is the general question of where to put the dispatch table, which many programming languages try to address in a convenient way.
In OOP we put it into the class definition (this couples type and function and, spiced with inheritance, gives all the pleasures of architecture issues).
In FP we put it inside the dispatching function (a shared, centralized table; this is usually not that bad, but not perfect either).
I like the interface-based approach, where I can create the virtual table separately from any data type AND from any shared function definition (a protocol in Clojure).
In Java (sorry) it looks like this:
Let's assume ResponseBody is an interface.
public static ResponseBody create(MediaType contentType,
        long contentLength, InputStream content) {
    return new ResponseBody() {
        public MediaType contentType() {
            return contentType;
        }
        public long contentLength() {
            return contentLength;
        }
        public BufferedSource source() {
            return streamBuffered(content);
        }
    };
}
The virtual table gets created for this specific create function. This completely solves the namespace problem, and you can still have non-centralized, type-based dispatch (OOP) if you want to.
It also becomes trivial to provide a separate implementation for testing purposes without declaring new data types.
After defining a record and the protocols it implements, I can call its methods either by name or the Java interop way, using the dot operator.
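As a rough illustration of the testing point, here is a minimal sketch. The Body interface is a hypothetical, simplified stand-in for ResponseBody (plain String and byte[] instead of MediaType and BufferedSource), and describe is an invented function under test. The stub is built as an anonymous implementation, so no named data type is declared for it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-in for the ResponseBody interface above.
interface Body {
    String contentType();
    long contentLength();
    byte[] bytes();
}

class BodyStubExample {
    // Code under test: it depends only on the interface, not on a concrete type.
    static String describe(Body body) {
        return body.contentType() + " (" + body.contentLength() + " bytes)";
    }

    // Builds a throwaway implementation; its "virtual table" lives in this method.
    static Body stub(String type, String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        return new Body() {
            @Override public String contentType() { return type; }
            @Override public long contentLength() { return data.length; }
            @Override public byte[] bytes() { return data.clone(); }
        };
    }

    public static void main(String[] args) {
        // prints "text/plain (5 bytes)"
        System.out.println(describe(stub("text/plain", "hello")));
    }
}
```

The same describe function would accept a production implementation unchanged, since it only sees the interface.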
user=> (defprotocol Eat (eat [this]))
Eat
user=> (defrecord animal [name] Eat (eat [this] "eating"))
user.animal
user=> (eat (animal. "bob"))
"eating"
user=> (.eat (animal. "bob"))
"eating"
user=>
Under the surface, what is going on there? Are new Clojure functions being defined? What happens when functions you have defined share the same name (is this possible?), and how are such ambiguities resolved?
Also, is it possible to "import" Java methods of other Java objects so that you do not need the . operator and the behavior is like the above? (For the purpose, for example, of unifying the user interface.)
When you define a protocol, each of its methods is created as a function in your current namespace. It follows that you can't have two protocols defining the same function in the same namespace. It also means that you can have them in separate namespaces, and that a given type can extend both[1] of them without any name clash because they are namespaced (as opposed to Java, where a single class can't provide separate implementations for two interfaces whose methods share a name and signature).
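To make the Java side of that comparison concrete, here is a small sketch with two hypothetical interfaces (CanBark and HasBark, invented for illustration) that declare a method with the same name and signature. A class can list both, but it can supply only one body, which then answers calls made through either interface.

```java
// Hypothetical interfaces that happen to share a method name and signature.
interface CanBark { String bark(); }   // bark as a verb
interface HasBark { String bark(); }   // bark as a noun

class Confused implements CanBark, HasBark {
    // Only one body is possible; both interfaces resolve to it.
    @Override public String bark() { return "woof"; }
}

class HomonymDemo {
    public static void main(String[] args) {
        Confused c = new Confused();
        CanBark asDog = c;
        HasBark asTree = c;
        System.out.println(asDog.bark());   // prints "woof"
        System.out.println(asTree.bark());  // prints "woof" as well
    }
}
```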
From a user perspective, protocol methods are no different from plain old non-polymorphic functions.
The fact that you can call a protocol method using interop is an implementation detail. The reason for that is that for each protocol, the Clojure compiler creates a corresponding backing interface. Later on when you define a new type with inline protocol extensions, then this type will implement these protocols' backing interfaces.
Consequently you can't use the interop form on an object for which the extension hasn't been provided inline:
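(defrecord VacuumCleaner [brand model])
(extend-protocol Eat
  VacuumCleaner
  (eat [this] "eating legos and socks"))
(eat (VacuumCleaner. "Dyson" "DC-20"))
; => "eating legos and socks"
(.eat (VacuumCleaner. "Dyson" "DC-20"))
; fails: the record does not implement the backing interface, so no eat method is found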
The compiler has special support for protocol functions: they are compiled as an instance check followed by a virtual method call, so when applicable (eat ...) will be as fast as (.eat ...).
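The idea of "instance check, then virtual call" can be sketched in Java as follows. This is only a conceptual model, not Clojure's actual generated code: the Eat interface stands in for the protocol's backing interface, and the EXTENSIONS map is a hypothetical stand-in for whatever lookup handles types extended outside their definition (the real mechanism also considers superclasses and caches results, which is omitted here).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Stand-in for the backing interface generated for a protocol.
interface Eat {
    Object eat();
}

// Implements the interface directly, like a record with an inline extension.
class Animal implements Eat {
    @Override public Object eat() { return "eating"; }
}

// Does not implement Eat; it is extended after the fact, like extend-protocol.
class VacuumCleaner { }

class ProtocolDispatchSketch {
    // Hypothetical external table: exact class -> implementation.
    static final Map<Class<?>, Function<Object, Object>> EXTENSIONS = new HashMap<>();

    static {
        EXTENSIONS.put(VacuumCleaner.class, x -> "eating legos and socks");
    }

    // Models the protocol function (eat x).
    static Object eat(Object x) {
        if (x instanceof Eat) {
            return ((Eat) x).eat();            // fast path: plain virtual call
        }
        Function<Object, Object> impl = EXTENSIONS.get(x.getClass());
        if (impl == null) {
            throw new IllegalArgumentException(
                "No implementation of eat for " + x.getClass().getName());
        }
        return impl.apply(x);                  // slow path: table lookup
    }

    public static void main(String[] args) {
        System.out.println(eat(new Animal()));         // prints "eating"
        System.out.println(eat(new VacuumCleaner()));  // prints "eating legos and socks"
    }
}
```

In this model the interop form corresponds to calling ((Eat) x).eat() directly, which only works for Animal; that mirrors why (.eat ...) fails on a type extended outside its definition.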
To reply to "can one import java methods", you can wrap them in regular fns:
(def callme #(.callme %1 %2 %3))
(Obviously you may need to add other arities to account for overloads, and type hints to remove reflection.)
[1] However, you can't extend both inline (at least one of them must be in an extend-* form) because of an implementation limitation.