I am researching programming language design, and I am interested in the question of how to replace the popular single-dispatch message-passing OO paradigm with the multimethods generic-function paradigm. For the most part, it seems very straightforward, but I have recently become stuck and would appreciate some help.
Message-passing OO, in my mind, is one solution that solves two different problems. I explain what I mean in detail in the following pseudocode.
(1) It solves the dispatch problem:
=== in file animal.code ===
- Animals can "bark"
- Dogs "bark" by printing "woof" to the screen.
- Cats "bark" by printing "meow" to the screen.
=== in file myprogram.code ===
import animal.code
for each animal a in list-of-animals :
a.bark()
In this problem, "bark" is one method with multiple "branches" which operate differently depending upon the argument types. We implement "bark" once for each argument type we are interested in (Dogs and Cats). At runtime we are able to iterate through a list of animals and dynamically select the appropriate branch to take.
(2) It solves the namespace problem:
=== in file animal.code ===
- Animals can "bark"
=== in file tree.code ===
- Trees have "bark"
=== in file myprogram.code ===
import animal.code
import tree.code
a = new-dog()
a.bark() //Make the dog bark
…
t = new-tree()
b = t.bark() //Retrieve the bark from the tree
In this problem, "bark" is actually two conceptually different functions which just happen to have the same name. The type of the argument (whether dog or tree) determines which function we actually mean.
Multimethods elegantly solve problem number 1. But I don't understand how they solve problem number 2. For example, the first of the above two examples can be translated in a straightforward fashion to multimethods:
(1) Dogs and Cats using multimethods
=== in file animal.code ===
- define generic function bark(Animal a)
- define method bark(Dog d) : print("woof")
- define method bark(Cat c) : print("meow")
=== in file myprogram.code ===
import animal.code
for each animal a in list-of-animals :
bark(a)
The key point is that the method bark(Dog) is conceptually related to bark(Cat). The second example does not have this attribute, which is why I don't understand how multimethods solve the namespace issue.
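As a rough illustration of the Dogs and Cats translation above, here is a minimal sketch in Python using `functools.singledispatch`. This is only single dispatch on the first argument, not a full multimethod system, and the class names are stand-ins for the pseudocode. The point it shows is that `bark` is defined outside the classes, as a free-standing generic function with one method per type.

```python
from functools import singledispatch


class Animal:
    pass


class Dog(Animal):
    pass


class Cat(Animal):
    pass


@singledispatch
def bark(animal):
    # Fallback branch for types that have no registered method.
    raise NotImplementedError(f"bark is not defined for {type(animal).__name__}")


@bark.register
def _(animal: Dog):
    return "woof"


@bark.register
def _(animal: Cat):
    return "meow"


# The right branch is chosen at runtime for each element of the list.
sounds = [bark(a) for a in (Dog(), Cat(), Dog())]
print(sounds)
```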
(2) Why multimethods don't work for Animals and Trees
=== in file animal.code ===
- define generic function bark(Animal a)
=== in file tree.code ===
- define generic function bark(Tree t)
=== in file myprogram.code ===
import animal.code
import tree.code
a = new-dog()
bark(a) // Which bark function are we calling?
t = new-tree()
bark(t) // Which bark function are we calling?
In this case, where should the generic function be defined? Should it be defined at the top-level, above both animal and tree? It doesn't make sense to think of bark for animal and tree as two methods of the same generic function because the two functions are conceptually different.
As far as I know, no past work has solved this problem. I have looked at Clojure multimethods and CLOS multimethods, and they have the same problem. I am crossing my fingers and hoping for either an elegant solution to the problem, or a persuasive argument for why it's actually not a problem in real programming.
Please let me know if the question needs clarification. This is a fairly subtle (but important) point I think.
Thanks for the replies sanity, Rainer, Marcin, and Matthias. I understand your replies and completely agree that dynamic dispatch and namespace resolution are two different things. CLOS does not conflate the two ideas, whereas traditional message-passing OO does. This also allows for a straightforward extension of multimethods to multiple inheritance.
My question specifically is in the situation when the conflation is desirable.
The following is an example of what I mean.
=== file: XYZ.code ===
define class XYZ :
define get-x ()
define get-y ()
define get-z ()
=== file: POINT.code ===
define class POINT :
define get-x ()
define get-y ()
=== file: GENE.code ===
define class GENE :
define get-x ()
define get-xx ()
define get-y ()
define get-xy ()
=== file: my_program.code ===
import XYZ.code
import POINT.code
import GENE.code
obj = new-xyz()
obj.get-x()
pt = new-point()
pt.get-x()
gene = new-gene()
gene.get-x()
Because of the conflation of namespace resolution with dispatch, the programmer can naively call get-x() on all three objects. This is also perfectly unambiguous. Each object "owns" its own set of methods, so there is no confusion as to what the programmer meant.
Contrast this to the multimethod version:
=== file: XYZ.code ===
define generic function get-x (XYZ)
define generic function get-y (XYZ)
define generic function get-z (XYZ)
=== file: POINT.code ===
define generic function get-x (POINT)
define generic function get-y (POINT)
=== file: GENE.code ===
define generic function get-x (GENE)
define generic function get-xx (GENE)
define generic function get-y (GENE)
define generic function get-xy (GENE)
=== file: my_program.code ===
import XYZ.code
import POINT.code
import GENE.code
obj = new-xyz()
XYZ:get-x(obj)
pt = new-point()
POINT:get-x(pt)
gene = new-gene()
GENE:get-x(gene)
Because get-x() of XYZ has no conceptual relation to get-x() of GENE, they are implemented as separate generic functions. Hence, the end programmer (in my_program.code) must explicitly qualify get-x() and tell the system which get-x() he actually means to call.
It is true that this explicit approach is clearer and easily generalizable to multiple dispatch and multiple inheritance. But using (abusing) dispatch to solve namespace issues is an extremely convenient feature of message-passing OO.
I personally feel that 98% of my own code is adequately expressed using single-dispatch and single-inheritance. I use this convenience of using dispatch for namespace resolution much more so than I use multiple-dispatch, so I am reluctant to give it up.
Is there a way to get me the best of both worlds? How do I avoid the need to explicitly qualify my function calls in a multi-method setting?
It seems that the consensus is that
multimethods solve the dispatch problem but do not attack the namespace problem.
functions that are conceptually different should have different names, and users should be expected to manually qualify them.
I then believe that, in cases where single-inheritance single-dispatch is sufficient, message-passing OO is more convenient than generic functions.
This sounds like it is open research then. If a language were to provide a mechanism for multimethods that may also be used for namespace resolution, would that be a desired feature?
I like the concept of generic functions, but currently feel they are optimized for making "very hard things not so hard" at the expense of making "trivial things slightly annoying". Since the majority of code is trivial, I still believe this is a worthwhile problem to solve.
Dynamic dispatch and namespace resolution are two different things. In many object systems classes are also used for namespaces. Also note that often both the class and the namespace are tied to a file. So these object systems conflate at least three things:
class definitions with their slots and methods
the namespace for identifiers
the storage unit of source code
Common Lisp and its object system (CLOS) work differently:
classes don't form a namespace
generic functions and methods don't belong to classes and thus are not defined inside classes
generic functions are defined as top-level functions and thus are not nested or local
identifiers for generic functions are symbols
symbols have their own namespace mechanism called packages
generic functions are 'open'. One can add or delete methods at any time
generic functions are first-class objects
methods are first-class objects
classes and generic functions are also not conflated with files. You can define multiple classes and multiple generic functions in one file or in as many files as you want. You can also define classes and methods from running code (thus not tied to files) or something like a REPL (read eval print loop).
Style in CLOS:
if a functionality needs dynamic dispatch and the functionality is closely related, then use one generic function with different methods
if there are many different functionalities, but with a common name, don't put them in the same generic function. Create different generic functions.
generic functions with the same name, but where the name is in different packages are different generic functions.
Example:
(defpackage "ANIMAL" (:use "CL"))
(in-package "ANIMAL")
(defclass animal () ())
(defclass dog (animal) ())
(defclass cat (animal) ())
(defmethod bark ((an-animal dog)) (print 'woof))
(defmethod bark ((an-animal cat)) (print 'meow))
(bark (make-instance 'dog))
(bark (make-instance 'cat))
Note that the class ANIMAL and the package ANIMAL have the same name. But that is not necessarily so. The names are not connected in any way.
The DEFMETHOD implicitly creates a corresponding generic function.
If you add another package (for example GAME-ANIMALS), then the BARK generic function will be different, unless these packages are related (for example one package uses the other).
From a different package (symbol namespace in Common Lisp), one can call these:
(animal:bark some-animal)
(game-animals:bark some-game-animal)
A symbol has the syntax
PACKAGE-NAME::SYMBOL-NAME
If the package is the same as the current package, then it can be omitted.
ANIMAL::BARK refers to the symbol named BARK in the package ANIMAL. Note that there are two colons.
ANIMAL:BARK refers to the exported symbol BARK in the package ANIMAL. Note that there is only one colon. Exporting, importing and using are mechanisms defined for packages and their symbols. Thus they are independent of classes and generic functions, but they can be used to structure the namespace for the symbols naming those.
The more interesting case is when multimethods are actually used in generic functions:
(defmethod bite ((some-animal cat) (some-human human))
...)
(defmethod bite ((some-animal dog) (some-food bone))
...)
Above uses the classes CAT, HUMAN, DOG and BONE. Which class should the generic function belong to? What would the special namespace look like?
Since generic functions dispatch over all arguments, it does not make direct sense to conflate the generic function with a special namespace and make it a definition in a single class.
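To make the "which class would own it?" question concrete, here is a minimal sketch of a dispatch table keyed on the classes of all arguments. The `GenericFunction` helper, the extra `Animal` base class, and the fallback method are illustrative assumptions, not CLOS itself; the lookup order only approximates left-to-right argument precedence.

```python
from itertools import product


class GenericFunction:
    """A tiny multiple-dispatch table keyed on the classes of all arguments."""

    def __init__(self, name):
        self.name = name
        self.methods = {}

    def method(self, *types):
        # Register an implementation for one combination of argument classes.
        def decorator(fn):
            self.methods[types] = fn
            return fn
        return decorator

    def __call__(self, *args):
        # Walk combinations of each argument's class hierarchy; the first
        # argument's most specific class is tried first.
        mros = [type(a).__mro__ for a in args]
        for signature in product(*mros):
            fn = self.methods.get(signature)
            if fn is not None:
                return fn(*args)
        raise TypeError(f"no applicable method for {self.name}")


class Animal: pass
class Cat(Animal): pass
class Dog(Animal): pass
class Human: pass
class Bone: pass


# bite belongs to no single class: it is defined next to all of them.
bite = GenericFunction("bite")


@bite.method(Cat, Human)
def _(cat, human):
    return "cat bites human"


@bite.method(Dog, Bone)
def _(dog, bone):
    return "dog bites bone"


@bite.method(Animal, object)
def _(animal, thing):
    return "animal nibbles cautiously"


print(bite(Cat(), Human()))
print(bite(Dog(), Bone()))
print(bite(Dog(), Human()))
```

Because each method is selected by the pair of argument classes, placing `bite` inside `Cat`, `Dog`, `Human` or `Bone` would be arbitrary.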
Motivation:
Generic functions were added in the 80s to Lisp by developers at Xerox PARC (for Common LOOPS) and at Symbolics for New Flavors. One wanted to get rid of an additional calling mechanism (message passing) and bring dispatch to ordinary (top-level) functions. New Flavors had single dispatch, but generic functions with multiple arguments. The research into Common LOOPS then brought multiple dispatch. New Flavors and Common LOOPS were then replaced by the standardized CLOS. These ideas then were brought to other languages like Dylan.
Since the example code in the question does not use anything generic functions have to offer, it looks like one has to give up something.
When single dispatch, message passing and single inheritance are sufficient, then generic functions may look like a step back. The reason for this is, as mentioned, that one does not want to put all kinds of similarly named functionality into one generic function.
While
(defmethod bark ((some-animal dog)) ...)
(defmethod bark ((some-tree oak)) ...)
look similar, they are two conceptually different actions.
But more:
(defmethod bark ((some-animal dog) tone loudness duration)
...)
(defmethod bark ((some-tree oak)) ...)
Now suddenly the parameter lists for the same named generic function look different. Should that be allowed to be one generic function? If not, how do we call BARK on various objects in a list of things with the right parameters?
In real Lisp code generic functions usually look much more complicated with several required and optional arguments.
In Common Lisp, generic functions also do not have just a single method type. There are different types of methods and various ways to combine them. It only makes sense to combine them when they really belong to a certain generic function.
Since generic functions are also first class objects, they can be passed around, returned from functions and stored in data structures. At this point the generic function object itself is important, not its name anymore.
For the simple case where I have an object which has x and y coordinates and can act as a point, I would have the object's class inherit from a POINT class (maybe as some mixin). Then I would import the GET-X and GET-Y symbols into some namespace, where necessary.
There are other languages which are more different from Lisp/CLOS and which attempt(ed) to support multimethods:
MultiJava
Runabout
C#
Fortress
There seem to be many attempts to add it to Java.
Your example for "Why multimethods won't work" presumes that you can define two identically-named generic functions in the same language namespace. This is generally not the case; for example, Clojure multimethods belong explicitly to a namespace, so if you have two such generic functions with the same name, you would need to make clear which you are using.
In short, functions that are "conceptually different" will always either have different names, or live in different namespaces.
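A minimal sketch of that idea: two unrelated generic functions that share the name `bark` but live in different namespaces, so the caller has to say which one is meant. `SimpleNamespace` stands in for a module or package here; the `animal` and `tree` namespaces and the class names are hypothetical.

```python
from functools import singledispatch
from types import SimpleNamespace


class Dog:
    pass


class Oak:
    pass


def make_animal_namespace():
    @singledispatch
    def bark(animal):
        raise NotImplementedError("animal.bark has no method for this type")

    @bark.register(Dog)
    def _(animal):
        return "woof"

    return SimpleNamespace(bark=bark)


def make_tree_namespace():
    @singledispatch
    def bark(tree):
        raise NotImplementedError("tree.bark has no method for this type")

    @bark.register(Oak)
    def _(tree):
        return "rough outer layer"

    return SimpleNamespace(bark=bark)


animal = make_animal_namespace()
tree = make_tree_namespace()

# Same short name, two distinct generic functions; the qualifier picks one.
dog_sound = animal.bark(Dog())
oak_bark = tree.bark(Oak())
print(dog_sound, "|", oak_bark)
```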
A generic function should perform the same "verb" for all classes its methods are implemented for.
In the animals/tree "bark" case, the animal-verb is "perform a sound action" and in the tree case, well, I guess it's make-environment-shield.
That English happens to call them both "bark" is just a linguistic coincidence.
If you have a case where multiple different GFs (generic functions) really should have the same name, using namespaces to separate them is (probably) the right thing.
Message-passing OO does not, in general, solve the namespacing problem that you talk about. OO languages with structural type systems don't differentiate between a method bark in an Animal or a Tree as long as they have the same type. It's only because popular OO languages use nominal type systems (e.g., Java) that it seems like that.
Because get-x() of XYZ has no conceptual relation to get-x() of GENE,
they are implemented as separate generic functions
Sure. But since their arglist is the same (just passing the object to the method), then you 'could' implement them as different methods on the same generic function.
The only constraint when adding a method to a generic function, is that the arglist of the method matches the arglist of the generic function.
More generally, methods must have the same number of required and
optional parameters and must be capable of accepting any arguments
corresponding to any &rest or &key parameters specified by the generic
function.
There's no constraint that the functions must be conceptually related. Most of the time they are (overriding a superclass, etc.), but they certainly don't have to be.
Although even this constraint (need the same arglist) seems limiting at times. If you look at Erlang, functions have arity, and you can define multiple functions with the same name that have different arity (functions with same name and different arglists). And then a sort of dispatch takes care of calling the right function. I like this. And in lisp, I think this would map to having a generic function accept methods that have varying arglists. Maybe this is something that is configurable in the MOP?
Although reading a bit more here, it seems that keyword arguments might allow the programmer to achieve having a generic function encapsulate methods with completely different arity, by using different keys in different methods to vary their number of arguments:
A method can "accept" &key and &rest arguments defined in its generic
function by having a &rest parameter, by having the same &key
parameters, or by specifying &allow-other-keys along with &key. A
method can also specify &key parameters not found in the generic
function's parameter list--when the generic function is called, any
&key parameter specified by the generic function or any applicable
method will be accepted.
Also note that this sort of blurring, where different methods stored in the generic function do conceptually different things, happens behind the scenes in your 'tree has bark', 'dogs bark' example. When defining the tree class, you'd set an automatic getter and setter method for the bark slot. When defining the dog class, you'd define a bark method on the dog type that actually does the barking. And both of these methods get stored in a #'bark generic function.
Since they are both enclosed in the same generic function, you'd call them in exactly the same way:
(bark tree-obj) -> Returns a noun (the bark of the tree)
(bark dog-obj) -> Produces a verb (the dog barks)
As code:
CL-USER>
(defclass tree ()
((bark :accessor bark :initarg :bark :initform 'cracked)))
#<STANDARD-CLASS TREE>
CL-USER>
(symbol-function 'bark)
#<STANDARD-GENERIC-FUNCTION BARK (1)>
CL-USER>
(defclass dog ()
())
#<STANDARD-CLASS DOG>
CL-USER>
(defmethod bark ((obj dog))
'rough)
#<STANDARD-METHOD BARK (DOG) {1005494691}>
CL-USER>
(symbol-function 'bark)
#<STANDARD-GENERIC-FUNCTION BARK (2)>
CL-USER>
(bark (make-instance 'tree))
CRACKED
CL-USER>
(bark (make-instance 'dog))
ROUGH
CL-USER>
I tend to favor this sort of 'duality of syntax', or blurring of features, etc. And I do not think that all methods on a generic function have to be conceptually similar. That's just a guideline IMO. If a linguistic interaction in the English language happens (bark as noun and verb), it's nice to have a programming language that handles the case gracefully.
You are working with several concepts and mixing them: namespaces, global generic functions, local generic functions (methods), method invocation, message passing, etc.
In some circumstances, those concepts may overlap syntactically, which makes them difficult to implement. It seems to me that you are also mixing a lot of concepts in your mind.
Functional languages are not my strength, though I have done some work with Lisp.
But some of these concepts are used in other paradigms, such as procedural and object (class) orientation. You may want to check how these concepts are implemented there, and later return to your own programming language.
For example, something that I consider very important is the use of namespaces ("modules") as a concept separate from procedural programming, to avoid identifier clashes such as the ones you mention. A programming language with namespaces, like yours, would look like this:
=== in file animal.code ===
define module animals
define class animal
// methods don't use "bark(animal AANIMAL)"
define method bark()
...
end define method
end define class
define class dog
// methods don't use "bark(dog ADOG)"
define method bark()
...
end define method
end define class
end define module
=== in file myprogram.code ===
define module myprogram
import animals.code
import trees.code
define function main
a = new-dog()
a.bark() //Make the dog bark
…
t = new-tree()
b = t.bark() //Retrieve the bark from the tree
end define function main
end define module
Cheers.
This is the general question of where to put the dispatch table, which many programming languages try to address in a convenient way.
In the case of OOP we put it into the class definition (we have type+function concretion this way; spiced with inheritance, it gives all the pleasures of architecture issues).
In the case of FP we put it inside the dispatching function (we have a shared centralized table; this is usually not that bad, but not perfect either).
I like the interface-based approach, where I can create the virtual table separately from any data type AND from any shared function definition (a protocol in Clojure).
In Java (sorry) it will look like this:
Let's assume ResponseBody is an interface.
public static ResponseBody create(MediaType contentType,
        long contentLength, InputStream content) {
    return new ResponseBody() {
        public MediaType contentType() {
            return contentType;
        }
        public long contentLength() {
            return contentLength;
        }
        public BufferedSource source() {
            return streamBuffered(content);
        }
    };
}
The virtual table gets created for this specific create function. This completely solves the namespace problem, and you can still have non-centralized type-based dispatch (OOP) if you want to.
It also becomes trivial to provide a separate implementation for testing purposes without declaring new data types.
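For comparison, a rough Python sketch of the same shape: an interface described once, and an implementation created on the spot inside a factory function. `ResponseBody`, `create` and the parameter names here are simplified stand-ins chosen for illustration, not the Java API from the snippet above.

```python
from typing import Protocol


class ResponseBody(Protocol):
    def content_type(self) -> str: ...
    def content_length(self) -> int: ...
    def source(self) -> bytes: ...


def create(media_type: str, data: bytes) -> ResponseBody:
    # The class is local to this call; its methods close over the arguments,
    # so no named data type has to be declared anywhere else.
    class _Body:
        def content_type(self) -> str:
            return media_type

        def content_length(self) -> int:
            return len(data)

        def source(self) -> bytes:
            return data

    return _Body()


body = create("text/plain", b"hello")
print(body.content_type(), body.content_length())
```

A test double can be built the same way, by writing another small factory that returns an object with the same three methods.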
Related
I'm a beginner programmer (who has a bunch of design-related scripting experience for video games but very little programming experience - so just basic stuff like loops, flow control, etc. - although I do have a C++ fundamentals course and a C++ data structures and algorithms course under my belt). I'm working on a text-adventure personal project (I actually already wrote it in Python ages ago before I learned how classes work - everything is a dictionary - so it's shameful). I'm "remaking" it in C++ with classes to get out of the rut of having only done homework assignments.
I've written my player and room classes (which were simple since I only need one class for each). I'm onto item classes (an item being anything in a room, such as a torch, a fire, a sign, a container, etc.). I'm unsure how to approach the item base class and derived classes. Here are the problems I'm having.
How do I tell whether an item is of a certain type in a non-shit way (there's a good chance I'm overthinking this)?
For example, I set up my print room info function so that in addition to whatever else it might do, it prints the name of every object in its inventory (i.e. inside of it) and I want it to print something special for a container object (the contents of its inventory for example).
The first part's easy because every item has a name since the name attribute is part of the base item class. The container has an inventory though, which is an attribute unique to the container subclass.
It's my understanding that it's bad form to execute conditional logic based on the object's class type (because one's classes should be polymorphic) and I'm assuming (perhaps incorrectly) that it'd be weird and wrong to put a getHasInventory accessor virtual function in the item base class (my assumption here is based on thinking it'd be crazy to put virtual functions for every derived class in the base class - I have about a dozen derived classes - a couple of which are derived classes of derived classes).
If that's all correct, what's an acceptable way to do this? One obvious thing is to add an itemType attribute to the base and then do conditional logic but this strikes me as wrong since it seems to just be a re-skinning of the checking class type solution. I'm unsure whether the above-mentioned assumptions are correct and what a good solution might be.
How should I structure my base class/classes and my derived classes?
I originally wrote them such that the item class was the base class and most other classes used single inheritance (except for a couple which had multi-level).
This seemed to present some awkwardness and repeating myself though. For example, I want a sign and a letter. A sign is a Readable Item > Untakeable Item > Item. A letter is a Readable Item > Takeable Item > Item. Because they all use single inheritance I need two different Readable Items, one that's takeable and one that's not (I know I could just make takeable and untakeable into attributes of the base in this instance and I did but this works as an example because I still have similar issues with other classes).
That seems icky to me so I took another stab at it and implemented them all using multiple inheritance & virtual inheritance. In my case that seems more flexible because I can compose classes of multiple classes and create a kind of component system for my classes.
Is one of these ways better than the other? Is there some third way that's better?
One possible way to solve your problem is polymorphism. By using polymorphism you can (for example) have a single describe function which when invoked leads the item to describe itself to the player. You can do the same for use, and other common verbs.
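A minimal sketch of the "describe itself" idea, written in Python for brevity (in C++ the same shape would use a virtual member function). The `Item` and `Container` names and the output format are assumptions for illustration. The room code never asks what kind of item it has; the container simply overrides `describe` to include its inventory.

```python
class Item:
    def __init__(self, name):
        self.name = name

    def describe(self):
        # Default behaviour: an item is described by its name.
        return self.name


class Container(Item):
    def __init__(self, name, inventory=None):
        super().__init__(name)
        self.inventory = list(inventory or [])

    def describe(self):
        # A container adds the descriptions of whatever it holds.
        if not self.inventory:
            return f"{self.name} (empty)"
        contents = ", ".join(item.describe() for item in self.inventory)
        return f"{self.name} (contains: {contents})"


room_items = [Item("torch"), Container("chest", [Item("letter"), Item("coin")])]
lines = [item.describe() for item in room_items]
print("\n".join(lines))
```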
Another way is to implement a more advanced input parser, which can recognize objects and pass on the verbs to some (polymorphic) function of the items for themselves to handle. For example each item could have a function returning a list of available verbs, together with a function returning a list of "names" for the items:
#include <iostream>
#include <string>
#include <vector>

// Minimal stand-in so the example compiles; the real player class is assumed to exist elsewhere
struct player
{
    void write(const std::string& text) { std::cout << text << '\n'; }
};

struct item
{
    virtual ~item() = default;

    // Return a list of verbs this item reacts to
    virtual std::vector<std::string> get_verbs() = 0;

    // Return a list of name aliases for this item
    virtual std::vector<std::string> get_names() = 0;

    // Describe this item to the player
    virtual void describe(player*) = 0;

    // Perform a specific verb, input is the full input line
    virtual void perform_verb(std::string verb, std::string input) = 0;
};

class base_torch : public item
{
public:
    std::vector<std::string> get_verbs() override
    {
        return { "light", "extinguish" };
    }

    // Return true if the torch is lit, false otherwise
    bool is_lit() const { return lit; }

    void perform_verb(std::string verb, std::string) override
    {
        if (verb == "light")
        {
            lit = true;   // Make the torch "lit"
        }
        else if (verb == "extinguish")
        {
            lit = false;  // Make the torch "extinguished"
        }
    }

private:
    bool lit = false;
};

class long_brown_torch : public base_torch
{
public:
    std::vector<std::string> get_names() override
    {
        return { "long brown torch", "long torch", "brown torch", "torch" };
    }

    void describe(player* p) override
    {
        p->write("This is a long brown torch.");
        if (is_lit())
            p->write("The torch is burning.");
    }
};
Then, if the player inputs e.g. light brown torch, the parser looks through all available items (the ones in the player's inventory, followed by the items in the room), gets each item's name list (by calling the item's get_names() function) and compares it to brown torch. If a match is found, the parser calls the item's perform_verb function, passing the appropriate arguments (item->perform_verb("light", "light brown torch")).
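The lookup described above can be sketched roughly as follows, in Python for brevity. The `Torch` class and the `dispatch` helper are hypothetical, and the "parser" is deliberately naive: it treats the first word as the verb and the rest of the line as the item name.

```python
class Torch:
    names = ("long brown torch", "long torch", "brown torch", "torch")
    verbs = ("light", "extinguish")

    def __init__(self):
        self.lit = False

    def perform_verb(self, verb, full_input):
        if verb == "light":
            self.lit = True
        elif verb == "extinguish":
            self.lit = False


def dispatch(command, items):
    """Split 'verb rest-of-line' and hand the verb to the first matching item."""
    verb, _, target = command.strip().partition(" ")
    for item in items:
        if target in item.names and verb in item.verbs:
            item.perform_verb(verb, command)
            return item
    return None  # nothing in reach matches the command


torch = Torch()
available = [torch]  # inventory items first, then items in the room
matched = dispatch("light brown torch", available)
print(matched is torch, torch.lit)
```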
You can even modify the parser (and the items) to handle adjectives separately, or even articles like the, or save the last used item so it can be referenced by using it.
Constructing the different rooms and items is tedious but still trivial once a good design has been made (and you really should spend some time gathering requirements, analyzing them, and creating a design). The really hard part is writing a decent parser.
Note that these are only two possible ways to handle items and verbs in such a game. There are many other ways, too many to list them all.
You are asking some excellent questions regarding how to design, structure and implement the program, as well as how to model the problem domain.
OOP, 'methods' and approaches
The questions you ask indicate that you have learned about OOP (object-oriented programming). In a lot of introductory material on OOP, it is common to encourage modelling the problem domain directly through objects and subtyping and implementing functionality by adding methods to them. A classical example is modelling animals, with for instance an Animal type and two sub-types, Duck and Cat, and implementing functionality, for instance walk, quack and mew.
Modelling the problem domain directly with objects and subtyping can make sense, but it can also very much be overkill and bothersome compared to simply having a single or a few types with different fields describing what it is. In your case, I do believe a more complex modelling like you have with objects and subtypes or alternative approaches can make sense, since among other aspects you have functionality that varies depending on the type as well as somewhat complex data (like a container with an inventory). But it is something to keep in mind - there are different trade-offs, and sometimes, having a single type with multiple different fields for modelling the domain can make more sense overall.
Implementing the desired functionality through methods on a base class and subtypes likewise has different trade-offs, and it is not always a good approach for the given case. For one of your questions, you could do something like adding a print method or similar to the base type and each subtype, but this is not always that nice in practice (a simple example is a calculator application, where simplifying the arithmetic expression the user enters, like (3*x)*4/2, might be bothersome to implement if one uses the approach of adding methods to the base class).
Alternative approach - Tagged unions/sum types
There is a very nice fundamental abstraction known as "tagged union" (it is also known by the names "disjoint union" and "sum type"). The main idea of the tagged union is that you have a union of several different sets of instances, where it matters which set a given instance belongs to. They are a superset of the feature in C++ known as enum. Regrettably, C++ does not currently support tagged unions at the language level (std::variant, available since C++17, is a library-level approximation without pattern matching), though there is research into it (for instance https://www.stroustrup.com/OpenPatternMatching.pdf , though this may be somewhat beyond you if you are a beginner programmer). As far as I can see, this fits very well with the example you have given here. An example in Scala would be (many other languages support tagged unions as well, such as Rust, Kotlin, TypeScript, the ML languages, Haskell, etc.):
sealed trait Item {
  val name: String
}
case class Book(val name: String) extends Item
case object Fire extends Item {
  val name = "Fire"
}
case class Container(val name: String, val inventory: List[Item]) extends Item
This describes your different kinds of items very well as far as I can see. Do note that Scala is a bit special in this regard, since it implements tagged unions through subtyping.
If you then wanted to implement some print functionality, you could then use "pattern matching" to match which item you have and do functionality specific to that item. In languages that support pattern matching, this is convenient and non-fragile, since the pattern matching checks that you have covered each possible case (similar to switch in C++ over enums checking that you have covered each possible case). For instance in Scala:
def getDescription(item: Item): String = {
  item match {
    case Book(_) | Fire => item.name
    case Container(name, inventory) =>
      name + " contains: (" +
        inventory
          .map(getDescription(_))
          .mkString(", ") +
        ")"
  }
}
val description = getDescription(
  Container("Bag", List(Book("On Spelunking"), Fire))
)
println(description)
You can copy-paste the two snippets in here and try to run them: https://scalafiddle.io/ .
This kind of modelling works very well with what one might call "data types", where you have no or very little functionality in the classes themselves, and where the fields inside the classes basically are part of their interface ("interface" in the sense that you would want to change the implementations that use the types if you ever add to, remove or change the fields of the types).
Conversely, I find a more conventional subtyping modelling and approach more convenient when the implementation inside of a class is not part of its interface, for instance if I have a base type that describes a collision system interface, and each of its subtypes has different performance characteristics, handy for different situations. Hiding and protecting the implementation since it is not part of the interface makes a lot of sense and fits very well with what one might call "mini-modules".
In C++ (and C), sometimes people do use tagged unions despite the lack of language support, in various ways. One way that I have seen being used in C is to make a C union (though do be careful regarding aspects such as memory and semantics) where an enum tag is used to differentiate between the different cases. This is error-prone, since you might easily end up accessing a field that is not valid for the current enum case.
You could also model your command input as a tagged union. That said, parsing can be somewhat challenging, and parsing libraries may be a bit involved if you are a beginner programmer; keeping the parsing somewhat simple might be a good idea.
Side-notes
C++ is a special language. I do not quite like it for cases where I do not care much about resource usage or runtime performance, for multiple reasons: it can be annoying and not that flexible to develop in, and it can be challenging, because you must always take great care to avoid undefined behaviour. That said, if resource usage or runtime performance do matter, C++ can, depending on the case, be a very good option. There are also a number of very useful and important insights in the C++ language and its community, such as RAII, ownership and lifetimes. My recommendation is that learning C++ is a good idea, but that you should also learn other languages, for instance a statically-typed functional programming language. FP (functional programming) and languages supporting FP have a number of advantages and drawbacks, but some of their advantages are very, very nice, especially regarding immutability and side-effects.
Of these languages, Rust may be the closest to C++ in certain regards, though I don't have experience with Rust and cannot therefore vouch for either the language or its community.
As a side-note, you may be interested in this Wikipedia-page: https://en.wikipedia.org/wiki/Expression_problem .
While doing a study on the practical use of inheritance concepts in C#, I encountered an interesting pattern of code. A non-generic interface I inherits from a generic type I<T> multiple times, each with a different type argument. The only reason I inherits from I<T> is to declare overloads; I<T> is never referenced anywhere in code, except for the inheritance relation. To illustrate:
interface Combined : Operations<Int32>, Operations<Int64>, Operations<double> {}
interface Operations<T> {
    T Add(T left, T right);
    T Multiply(T left, T right);
}
In practice, the IOperations interface has 30 methods with extensive XML documentation, so it seems logical to not want to repeat these declarations so many times. I googled for 'overload repeat design ', and 'method declaration reuse design pattern' etc but could not find any useful information.
Maybe this pattern has a more profound use in languages supporting multiple inheritance like C++, where the implementation of the operations could also be provided.
tl;dr: Can you name the design pattern in the above code example?
I don't think it has a name. The classic set of patterns were based largely on code in older Java and pre-standardization C++, neither of which supported parametric polymorphism (templates/generics), so patterns that require them don't really show up. As far as the GoF is concerned, that's just inheriting from three different interfaces.
It's also a little bit too ugly to qualify as a pattern. Why just those three types? Why not Int16, or Uint32? Why is the interface generic, rather than the methods?
One suggestion: this could be the Adapter pattern, in the part where
A non-generic interface I inherits from a generic type I<T> multiple times, each with a different type argument. The only reason I inherits from I<T> is for the purpose of declaring overloads
I use it with classes too. It helps to convert the interface of a class into another interface that is expected. Adapter lets classes work together that couldn't otherwise because of incompatible interfaces.
To be honest, in your case I don't know what concept is implemented in the non-generic interface I, but I suppose it exists because, when calling a generic method for storing an object, there is occasionally a need to handle a specific type differently.
What are the differences between Module and Class in OCaml?
From my searching, I found this:
Both provide mechanisms for abstraction and encapsulation, for
subtyping (by omitting methods in objects, and omitting fields in
modules), and for inheritance (objects use inherit; modules use
include). However, the two systems are not comparable.
On the one hand, objects have an advantage: objects are first-class
values, and modules are not—in other words, modules do not support
dynamic lookup. On the other hand, modules have an advantage: modules
can contain type definitions, and objects cannot.
First, I don't understand what "modules do not support dynamic lookup" means. To my understanding, abstraction and polymorphism mean that a parent pointer can refer to a child instance. Is that "dynamic lookup"? If not, what does dynamic lookup actually mean?
In practice, when do we choose to use a module and when a class?
The main difference between Module and Class is that you don't instantiate a module.
A module is basically just a "drawer" where you can put types, functions, other modules, etc... It is just here to order your code. This drawer is however really powerful thanks to functors.
A class, on the other hand, exists to be instantiated. It contains variables and methods. You can create an object from a class, and each object contains its own variables and methods (as defined in the class).
In practice, using a module will be a good solution most of the time. A class can be useful when you need inheritance (widgets for example).
From a practical perspective dynamic lookup lets you have different objects with the same method without specifying to which class/module it belongs. It helps you when using inheritance.
For example, let's use two data structures, SingleList and DoubleLinkedList, which both inherit from List and have the method pop. Each class has its own implementation of the method (because of the 'override').
So, when you want to call it, the lookup of the method is done at runtime (a.k.a. dynamically) when you do a list#pop.
If you were using modules you would have to use SingleList.pop list or DoubleLinkedList.pop list.
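The contrast above can be sketched in Python rather than OCaml (a rough analogue only, not OCaml semantics). The class names follow the example; which end each pop removes from, and the two module-style function names, are assumptions made up for illustration.

```python
class List:
    """Hypothetical base class: it only fixes the method name."""

    def pop(self):
        raise NotImplementedError


class SingleList(List):
    def __init__(self, items):
        self._items = list(items)

    def pop(self):
        # Assumption for the sketch: removes from the front.
        return self._items.pop(0)


class DoubleLinkedList(List):
    def __init__(self, items):
        self._items = list(items)

    def pop(self):
        # Assumption for the sketch: removes from the back.
        return self._items.pop()


# Module-style equivalents: the caller has to name the implementation.
def single_list_pop(items):
    return items.pop(0)


def double_linked_list_pop(items):
    return items.pop()


# Dynamic lookup: the same call site picks the implementation at runtime.
lists = [SingleList([1, 2, 3]), DoubleLinkedList([1, 2, 3])]
popped = [lst.pop() for lst in lists]

# Static, module-style calls: the choice is written into the code.
static_popped = [single_list_pop([1, 2, 3]), double_linked_list_pop([1, 2, 3])]
```

The loop over lists never mentions a concrete class, which is the part that modules alone cannot give you.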
EDIT: As @Majestic12 said, most of the time OCaml users tend to use modules over classes, using the latter when they need inheritance or instances (check his answer).
I wanted to make the description practical as you seem new to OCaml.
Hope it can help you.
Meyers mentioned in his book Effective C++ that in certain scenarios non-member non-friend functions are better encapsulated than member functions.
Example:
// Web browser allows to clear something
class WebBrowser {
public:
...
void clearCache();
void clearHistory();
void removeCookies();
...
};
Many users will want to perform all these actions together, so WebBrowser might also offer a function to do just that:
class WebBrowser {
public:
...
void clearEverything(); // calls clearCache, clearHistory, removeCookies
...
};
The other way is to define a non-member non-friend function.
void clearBrowser(WebBrowser& wb)
{
wb.clearCache();
wb.clearHistory();
wb.removeCookies();
}
The non-member function is better because "it doesn't increase the number of functions that can access the private parts of the class.", thus leading to better encapsulation.
Functions like clearBrowser are convenience functions because they can't offer any functionality a WebBrowser client couldn't already get in some other way. For example, if clearBrowser didn't exist, clients could just call clearCache, clearHistory, and removeCookies themselves.
To me, the example of convenience functions is reasonable. But is there any example other than convenience function when non-member version excels?
More generally, what are the rules of when to use which?
More generally, what are the rules of when to use which?
Here are Scott Meyers's rules (source):
Scott has an interesting article in print which advocates
that non-member non-friend functions improve encapsulation
for classes. He uses the following algorithm to determine
where a function f gets placed:
if (f needs to be virtual)
    make f a member function of C;
else if (f is operator>> or operator<<)
{
    make f a non-member function;
    if (f needs access to non-public members of C)
        make f a friend of C;
}
else if (f needs type conversions on its left-most argument)
{
    make f a non-member function;
    if (f needs access to non-public members of C)
        make f a friend of C;
}
else if (f can be implemented via C's public interface)
    make f a non-member function;
else
    make f a member function of C;
His definition of encapsulation involves the number
of functions which are impacted when private data
members are changed.
Which pretty much sums it all up, and it is quite reasonable as well, in my opinion.
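To make the decision order easier to play with, here is a minimal sketch that encodes the quoted algorithm as a Python function. The function name, the boolean parameters and the returned strings are all made up for this illustration, and "can be implemented via C's public interface" is treated as the opposite of "needs access to non-public members", which is my reading rather than something the quote states.

```python
def place_function(needs_virtual, is_stream_operator,
                   needs_left_conversion, needs_private_access):
    """Return where f should live, following the quoted decision order."""
    if needs_virtual:
        return "member"
    if is_stream_operator or needs_left_conversion:
        # Both branches of the quoted algorithm end the same way:
        # a non-member, befriended only if it needs non-public access.
        return "non-member friend" if needs_private_access else "non-member"
    if not needs_private_access:
        # Implementable via the public interface.
        return "non-member"
    return "member"


# clearBrowser only calls public functions, so it lands outside the class.
clear_browser = place_function(False, False, False, False)

# An operator<< that has to read private state becomes a friend.
stream_op = place_function(False, True, False, True)
```

With these inputs, clear_browser is "non-member" and stream_op is "non-member friend".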
I often choose to build utility methods outside of my classes when they are application specific.
The application is usually in a different context than the engines doing the work underneath. If we take your example of a web browser, the 3 clear methods belong to the web engine, as this is needed functionality that would be difficult to implement anywhere else; however, ClearEverything() is definitely more application specific. In this instance your application might have a small dialog with a clear-all button to help the user be more efficient. Maybe this is not something another application re-using your web browser engine would want to do, and therefore having it in the engine class would just be more clutter.
Another example is in mathematical libraries. Often it makes sense to have more advanced functionality like mean value or standard deviation implemented as part of a mathematical class. However, if you have an application-specific way to calculate some type of mean that is not the standard version, it should probably be outside of your class and part of a utility library specific to your application.
I have never been a big fan of strong hardcoded rules to implement things in one way or another, it’s often a matter of ideology and principles.
Non-member functions are commonly used when the developer of a library wants to write binary operators that can be overloaded on either argument with a class type, since if you make them a member of the class you can only overload on the second argument (the first is implicitly an object of that class). The various arithmetic operators for complex are perhaps the definitive example for this.
In the example you cite, the motivation is of another kind: use the least coupled design that still allows you to do the job.
This means that while clearEverything could (and, to be frank, would be quite likely to) be made a member, we don't make it one because it does not technically have to be. This buys you two things:
You don't have to accept the responsibility of having a clearEverything method in your public interface (once you ship with one, you're married to it for life).
The number of functions with access to the private members of the class is one lower, hence any changes in the future will be easier to perform and less likely to cause bugs.
That said, in this particular example I feel that the concept is being taken too far, and for such an "innocent" function I'd gravitate towards making it a member. But the concept is sound, and in "real world" scenarios where things are not so simple it would make much more sense.
Locality and allowing the class to provide 'enough' features while maintaining encapsulation are some things to consider.
If WebBrowser is reused in many places, the dependencies/clients may define multiple convenience functions. This keeps your classes (WebBrowser) lightweight and easy to manage.
The inverse would be that the WebBrowser ends up pleasing all clients, and just becomes some monolithic beast that is difficult to change.
Do you find the class is lacking functionality once it has been put to use in multiple scenarios? Do patterns emerge in your convenience functions? It's best (IMO) to defer formally extending the class's interface until patterns emerge and there is a good reason to add this functionality. A minimal class is easier to maintain, but you don't want redundant implementations all over the place because that pushes the maintenance burden onto your clients.
If your convenience functions are complex to implement, or there is a common case which can improve performance significantly (e.g. to empty a thread safe collection with one lock, rather than one element at a time with a lock each time), then you may also want to consider that case.
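The thread-safe collection case can be sketched like this in Python. SafeBag, drain and the lock_acquisitions counter are hypothetical names invented for the illustration (the counter exists only to make the difference visible), and the sketch assumes None is never stored as an element.

```python
import threading


class SafeBag:
    """Hypothetical thread-safe collection, for illustration only."""

    def __init__(self, items=()):
        self._lock = threading.Lock()
        self._items = list(items)
        self.lock_acquisitions = 0  # instrumentation for the example

    def pop(self):
        # Public operation: takes the lock once per element.
        with self._lock:
            self.lock_acquisitions += 1
            return self._items.pop() if self._items else None

    def clear(self):
        # Member version: empties everything under a single lock.
        with self._lock:
            self.lock_acquisitions += 1
            drained, self._items = self._items, []
            return drained


def drain(bag):
    # Convenience function built only on the public pop():
    # it cannot avoid locking once per element (plus one final check).
    drained = []
    while (item := bag.pop()) is not None:
        drained.append(item)
    return drained


via_function = SafeBag([1, 2, 3])
drained_by_function = drain(via_function)

via_member = SafeBag([1, 2, 3])
drained_by_member = via_member.clear()
```

Here the non-member drain needs four lock acquisitions for three elements, while the member clear needs one. That kind of gap is a reasonable argument for promoting a convenience function into the class.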
There will also be cases where you realize something is genuinely missing from the WebBrowser as you use it.
I've recently seen several people doing things like this here on Stackoverflow:
class A:
    foo = 1
    class B:
        def blah(self):
            pass
In other words, they have nested classes. This works (although people new to Python seem to run into problems because it doesn't behave like they thought it would), but I can't think of any reason to do this in any language at all, and certainly not in Python. Is there such a usecase? Why are people doing this? Searching for this it seems it's reasonably common in C++, is there a good reason there?
The main reason for putting one class in another is to avoid polluting the global namespace with things that are used only inside one class and therefore don't belong in the global namespace. This is applicable even to Python, with the global namespace being the namespace of a particular module. For example, if you have SomeClass and OtherClass, and both of them need to read something in a specialized way, it is better to have SomeClass.Reader and OtherClass.Reader rather than SomeClassReader and OtherClassReader.
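A minimal sketch of the SomeClass.Reader / OtherClass.Reader idea in Python. What each reader actually does (splitting on commas versus whitespace) is an assumption made up for the example.

```python
class SomeClass:
    class Reader:
        # Hypothetical behaviour: comma-separated input.
        def read(self, text):
            return text.split(",")


class OtherClass:
    class Reader:
        # Hypothetical behaviour: whitespace-separated input.
        def read(self, text):
            return text.split()


some_fields = SomeClass.Reader().read("a,b")
other_fields = OtherClass.Reader().read("a b")
```

Both readers share the short name Reader without clashing at module level. One thing that tends to surprise newcomers: the body of the nested class does not see names defined in the enclosing class body, so an outer attribute has to be spelled SomeClass.attr from inside Reader.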
I have never encountered this in C++, though. It can be problematic to control access to the outer class' fields from a nested class. And it is also pretty common to have just one public class in a compilation unit defined in the header file and some utility classes defined in the CPP file (the Qt library is a great example of this). This way they aren't visible to "outsiders" which is good, so it doesn't make much sense to include them in the header. It also helps to increase binary compatibility which is otherwise a pain to maintain. Well, it's a pain anyway, but much less so.
A great example of a language where nested classes are really useful is Java. Nested classes there automatically have a pointer to the instance of the outer class that creates them (unless you declare the inner class as static). This way you don't need to pass "outer" to their constructors and you can address the outer class' fields just by their names.
It allows you to control the access of the nested class; for example, it's often used for implementation detail classes. In C++ it also has advantages in terms of when various things are parsed and what you can access without having to declare it first.
I am not a big fan of Python, but to me this type of decision is more semantic than syntactic. If you are implementing a list, the class Node inside List is not a class in itself meant to be used from anywhere, but an implementation detail of the list. At the same time you can have a Node internal class inside Tree, or Graph. Whether the compiler/interpreter allows you to access the class or not is a different matter. Programming is about writing specifications that the computer can follow and other programmers can read; List.Node is more explicit about Node being internal to List than having ListNode as a top-level class.
In some languages, the nested class will have access to variables that are in scope within the outer class. (Similarly with functions, or with class-in-function nesting. Of course, function-in-class nesting just creates a method, which behaves fairly unsurprisingly. ;) )
In more technical terms, we create a closure.
Python lets you do a lot of things with functions (including lambdas) that in C++03 or Java you need a class for (although Java has anonymous inner classes, so a nested class doesn't always look like your example). Listeners, visitors, that kind of thing. A list comprehension is loosely a kind of visitor:
Python:
(foo(x) if x.f == target else bar(x) for x in bazes)
C++:
struct FooBar {
    Sommat operator()(const Baz &x) const {
        return (x.f == val) ? foo(x) : bar(x);
    }
    FooBar(int val) : val(val) {}
    int val;
};
vector<Sommat> v(bazes.size());
std::transform(bazes.begin(), bazes.end(), v.begin(), FooBar(target));
The question that C++ and Java programmers then ask themselves is, "this little class that I'm writing: should it appear in the same scope as the big class that needs to use it, or should I confine it within the scope of the only class that uses it?"[*]
Since you don't want to publish the thing, or allow anyone else to rely on it, often the answer in these cases is a nested class. In Java, private classes can serve, and in C++ you can restrict classes to a TU, in which case you may no longer care too much what namespace scope the name appears in, so nested classes aren't actually required. It's just a style thing, plus Java provides some syntactic sugar.
As someone else said, another case is iterators in C++. Python can support iteration without an iterator class, but if you're writing a data structure in C++ or Java then you have to put the blighters somewhere. To follow the standard library container interface you'll have a nested typedef for it whether the class is nested or not, so it's fairly natural to think, "nested class".
[*] They also ask themselves, "should I just write a for loop?", but let's suppose a case where the answer to that is no...
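As a rough illustration of the remark that Python can support iteration without an iterator class, here is a sketch of a linked list whose __iter__ is a generator function. LinkedList, _Node and next_node are hypothetical names chosen for the example.

```python
class LinkedList:
    class _Node:
        # Nested because it is an implementation detail of the list.
        def __init__(self, value, next_node=None):
            self.value = value
            self.next_node = next_node

    def __init__(self, values=()):
        self._head = None
        # Build back to front so iteration preserves the input order.
        for value in reversed(list(values)):
            self._head = self._Node(value, self._head)

    def __iter__(self):
        # A generator function: no hand-written iterator class needed.
        node = self._head
        while node is not None:
            yield node.value
            node = node.next_node


values = list(LinkedList([1, 2, 3]))
```

The generator takes over the job that list::iterator does in a C++ container, so the only nested class left is the node.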
In C++ at least, one major common use-case for nested classes is iterators in containers. For example, a hypothetical implementation might look something like this:
class list
{
public:
    class iterator
    {
        // implementation code
    };
    class const_iterator
    {
        // implementation code
    };
};
Another reason for nested classes in C++ would be private implementation details like node classes for maps, linked lists, etc.
"Nested classes" can mean two different things, which can be split into three different categories by intent. The first one is purely stylistic; the other two are used for practical purposes, and are highly dependent on the features of the language where they are used.
Nested class definitions for the sake of creating a new namespace and/or organizing your code better. For example, in Java this is accomplished through the use of static nested classes, and it is suggested by the official documentation as a way to create more readable and maintainable code, and to logically group classes together. The Zen of Python, however, suggests that you nest code blocks less, thus discouraging this practice.
import this
In Python you'd much more often see the classes grouped in modules.
Putting a class inside another class as part of its interface (or the interface of the instances). First, this interface can be used by the implementation to aid subclassing, for example imagine a nested class HTML.Node which you can override in a subclass of HTML to alter the class used to create new node instances. Second, this interface might be used by the class/instance users, though this is not that useful unless you are in the third case described below.
In Python at least, you don't need to nest the definitions to achieve either of those, however, and it's probably very rare. Instead, you might see Node defined outside of the class and then node_factory = Node in the class definition (or a method dedicated to creating the nodes).
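A minimal sketch of the node_factory = Node arrangement. HTML, Node and node_factory come from the text above; UpperNode, ShoutingHTML and create_node are hypothetical names added for the illustration.

```python
class Node:
    def __init__(self, tag):
        self.tag = tag


class UpperNode(Node):
    # Hypothetical variant: stores the tag in upper case.
    def __init__(self, tag):
        super().__init__(tag.upper())


class HTML:
    # Part of the class interface: subclasses may replace it.
    node_factory = Node

    def create_node(self, tag):
        return self.node_factory(tag)


class ShoutingHTML(HTML):
    node_factory = UpperNode


plain = HTML().create_node("p")
loud = ShoutingHTML().create_node("p")
```

The subclass changes which node class gets instantiated without overriding create_node and without any nested class definition.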
Nesting the namespace of the objects, or creating different contexts for different groups of objects. In Java, non-static nested classes (called inner classes) are bound to an instance of the outer class. This is very useful because it lets you have instances of the inner class that live inside different outer namespaces.
For Python, consider the decimal module. You can create different contexts, and have things like different precisions defined for each context. Each Decimal object can be assigned a context on creation. This achieves the same as an inner class would, through a different mechanism. If Python supported inner classes, and Context and Decimal were nested, you'd have context.Decimal('3') instead of Decimal('3', context=context).
You could easily create a metaclass in Python that lets you create nested classes that live inside of an instance, you can even make it produce proper bound and unbound class proxies that support isinstance correctly through the use of __subclasscheck__ and __instancecheck__. However, it won't gain you anything over the other simpler ways to achieve the same (like an additional argument to __init__). It would only limit what you can do with it, and I have found inner classes in Java very confusing every time I had to use them.
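To make the decimal contexts mentioned above concrete, here is a small sketch with two contexts of different precision. The variable names are made up. As far as I know, the context passed to the Decimal constructor does not round the stored value, so the sketch goes through the Context methods divide and create_decimal, which do apply the context's precision.

```python
from decimal import Context, Decimal

# Two independent "namespaces" for arithmetic, each with its own precision.
low = Context(prec=5)
high = Context(prec=10)

# The same operands give different results depending on the context used.
third_low = low.divide(Decimal(1), Decimal(3))
third_high = high.divide(Decimal(1), Decimal(3))

# create_decimal rounds to the context's precision on construction.
pi_low = low.create_decimal("3.14159265")
```

This is about as close as the standard library gets to the context.Decimal('3') spelling: the context object carries the state and the Decimal values themselves stay plain.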
In Python, a more useful pattern is declaring a class inside a function or method. Declaring a class in the body of another class, as people have noted in other answers, is of little use: yes, it does avoid pollution of the module namespace, but since there is a module namespace anyway, a few more names in it do no harm. Even if the extra classes are not intended to be instantiated directly by users of the module, putting them at the module root makes their documentation more easily accessible to others.
However, a class inside a function is a completely different creature: It is "declared" and created each time the code containing the class body is run. This gives one the possibility of creating dynamic classes for various uses - in a very simple way. For example, each class created this way is in a different closure, and can have access to different instances of the variables on the containing function.
def call_count(func):
    class Counter(object):
        def __init__(self):
            self.counter = 0
        def __repr__(self):
            return str(func)
        def __call__(self, *args, **kw):
            self.counter += 1
            return func(*args, **kw)
    return Counter()
And using it on the console:
>>> @call_count
... def noop(): pass
...
>>> noop()
>>> noop()
>>> noop.counter
2
>>> noop
<function noop at 0x7fc251b0b578>
So, a simple call_count decorator could use a static "Counter" class, defined outside the function and receiving func as a parameter to its constructor. But if you want to tweak other behaviors, as in this example, where repr(func) returns the function representation rather than the class representation, it is easier to do it this way.
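For comparison, here is a rough sketch of the alternative just mentioned: a Counter class defined at module level that receives func through its constructor. This is only one possible way to write it, not the version from the console session above.

```python
class Counter:
    """Module-level variant: func is stored on the instance, not closed over."""

    def __init__(self, func):
        self.func = func
        self.counter = 0

    def __repr__(self):
        # Delegate to the wrapped function's representation.
        return repr(self.func)

    def __call__(self, *args, **kw):
        self.counter += 1
        return self.func(*args, **kw)


def call_count(func):
    return Counter(func)


@call_count
def noop():
    pass


noop()
noop()
calls = noop.counter
```

The main difference is where func lives: here it is an ordinary instance attribute, whereas the nested version reaches it through the closure, so every decorated function gets its own freshly created class.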