How would you idiomatically extend arithmetic functions for other datatypes in Clojure? - clojure

So I want to use java.awt.Color for something, and I'd like to be able to write code like this:
(import 'java.awt.Color)
(= Color/BLUE (- Color/WHITE Color/RED Color/GREEN))
Looking at the core implementation of -, it talks specifically about clojure.lang.Numbers, which to me implies that there is nothing I can do to 'hook' into the core implementation and extend it.
Looking around on the Internet, there seem to be two different things people do:
Write their own defn - function, which only knows about the data type they're interested in. To use it you'd probably end up prefixing a namespace, so something like:
(= Color/BLUE (scdf.color/- Color/WHITE Color/RED Color/GREEN))
Or alternatively use-ing the namespace and calling clojure.core/- explicitly when you want number math.
Code a special case into your - implementation that passes through to clojure.core/- when your implementation is passed a Number.
Unfortunately, I don't like either of these. The first is probably the cleaner of the two, as the second presumes that the only things you care about doing maths on are your new datatype and numbers.
I'm new to Clojure, but shouldn't we be able to use protocols or multimethods here, so that when people create or use custom types they can 'extend' these functions to work seamlessly? Is there a reason that +, - etc. don't support this? (Or do they? They don't seem to from my reading of the code, but maybe I'm reading it wrong.)
If I want to write my own extensions to common existing functions such as + for other datatypes, how should I do it so it plays nicely with existing functions and potentially other datatypes?

It wasn't exactly designed for this, but core.matrix might be of interest to you here, for a few reasons:
The source code provides examples of how to use protocols to define operations that work with various different types. For example, (+ [1 2] [3 4]) => [4 6]. It's worth studying how this is done: basically the operators are regular functions that call a protocol, and each data type provides an implementation of the protocol via extend-protocol (a rough sketch of this pattern follows this answer).
You might be interested in making java.awt.Color work as a core.matrix implementation (i.e. as a 4D RGBA vector). I did something similar with BufferedImage here: https://github.com/clojure-numerics/image-matrix. If you implement the basic core.matrix protocols, you will get the whole core.matrix API working with Color objects, which will save you a lot of work implementing the different operations.
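Here is a minimal sketch of that pattern, assuming java.awt.Color as the extra type; the protocol and function names below are invented for illustration and are not core.matrix's actual API:
(ns example.generic-plus
  (:refer-clojure :exclude [+])
  (:import java.awt.Color))

;; the operator is an ordinary function that defers to a protocol;
;; each participating type supplies an implementation via extend-protocol
(defprotocol Addable
  (add2 [a b] "Binary addition for one pair of types."))

(extend-protocol Addable
  Number
  (add2 [a b] (clojure.core/+ a b))
  Color
  (add2 [a b] (Color. (min 255 (clojure.core/+ (.getRed a) (.getRed b)))
                      (min 255 (clojure.core/+ (.getGreen a) (.getGreen b)))
                      (min 255 (clojure.core/+ (.getBlue a) (.getBlue b))))))

(defn +
  "Variadic + built by reducing over the binary protocol function."
  [x & more]
  (reduce add2 x more))
With this in place (+ 1 2 3) and (+ Color/RED Color/GREEN) both go through the same entry point; the extra protocol dispatch on every call is also where the performance concern in the next answer comes from.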

The probable reason for not basing the arithmetic operations in core on protocols (and making them work only on numbers) is performance. A protocol implementation requires an additional lookup to choose the correct implementation of the desired function. From a design point of view it may feel nice to have protocol-based implementations and extend them whenever required, but when you have a tight loop that performs these operations many times (a very common use case for arithmetic) you will start feeling the performance cost of that extra lookup happening on every call at runtime.
If you have a separate implementation for your own data types (e.g. color/-) in its own namespace, then it will be more performant due to the direct call to that function, and it also makes things more explicit and customizable for specific cases.
Another issue with these functions is their variadic nature (i.e. they can take any number of arguments). This is a serious issue for a protocol implementation, because protocol dispatch only considers the type of the first argument.

You can have a look at algo.generic.arithmetic in algo.generic. It uses multimethods.
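As a rough illustration of the multimethod approach (this sketch is not algo.generic's actual API - check its documentation for the real extension mechanism):
(ns example.generic-minus
  (:refer-clojure :exclude [-])
  (:import java.awt.Color))

;; dispatch on the classes of both arguments; isa? lets [Long Long] match [Number Number]
(defmulti subtract (fn [a b] [(class a) (class b)]))

(defmethod subtract [Number Number] [a b]
  (clojure.core/- a b))

(defmethod subtract [Color Color] [a b]
  (Color. (max 0 (clojure.core/- (.getRed a) (.getRed b)))
          (max 0 (clojure.core/- (.getGreen a) (.getGreen b)))
          (max 0 (clojure.core/- (.getBlue a) (.getBlue b)))))

(defn -
  "Variadic - built by reducing over the binary multimethod."
  [x & more]
  (reduce subtract x more))
With this sketch the question's original expression, (= Color/BLUE (- Color/WHITE Color/RED Color/GREEN)), evaluates to true, because java.awt.Color compares by RGB value.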

Related

Why would I ever want to use Maybe instead of a List?

Seeing as the Maybe type is isomorphic to the set of null and singleton lists, why would anyone ever want to use the Maybe type when I could just use lists to accommodate absence?
Because if you match a list against the patterns [] and [x] that's not an exhaustive match and you'll get a warning about that, forcing you to either add another case that'll never get called or to ignore the warning.
Matching a Maybe against Nothing and Just x however is exhaustive. So you'll only get a warning if you fail to match one of those cases.
If you choose your types such that they can only represent values that you may actually produce, you can rely on non-exhaustiveness warnings to tell you about bugs in your code where you forget to check for a given case. If you choose more "permissive" types, you'll always have to think about whether a warning represents an actual bug or just an impossible case.
You should strive to have accurate types. Maybe expresses that there is exactly one value or that there is none. Many imperative languages represent the "none" case by the value null.
If you chose a list instead of Maybe, all your functions would be faced with the possibility that they get a list with more than one member. Probably many of them would only be defined for one value, and would have to fail on a pattern match. By using Maybe, you avoid a class of runtime errors entirely.
Building on existing (and correct) answers, I'll mention a typeclass based answer.
Different types convey different intentions - returning a Maybe a represents a computation with the possibility of failing, while [a] could represent non-determinism (or, in simpler terms, multiple possible return values).
This plays into the fact that different types have different instances for typeclasses, and these instances cater to the underlying essence the type conveys. Take Alternative and its operator (<|>), which represents what it means to combine (or choose between) the arguments given.
Maybe a: combining computations that can fail just means taking the first that is not Nothing.
[a]: combining two computations that each had multiple return values just means concatenating together all possible values.
Then, depending on which types your functions use, (<|>) would behave differently. Of course, you could argue that you don't need (<|>) or anything like that, but then you are missing out on one of Haskell's main strengths: its many high-level combinator libraries.
As a general rule, we like our types to be as snug fitting and intuitive as possible. That way, we are not fighting the standard libraries and our code is more readable.
Lisp, Scheme, Python, Ruby, JavaScript, etc., manage to get along with just one type each, which you could represent in Haskell with a big sum type. Every function handling a JavaScript (or whatever) value must be prepared to receive a number, a string, a function, a piece of the document object model, etc., and throw an exception if it gets something unexpected. People who program in typed languages like Haskell prefer to limit the number of unexpected things that can occur. They also like to express ideas using types, making types useful (and machine-checked) documentation. The closer the types come to representing the intended meaning, the more useful they are.
Because there is an infinite number of possible list shapes, but only two possible shapes for a Maybe value. It perfectly represents one thing or the absence of something, without any other possibility.
Several answers have mentioned exhaustiveness as a factor here. I think it is a factor, but not the biggest one, because there is a way to consistently treat lists as if they were Maybes, which the listToMaybe function illustrates:
listToMaybe :: [a] -> Maybe a
listToMaybe [] = Nothing
listToMaybe (a:_) = Just a
That's an exhaustive pattern match, which rules out any straightforward errors.
The factor I'd highlight as bigger is that by using the type that more precisely models the behavior of your code, you eliminate potential behaviors that would be possible if you used a more general alternative. Say, for example, you have some context in your code where you use a type of the form a -> [b], though the only correct alternatives (given your program's specification) are empty or singleton lists. Try as hard as you may to enforce the convention that this context should obey that rule, it's still possible that you'll mess up and:
Somehow a function used in that context will produce a list of two or more items;
And somehow a function that uses the results produced in that context will observe whether the lists have two or more items, and behave incorrectly in that case.
Example: some code that expects there to be no more than one value will blindly print the contents of the list, and thus print multiple items when only one was supposed to be printed.
But if you use Maybe, then there really must be either one value or none, and the compiler enforces this.
Even though the types are isomorphic, tools like QuickCheck will run slower with lists because of the increased search space.

In Clojure (core.async) what's the difference between alts and alt?

I can't figure out the difference between:
alts!
and
alt!
in Clojure's core.async.
alts! is a function that accepts a vector of channels to take from and/or channels with values to be put on them (in the form of doubleton vectors: [c v]). The vector may be dynamically constructed; the code calling alts! may not know how many channels it'll be choosing among (and indeed that number need not be constant across invocations).
alt! is a convenience macro which basically acts as a cross between cond and alts!. Here the number of "ports" (channels or channel+value pairs) must be known statically, but in practice this is quite often the case and the cond-like syntax is very clear.
alt! expands to a somewhat elaborate expression using alts!; apart from the syntactic convenience, it offers no extra functionality.
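A small sketch of the two side by side (the channels and values here are made up):
(require '[clojure.core.async :refer [chan go alts! alt!]])

(def c1 (chan))
(def c2 (chan))

;; alts!: the vector of ports can be built at runtime
(go
  (let [ports [c1 [c2 :hello]]            ; take from c1, or put :hello onto c2
        [val port] (alts! ports)]
    (println "alts! completed with" val)))

;; alt!: the ports are fixed in the source, with cond-like clauses
(go
  (alt!
    c1         ([v] (println "took" v "from c1"))
    [[c2 :hi]] (println "put :hi onto c2")))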

Avoid name clashes in a Clojure DSL

As a side project I'm creating a Clojure DSL for image synthesis (clisk).
I'm a little unsure on the best approach to function naming where I have functions in the DSL that are analogous to functions in Clojure core, for example the + function or something similar is needed in my DSL to additively compose images / perform vector maths operations.
As far as I can see it there are a few options:
Use the same name (+) in my own namespace. Looks nice in DSL code but will shadow the clojure.core version, which may cause issues. People could get confused.
Use the same name but require it to be qualified (my-ns/+). Avoids conflicts, but prevents people from pulling the namespace in with use for convenience, and looks a bit ugly.
Use a different short name, e.g. (v+). Can be used easily and avoids clashes, but the name is a bit ugly and might prove hard to remember.
Use a different long name e.g. (vector-add). Verbose but descriptive, no clashes.
Exclude clojure.core/+ and redefine with a multimethod + (as georgek suggests).
Example code might look something like:
(show (v+ [0.9 0.6 0.3]
          (dot [0.2 0.2 0]
               (vgradient (vseamless 1.0 plasma)))))
What is the best/most idiomatic approach?
first, the repeated appearance of operators in an infix expression requires a nice syntax, but for a lisp, with prefix syntax, i don't think this is as important. so it's not such a crime to have the user type a few more characters for an explicit namespace. and clojure has very good support for namespaces and aliasing. so it's very easy for a user to select their own short prefix: (x/+ ...) for example.
second, looking at the reader docs there are not many non-alphanumeric symbols to play with, so something like :+ is out. so there's no "cute" solution - if you choose a prefix it's going to have to be a letter. that means something like x+ - better to let the user choose an alias, at the price of one more character, and have x/+.
so i would say: ignore core, but expect the user to (:require .... :as ...). if they love your package so much they want it to be default then they can (:use ...) and handle core explicitly. but you choosing a prefix to operators seems like a poor compromise.
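for example (mydsl.ops and its + below are hypothetical stand-ins for the DSL's namespace):
;; a consumer picks a short alias and keeps clojure.core/+ untouched
(ns my.art
  (:require [mydsl.ops :as x]))

(x/+ [0.9 0.6 0.3] [0.1 0.2 0.3])

;; a consumer who really wants bare names opts in and shadows core explicitly
(ns my.art2
  (:refer-clojure :exclude [+])
  (:require [mydsl.ops :refer [+]]))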
(and i don't think i have seen any library that does use single letter prefixes).
one other possibility is to provide the above and also a separate package with long names instead of operators (which are simply def'ed to match the values in the original package). then if people do want to (:use ...) but want to avoid clashes, they can use that (but really what's the advantage of (vector-add ...) over (vector/+ ...)?)
and finally i would check how + is implemented, since if it already involves some kind of dispatch on types then georgek's comment makes a lot of sense.
(by "operator" above i just mean single-character, non-alphanumeric symbol)

Data format safety in clojure

Coming from a Java background, I'm quite fond of static type safety and wonder how Clojure programmers deal with the problem of data format definitions (perhaps not just types but general invariants, since types are just a special case of those).
This is similar to an existing question "Type Safety in Clojure", but that one focuses more on the aspect of how to check types at compile time, while I'm more interested in how the problem is pragmatically addressed.
As a practical example I'm considering an editor application which handles a particular document format. Each document consists of elements that come in several different varieties (graphics elements, font elements etc.) There would be editors for the different element types, and also of course functions to transform a document from/to a byte stream in its native on-disk format.
The basic problem I am interested in is that the editors and the read/write functions have to agree on a common data format. In Java, I would model the document's data as an object graph, e.g. with one class representing a document and one class for each element variety. This way, I get a compile-time guarantee about what the structure of my data looks like, and that the field "width" of a graphics element is an integer and not a float. It does not guarantee that width is positive - but using a getter/setter interface would allow the corresponding class to add invariant guarantees like that.
Being able to rely on this makes the code dealing with this data simpler, and format violations can be caught at compile-time or early at runtime (where some code attempts to modify data that would violate invariants).
How can you achieve a similar "data format reliability" in Clojure? As far as I know, there is no way to perform compile-time checking, and hiding domain data behind a function interface seems to be discouraged as non-idiomatic (or maybe I misunderstand?), so what do Clojure developers do to feel safe about the format of the data handed to their functions? How do you get your code to error out as quickly as possible, and not after the user has edited for 20 more minutes and tries to save to disk, when the save function notices that there is a graphics element in the list of fonts due to an editor bug?
Please note that I'm interested in Clojure and learning, but didn't write any actual software with it yet, so it's possible that I'm just confused and the answer is very simple - if so, sorry for wasting your time :).
I don't see anything wrong or unidiomatic about using a validating API to construct and manipulate your data as in the following.
(defn text-box [text height width]
  {:pre [(string? text) (integer? height) (integer? width)]}
  {:type 'text-box :text text :height height :width width})

(defn colorize [thing color]
  {:pre [(valid-color? color)]}
  (assoc thing :color color))

... (colorize (text-box "Hi!" 20 30) :green) ...
In addition, references (vars, refs, atoms, agents) can have an associated validator function that can be used to ensure a valid state at all times.
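For example, a minimal sketch using an atom whose validator mirrors the text-box map above:
;; the atom refuses any state whose :width is not a positive integer
(def doc-state
  (atom {:type 'text-box :text "Hi!" :height 20 :width 30}
        :validator (fn [m] (and (integer? (:width m)) (pos? (:width m))))))

;; (swap! doc-state assoc :width -5) throws an IllegalStateException and leaves the state unchanged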
Good question - I also find that moving from a statically typed language to a dynamic one requires a bit more care about type safety. Fortunately TDD techniques help a huge amount here.
I typically write a "validate" function which checks all your assumptions about the data structure. I often do this in Java too for invariant assumptions, but in Clojure it's more important because you need to check things like types as well.
You can then use the validate function in several ways:
As a quick check at the REPL: (validate foo)
In unit tests: (is (validate (new-foo-from-template a b c)))
As a run-time check for key functions, e.g. checking that (read-foo some-foo-input-stream) is valid
If you have a complex data structure which is a tree of multiple different component types, you can write a validate function for each component type and have the validate function for the whole document call validate for each sub-component recursively. A nice trick is to use either protocols or multimethods to make the validate function polymorphic for each component type.
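A sketch of the multimethod variant (the element shapes and keys below are invented for illustration):
(defmulti validate :type)

(defmethod validate :graphics [{:keys [width height] :as element}]
  (assert (and (integer? width) (pos? width)) "graphics width must be a positive integer")
  (assert (and (integer? height) (pos? height)) "graphics height must be a positive integer")
  element)

(defmethod validate :font [{:keys [name size] :as element}]
  (assert (string? name) "font name must be a string")
  (assert (and (number? size) (pos? size)) "font size must be a positive number")
  element)

(defn validate-document [document]
  (doseq [element (:elements document)]
    (validate element))
  document)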

calculating user defined formulas (with c++)

We would like to have user-defined formulas in our C++ program.
e.g. the value v = x + (y - (z - 2)) / 2. Later in the program the user would define x, y and z, and the program should return the result of the calculation. Sometime later the formula may get changed, so the next time the program should parse the new formula and apply the new values. Any ideas / hints on how to do something like this? So far the only solution I have come up with is to write a parser to calculate these formulas - any ideas about that?
If it will be used frequently and if it will be extended in the future, I would almost recommend adding either Python or Lua into your code. Lua is a very lightweight scripting language which you can hook into and provide new functions, operators etc. If you want to do more robust and complicated things, use Python instead.
You can represent your formula as a tree of operations and sub-expressions. You may want to define types or constants for Operation types and Variables.
You can then easily enough write a method that recurses through the tree, applying the appropriate operations to whatever values you pass in.
Building your own parser for this should be a straightforward task:
1) convert the equation from infix to postfix notation (a typical compsci assignment) (I'd use a stack)
2) wait to get the values you want
3) pop the stack of postfix items, substituting the value for each variable where needed
4) display the results
Using Spirit (for example) to parse (and the 'semantic actions' it provides to construct an expression tree that you can then manipulate, e.g., evaluate) seems like quite a simple solution. You can find a grammar for arithmetic expressions there for example, if needed... (it's quite simple to come up with your own).
Note: Spirit is very simple to learn, and quite adapted for such tasks.
There are generally two ways of doing it, with three possible implementations:
as you've touched on yourself, a library to evaluate formulas
compiling the formula into code
The second option here is usually done either by compiling something that can be loaded in as a kind of plugin, or it can be compiled into a separate program that is then invoked and produces the necessary output.
For C++ I would guess that a library for evaluation would probably exist somewhere so that's where I would start.
If you want to write your own, search for "formal automata" and/or "finite state machine grammar"
In general what you will do is parse the string, pushing characters on a stack as you go. Then start popping the characters off and perform tasks based on what is popped. It's easier to code if you force equations to reverse-polish notation.
To make your life easier, I think getting this kind of input is best done through a GUI where users are restricted in what they can type in.
If you plan on doing it from the command line (that is the impression I get from your post), then you should probably define a strict set of allowable inputs (e.g. only single letter variables, no whitespace, and only certain mathematical symbols: ()+-*/ etc.).
Then, you will need to:
Read in the input char array
Parse it in order to build up a list of variables and actions
Carry out those actions - in BOMDAS order
With ANTLR you can create a parser/compiler that will interpret the user input, then execute the calculations using the Visitor pattern. A good example is here, but it is in C#. You should be able to adapt it quickly to your needs and remain using C++ as your development platform.