How should I divide my Clojure system into mutable / immutable parts? - clojure

Which of the following makes sense when dividing up my Clojure application into mutable and immutable parts?
Separate the mutable and immutable parts into different namespaces
Add prefixes to defns which have side effects
Use Clojure docstrings (as shown by "doc") to explain this
Mix and match as you wish
I ask because I have a Clojure application that talks to databases, application servers and a stateful web framework, and I want it to be as easy to maintain and read as possible.

Some techniques that have worked for me:
Divide your namespaces and files by module/purpose rather than anything else. This makes more logical sense and helps you keep your design and dependencies clean.
Use a "!" suffix to indicate functions that have side effects, e.g. "swap!". Usually you should avoid side effects as much as possible, so it's a bit of a design smell if you see this happening too often.
Try to avoid any mutable state in your library / utility functions. Not only does this usually give you a better API design, it's also much easier to test.
Keep application-specific mutable state to a small number of top-level defs. It's possible, for example, to use just a single top-level ref to an immutable map to store all your mutable data.
It's helpful to document with examples that you can cut and paste into the REPL so that you can test things quickly or customise to a more complex use case. Again this is much easier if everything is pure.
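A minimal sketch of the last few points, assuming a hypothetical app whose only mutable state is a list of user names: one top-level ref holds an immutable map, the logic lives in a pure function that is trivial to try at the REPL, and a thin "!"-suffixed wrapper performs the update.

```clojure
;; Pure function: takes the current state map and returns a new one.
;; It touches no global state, so it can be tested by pasting into a REPL.
(defn add-user [state user-name]
  (update state :users conj user-name))

;; The single top-level piece of mutable application state (hypothetical
;; shape): a ref holding an immutable map.
(def app-state (ref {:users []}))

;; Side-effecting wrapper, marked with the "!" suffix by convention.
;; It only coordinates the update; all the logic is in add-user.
(defn add-user! [user-name]
  (dosync (alter app-state add-user user-name)))
```

At the REPL, (add-user {:users []} "alice") returns {:users ["alice"]} without touching app-state, while (add-user! "bob") is the only call that changes it.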

Here is what my approach would be:
Don't divide the namespaces according to mutability/immutability unless you are writing a collections library or something similar. Use namespaces to indicate the logical partitions of your code, like ui, core, util etc.
By default, keep all functions pure and hence do not use any prefix. State should generally be stored in refs and atoms defined with def. Use names that indicate the statefulness, like userNameStore.
Document everything: all functions and vars, or at least the public ones.
Mix and match, but do not do so on an ad-hoc basis. Clearly structure your code so that the mutable state is limited and well focused.

Related

Why are not clojure namespaces in MVCC?

There's a lot of discussion on how best to manage namespaces. Stuart Sierra enlightens us all about this in his "Lifecycle Composition" post and his work on clojure.tools.namespace.
Most of the complexity comes from the mutability of namespaces, so why don't we put namespaces in Clojure's own MVCC? There must be a reason why, but I cannot figure it out myself.
One practical reason is that namespaces are used to build the MVCC itself, so using MVCC to build namespaces would make building the compiler harder, though not impossible. The other reason lies in the way programmers modify them. While the contents of refs are typically modified by manipulating data as the program runs, the contents of vars in a namespace are almost always modified during program development, where the vast majority of the time the programmer wants the change to be seen system-wide and immediately.
It's worth noting that there are alternatives to namespaces for storing your functions in cases where you need coordinated updates of multiple functions. If you need atomic upgrades, it is reasonable to store your functions in a ref and upgrade your program using dosync. This way you get MVCC semantics for the functions that need them and keep the update-in-place semantics of namespaces everywhere that does not need coordinated upgrades.
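A minimal sketch of the ref-based alternative, assuming two hypothetical functions (an encoder and its matching decoder) that must always be upgraded together. Each function lives in its own ref; a dosync upgrade swaps both at once, and a caller that reads both refs inside one transaction always sees a matching pair.

```clojure
;; Two functions that must stay compatible, each stored in a ref
;; rather than in a namespace var.
(def encoder (ref (fn [s] (str s "!"))))
(def decoder (ref (fn [s] (subs s 0 (dec (count s))))))

;; Reading both refs inside one transaction gives a consistent snapshot,
;; so an encoder is never paired with a decoder from another version.
(defn round-trip [s]
  (dosync ((deref decoder) ((deref encoder) s))))

;; Coordinated upgrade: both refs change atomically in one transaction.
(defn upgrade-codec! [new-encoder new-decoder]
  (dosync
    (ref-set encoder new-encoder)
    (ref-set decoder new-decoder)))
```

For example, (upgrade-codec! #(str "<" % ">") #(subs % 1 (dec (count %)))) replaces the pair atomically, and round-trip keeps returning its input before and after the upgrade. Ordinary namespace vars elsewhere keep their usual update-in-place behavior.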
Code loading is by its very nature an operation that can be completely arbitrary: loading a namespace can trigger all sorts of side effects, so it isn't suitable for running inside transactions.
Anyway, I think some improvements could be made to the current loading mechanism. For example, within a call to ns, require, etc., all current vars could be listed, as well as all the vars being added by the call; if the call to ns/require fails, all the vars in the latter list would be removed.
Note that such an approach would require serializing calls to ns/require, as opposed to the current mechanism, which is concurrent. I don't think there is a strong case for concurrent loading anyway.

Is it possible to use policy based design together with automated testing?

I am developing a numerical simulations library which is centred around a single collection of data operated on by different computational algorithms. The algorithms are complex, they have different states involving multiple parameters, and are interchangeable (under some semantic restrictions).
To avoid a bloated interface for the collection and to enable different implementations etc., I'm thinking about using policy-based design. This gives the collection a wide combination of choices between storage structures, algorithms, parameters and internal stuff.
If I imagine that I redesigned my existing generic / object-oriented design using policies, how can I choose the optimal algorithms and data structures? Conceptually, I need to define the set of policies and a set of verification test cases, and execute a parametric study.
This is easy when object-oriented programming is used, since I can determine all necessary types and their parameters at run time using e.g. a string-based Abstract Factory with type names stored in the input file, which is then changed by an external script that executes the client application on a family of test cases.
How do I do that with policies, where a combination of N policies ends up being N different client applications?
How is automated testing done together with policy based design in a professional way?
If you're representing algorithms as policies, you /should/ have a pretty uniform interface already thought up. You could imagine an "AlgorithmPolicy" processing some data from your data store and returning some representation of the results.
"If I imagine that I redesigned my generic / object oriented existing design using policies, how can I choose the optimal algorithms and data structures?"
If your object-oriented design currently makes use of the strategy pattern (see also: the Gang of Four book), your policies will simply replace every place where you've used a strategy. Choosing "optimal algorithms" for the different policies you design will simply be a matter of nailing the right conceptual structure / interface for those policies. (If you're going to use many different data stores, make sure that the interface for adding / removing / getting data from them is uniform, for example. Here, it can be helpful to think of three examples and find commonalities, then think of another example and make sure it fits the schema. Iterate until things feel correct.)
You'll still have adequate type checking; it'll just feel a bit different (and you may occasionally run into some nasty compile errors).
Testing will simply be a matter of writing some unit tests for each of the configurations / policy combinations you'd like to cover. You probably should already be writing these tests anyway; the primary difference is that you'll want to hit the interfaces you designate rather than targeting specifics.
You can validate different storage methods based on validations of your algorithm policies. (So, if I have some algorithm that can be stored in different ways, I can run the algorithm on some test data for each storage mechanism and expect the same results.) Assuming that you've specced out the interface correctly, you should only need to write a single test for each additional storage mechanism you add.
Again: It'd be nice to have more details about the structure of the program, what different parameters and such you'd need to pass in. (Is any of this code open source / going to be open sourced?)
From what you've said, in my mind, your complicated-policy process may have an interface like so:
FancyDataStore<AlgorithmPolicy, StoragePolicy, InternalStuffPolicy>::process()
For testing it, I'd write:
MockAlgorithmPolicy - A very simple algorithm that's trivial to validate.
MockInternalStuffPolicy - A very simple internal stuff policy that causes no integrations / reports nothing new.
MockStoragePolicy - A very simple storage policy that meets your interface for storage / doesn't cause many issues.
Write a test that validates the mocks put together...
For each StoragePolicy you create, write an automated test to validate it:
void testSomeStoragePolicy() {
    // Real storage policy under test; mocks for everything else.
    FancyDataStore<MockAlgorithmPolicy, SomeStoragePolicy, MockInternalStuffPolicy> store;
    store.process();
    // validate...
}
That should prove that the SomeStoragePolicy works as expected.
Then, for your algorithms, you could write:
void testSomeAlgorithmPolicy() {
    FancyDataStore<SomeAlgorithmPolicy, MockStoragePolicy, MockInternalStuffPolicy> store;
    store.process();
    // validate...
}
etc.
This way, you write roughly one test per policy (which seems feasible and not too ridiculous). Additionally, you can always add more unit tests to cover other subtle interactions that may crop up over time.
If you're looking for good books on this subject, I'd suggest reading "Modern C++ Design" by Andrei Alexandrescu; it provides a great primer on policy-based design in C++.
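Clojure has no template-based policies, but the same testing strategy (swap in one real policy, keep trivial mocks for the rest) carries over if each policy is a plain map of functions passed at run time. A minimal sketch under that assumption; the host function, the policy keys and all policies below are hypothetical. Because such policies are ordinary values, they could also be picked by name from an input file at run time.

```clojure
;; Hypothetical host: combines an algorithm, a storage and an
;; "internal stuff" policy, each given as a map of functions.
(defn run-simulation [{:keys [algorithm storage internal]} data]
  (let [stored ((:store storage) data)
        result ((:run algorithm) ((:fetch storage) stored))]
    ((:report internal) result)))

;; Trivial mocks whose behavior is easy to validate by hand.
(def mock-algorithm {:run (fn [xs] (reduce + xs))})
(def mock-storage   {:store identity :fetch identity})
(def mock-internal  {:report identity})

;; A "real" storage policy under test (hypothetical).
(def vector-storage {:store vec :fetch seq})

;; One test per real policy: the policy under test plus mocks for the rest.
;; A plain predicate here; in a project this would be a clojure.test deftest.
(defn test-vector-storage []
  (= 6 (run-simulation {:algorithm mock-algorithm
                        :storage   vector-storage
                        :internal  mock-internal}
                       '(1 2 3))))
```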

Clojure module dependencies

I'm trying to create a modular application in Clojure.
Let's suppose that we have a blog engine consisting of two modules, for example a database module and an article module (something that stores articles for the blog), each with some configuration parameters.
So the article module depends on storage, and having two instances of the article module and the database module (with different parameters) allows us to host two different blogs in two different databases.
I tried to implement this by creating new namespaces for each initialized module on the fly, and defining functions in these namespaces with partially applied parameters. But I think this approach is something of a hack.
What is the right way to do this?
A 'module' is a noun, as in the 'Kingdom of Nouns' by Steve Yegge.
Stick to non-side-effecting, pure functions of their parameters (verbs) as much as possible, except at the topmost levels of your abstractions. You can organize those functions however you like. At the topmost levels you will have some application state. There are many approaches to managing it, but the one I use most is to hide these top-level services behind a Clojure protocol, then implement it in a Clojure record (which may hold references to database connections or some such).
This approach maximizes flexibility and prevents you from writing yourself into a corner. It's analogous to Java's dependency injection. Stuart Sierra gave a good talk on these topics recently at Clojure/West 2013, but the video is not yet available.
Note the difference from your approach. You need to separate the management and resolution of objects from their lifecycles. Tying them to namespaces is quick for access, but it means any functions you write as clients that use that code are now accessing global state. With protocols, you can separate the implementation detail of global state from the interface of access.
If you need a motivating example of why this is useful, consider how you would intercept all access to a service that's globally accessible. You would have to push the full implementation down and make the entry point a wrapper function, instead of pushing the relevant details closer to the client code. What if you wanted some behavior for some clients of the code and not others? Now you're stuck. This approach simply makes those inevitable trade-offs preemptively and makes your life easier.
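A minimal sketch of the protocol-plus-record approach applied to the blog example. The protocol names, record fields and the atom standing in for a database connection are assumptions for illustration; the point is that each article service is handed its storage explicitly, so two blogs can run side by side without generating per-module namespaces.

```clojure
;; Hypothetical service interfaces for the blog example.
(defprotocol Storage
  (save! [store k v])
  (fetch [store k]))

(defprotocol Articles
  (publish! [svc id text])
  (article [svc id]))

;; A storage implementation. A real one would hold a database connection;
;; an atom stands in for it here (assumption for illustration).
(defrecord InMemoryStorage [db-name data]
  Storage
  (save! [_ k v] (swap! data assoc k v))
  (fetch [_ k] (get @data k)))

;; The article module receives its storage as a field (dependency
;; injection) instead of reaching for a global namespace var.
(defrecord ArticleService [storage]
  Articles
  (publish! [_ id text] (save! storage [:article id] text))
  (article [_ id] (fetch storage [:article id])))

;; Two independent blogs, each wired to its own database.
(def blog-a (->ArticleService (->InMemoryStorage "blog_a" (atom {}))))
(def blog-b (->ArticleService (->InMemoryStorage "blog_b" (atom {}))))
```

Publishing to blog-a leaves blog-b untouched, and intercepting access for some clients only means wrapping one record in another implementation of the same protocol.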

How does Clojure approach Separation of Concerns?

How does Clojure approach separation of concerns? Since code is data, functions can be passed as parameters and used as return values...
And there is Alan Perlis's principle: "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures."
I mean, do we pack everything into a map, give it a keyword as key, and that's it? Functions, scalars, collections, everything...
The idea of Separation of Concerns is implemented, in Java, by means of Aspects (aspect oriented programming) and annotations. This is my view of the concept and might be somewhat limited, so don't take it for granted.
What is the right (idiomatic) way to go about this in Clojure, to avoid the WTFs of fellow programmers?
In a functional language, the best way to handle separation of concerns is to convert any programming problem into a set of transformations on a data structure. For instance, if you write a web app, the overall goal is to take a request and transform it into a response, which can be thought of as simply transforming the request data into response data. (In a non-trivial web app, the starting data would probably include not only the request, but also session and database information.) Most programming tasks can be thought of in this way.
Each "concern" would be a function in a "pipeline" that helps make the transform possible. In this way, each function is completely decoupled from the other steps.
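A minimal sketch of such a pipeline, assuming request and response maps loosely modeled on Ring's (the keys and the three steps are hypothetical). Each concern is a pure function from data to data, and the -> macro threads the request through them.

```clojure
;; Concern 1: extract the user from the request parameters.
(defn parse-user [req]
  (assoc req :user (get-in req [:params :user] "guest")))

;; Concern 2: decide whether the request is allowed.
(defn authorize [req]
  (assoc req :allowed? (not= "guest" (:user req))))

;; Concern 3: turn the enriched request data into response data.
(defn render [req]
  (if (:allowed? req)
    {:status 200 :body (str "Hello, " (:user req))}
    {:status 403 :body "Forbidden"}))

;; The pipeline: request data in, response data out.
(defn handle [req]
  (-> req parse-user authorize render))
```

Each step can be tested on its own with a literal map, and steps can be added or reordered without touching the others.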
Note that this means that your data, as it undergoes these transformations, needs to be rich in its structure. Essentially, we want to put all the "intelligence" of our program into the data, not in the code. In a complicated functional program, the data at the different levels may be complex enough that it needs to look like a programming language in its own right. This is where the idea of "domain-specific languages" comes into play.
Clojure has excellent support for manipulating complex heterogeneous data structures, which makes this less cumbersome than it may sound (i.e. it's not cumbersome at all if done right).
In addition, Clojure's support for lazy data structures allows these intermediate data structures to actually be (conceptually) infinite in size, which makes this decoupling possible in most scenarios. See the following paper for info on why having infinite data structures is so valuable in this situation: http://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pdf
This "pipeline" approach can handle 90% of your needs for separating concerns. For the remaining 10% you can use Clojure macros, which, at a high level, can be thought of as a very powerful tool for aspect-oriented programming.
That's how I believe you can best decouple concerns in Clojure. Note that "objects" or "aspects" are not really necessary concepts in this approach.
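As a hedged illustration of the macro point, here is a minimal sketch of a cross-cutting concern (call tracing) added by a macro instead of being written into each function. The defn-traced macro and the call-log atom are hypothetical, and the macro only supports single-arity definitions.

```clojure
;; Where the cross-cutting concern records its data (hypothetical).
(def call-log (atom []))

(defmacro defn-traced
  "Like defn for a single arity, but records the function's name in
  call-log every time it is called."
  [fn-name args & body]
  `(defn ~fn-name ~args
     (swap! call-log conj '~fn-name)
     ~@body))

;; The business logic stays free of tracing code.
(defn-traced add-two [x]
  (+ x 2))
```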

What are the best resources for learning how to avoid side effects and state in OOP?

I've been playing with functional programming lately and there are pretty good treatments on the topic of side effects, why they should be contained, etc. In projects where OOP is used, I'm looking for some resources which lay out some strategies for minimizing side effects and/or state.
A good example of this is the book RESTful Web Services which gives you strategies for minimizing state in a web application. What others exist?
Remember, I'm not looking for another OOP analysis/design patterns book (though good encapsulation and loose coupling help avoid side effects) but rather a resource where the topic itself is state and side effects.
Some compiled answers
OOP programmers who mostly care about state do so because of concurrency, so read Java Concurrency in Practice. [exactly what I was looking for]
Use TDD to make side effects more visible [I like it, example: the bigger your setUps are, the more state you need in place to run your tests = good warning]
Command-query separation [Good stuff, prevents the side effect of changing a function argument which is generally confusing]
Methods should do only one thing; perhaps use descriptive names if they change the state of their object, so it's simple and clear.
Make objects immutable [I really like this]
Pass values as parameters, instead of storing them in member variables. [I don't like this; it clutters up function prototypes and is actively discouraged by Clean Code and other books, though I do admit it helps the state issue]
Recompute values instead of storing and updating them [I also really like this; in the apps I work on performance is a minor concern]
Similarly, don't copy state around if you can avoid it. Make one object responsible for keeping it and let others access it there. [Basic OOP principle, good advice]
I don't think you'll find a lot of current material in the OO world on this topic, simply because OOP (and most imperative programming, for that matter) relies on state and side effects. Consider logging, for instance. It's pure side effect, yet in any self-respecting J2EE app, it's everywhere. Hoare's original QuickSort relies on mutable state, since you have to swap values around a pivot, and yet it too is everywhere.
This is why many OO programmers have trouble wrapping their heads around functional programming paradigms. They try to reassign the value of "x," discover that it can't be done (at least not in the way it can in every other language they've worked in), and they throw up their hands and shout "This is impossible!" Eventually, if they're patient, they learn recursion and currying and how the map function replaces the need for loops, and they calm down. But the learning curve can be very steep for some.
The OO programmers these days who care most about avoiding state are those working on concurrency. The reasons for this are obvious -- mutable state and side effects cause huge headaches when you're trying to manage concurrency between threads. As a result, the best discussion I've seen in the OO world about avoiding state is Java Concurrency in Practice.
I think the rules are quite simple: methods should only ever do one thing, and the intent should be communicated clearly in the method name.
Methods should either query or change data, but never both.
Some small things I do:
Prefer immutable state; it is relatively benign. E.g. in Java I make member variables final and set them in the constructor wherever possible.
Pass around values as parameters, instead of storing them in member variables.
Recompute values instead of storing and updating them, if that can be done cheaply enough. This helps to avoid inconsistent data by forgetting to update it.
Similarly, don't copy state around if you can avoid it. Make one object responsible for keeping it and let others access it there.
One way to isolate side effects in OO is to let operations only return an object describing the side effects to cause, rather than performing them directly.
Command-query separation is a pattern that is close to this idea.
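A minimal sketch of the "return a description of the side effects" idea, written in Clojure for consistency with the rest of this page; the same shape works with plain value objects in an OO language. The effect names and the handler map are hypothetical. The decision logic is pure and can be unit tested with plain (expected, actual) data, and only run-effects! touches the outside world.

```clojure
;; Pure: decides what should happen and returns a description of the
;; effects instead of performing them. Effect keywords are hypothetical.
(defn register-user [users user-name]
  (if (contains? users user-name)
    [[:log (str user-name " already exists")]]
    [[:save user-name]
     [:send-email user-name]]))

;; Impure: the only place where effects actually run. The caller supplies
;; one handler function per effect kind.
(defn run-effects! [effects {:keys [log save send-email]}]
  (doseq [[kind arg] effects]
    (case kind
      :log        (log arg)
      :save       (save arg)
      :send-email (send-email arg))))
```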
By practising TDD (or at least writing unit tests) one will typically be much more aware of side-effects and use them more sparingly, and also separate them from other side-effect free expressions that are easy to write data driven (expected, actual) unit-tests for.