Clojure module dependencies - clojure

I'm trying to create a modular application in Clojure.
Let's suppose we have a blog engine that consists of two modules, for example a database module and an article module (something that stores articles for the blog), each with some configuration parameters.
The article module depends on the database module, and having two instances of the article module and the database module (with different parameters) allows us to host two different blogs in two different databases.
I tried to implement this by creating a new namespace for each initialized module on the fly and defining functions in these namespaces with partially applied parameters. But that approach feels like something of a hack, I think.
What is the right way to do this?

A 'module' is a noun, as in Steve Yegge's 'Execution in the Kingdom of Nouns'.
Stick to non-side-effecting, pure functions of their parameters (the verbs) as much as possible, except at the topmost levels of your abstractions. You can organize those functions however you like. At the topmost levels you will have some application state; there are many approaches to managing that, but the one I use the most is to hide these top-level services behind a Clojure protocol, then implement it in a Clojure record (which may hold references to database connections or the like).
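For instance, here is a minimal sketch of that shape (the ArticleStore protocol, DbArticleStore record, and db-conn field are all made-up names for illustration):

(defprotocol ArticleStore
  (get-article [this id])
  (save-article [this article]))

(defrecord DbArticleStore [db-conn]   ; db-conn might be a connection or db spec
  ArticleStore
  (get-article [this id]
    ;; query the database through db-conn
    )
  (save-article [this article]
    ;; insert/update through db-conn
    ))

;; Two blogs are just two instances with different parameters:
;; (->DbArticleStore blog-a-conn) and (->DbArticleStore blog-b-conn)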
This approach maximizes flexibility and prevents you from writing yourself into a corner. It's analogous to Java's dependency injection. Stuart Sierra did a good talk on these topics at Clojure/West 2013, but the video is not yet available.
Note the difference from your approach. You need to separate the management and resolution of objects from their lifecycles. Tying them to namespaces is quick for access, but it means any functions you write as clients that use that code are now accessing global state. With protocols, you can separate the implementation detail of global state from the interface of access.
If you need a motivating example of why this is useful, consider: how would you intercept all access to a service that's globally accessible? You would have to push the full implementation down and make the entry point a wrapper function, instead of pushing the relevant details closer to the client code. And what if you wanted some behavior for some clients of the code and not others? Now you're stuck. Separating interface from implementation simply anticipates those inevitable trade-offs and makes your life easier.
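To sketch what interception looks like once clients only see the protocol (continuing the hypothetical ArticleStore example above; LoggingStore and wrapped are also invented names):

(defrecord LoggingStore [wrapped]
  ArticleStore
  (get-article [this id]
    (println "fetching article" id)
    (get-article wrapped id))
  (save-article [this article]
    (println "saving article")
    (save-article wrapped article)))

A client that receives a LoggingStore gets the extra behavior; a client handed the raw DbArticleStore is unaffected, and neither had to change.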

Related

Where to put database access/functionality in clojure application?

I'm writing a small Clojure application which has a lot of interaction with a MongoDB database with 2-3 different collections.
I come from an OOP/Ruby/ActiveRecord background where standard practice is to create one class per data model and give each one access to the database. I've started doing the same thing in my Clojure project. I have one namespace per "data model" and each has its own database connection and CRUD functions. However, this doesn't feel very functional or Clojure-like, and I was wondering if there is a more idiomatic way of doing it, such as having a data or database namespace with functions like get-post, and limiting database access to only that namespace.
This seems like it would have the benefit of isolating the database client dependency to just one namespace, and also of separating pure functions from those with side effects.
On the other hand, I would have one more namespace which I would need to reference from many different parts of my application, and having a namespace called "data" just seems odd to me.
Is there a conventional, idiomatic way of doing this in Clojure?
A nice and, arguably, the most idiomatic (scored 'adopt' on the Clojure radar) way to manage state in a Clojure app is that proposed by Stuart Sierra's great Component library. In a nutshell, the philosophy of Component is to store all the stateful resources in a single system map that explicitly defines their mutual relationship, and then to architect your code in such a way that your functions are merely passing the state to each other.
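A rough sketch of that shape, using Component's system-map and using functions (the Database and Articles components, the :db-uri key, and connect-to are illustrative names, not part of the library):

(ns myapp.system
  (:require [com.stuartsierra.component :as component]))

(defn connect-to [uri]
  ;; stand-in for whatever driver call opens the real connection
  {:uri uri})

(defrecord Database [uri conn]
  component/Lifecycle
  (start [this] (assoc this :conn (connect-to uri)))
  (stop [this] (assoc this :conn nil)))

(defrecord Articles [database])   ; Component assoc's the started Database in here

(defn new-system [config]
  (component/system-map
    :database (map->Database {:uri (:db-uri config)})
    :articles (component/using (map->Articles {}) [:database])))

;; (component/start (new-system {:db-uri "..."}))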
Connection / environment access
One part of your system will be to manage the 'machinery' of your application: start the web server, connect to data stores, retrieve configuration, etc. Put this part in a separate namespace from your business logic (your business logic namespaces should not know about this namespace!). As @superkondukr said, Component is a battle-tested and well-documented way to do this.
The recommended way to communicate the database connection (and other environmental dependencies for that matter) to your business logic is via function arguments, not global Vars. This will make everything more testable, REPL-friendly, and explicit as to who depends on whom.
So your business logic functions will receive the connection as an argument and pass it along to other functions. But where does the connection come from in the first place? The way I do it is to attach it to events/requests when they enter the system. For instance, when you start your HTTP server, you attach the connection to each HTTP request coming in.
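As a sketch with a Ring-style handler (wrap-db, create-post!, and the :db key are names I'm inventing for the example):

(defn wrap-db [handler db]
  (fn [request]
    (handler (assoc request :db db))))

(defn create-post! [db post]
  ;; business logic; the connection arrives as an ordinary argument
  )

(defn post-handler [request]
  (create-post! (:db request) (:params request)))

The middleware is applied once, where the system is wired together, so business logic namespaces never have to mention the machinery namespace.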
Namespace organization:
In an OO language, the conventional way to represent data is as instances of classes modelling database entities; in order to provide an idiomatic OO interface, business logic is then defined as methods of these classes. As Eric Normand put it in a recent newsletter, you define your model's 'nouns' as classes and its 'verbs' as methods.
Because Clojure puts the emphasis on plain data structures for conveying information, you don't really have these incentives. You can still organize your namespaces by entity to mimic this, but I actually don't think it's optimal. You should also account for the fact that Clojure namespaces, unlike classes in most OO languages, don't allow circular references.
My strategy is: organize your namespaces by use case.
For example, imagine your domain model has Users and Posts. You may have a myapp.user namespace for the Users' CRUD and core business logic; similarly, you may have a myapp.post namespace. Maybe in your app Users can like Posts, in which case you'll manage this in a myapp.like namespace which requires both myapp.user and myapp.post. Maybe your Users can be friends in your app, which you'll manage in a myapp.friendship namespace. Maybe you have a small backoffice app with data visualization about all this: you may put it in a myapp.aggregations namespace, for example.
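A minimal sketch of how the dependency direction falls out (all namespace and function names hypothetical):

(ns myapp.like
  (:require [myapp.user :as user]
            [myapp.post :as post]))

(defn like-post! [db user-id post-id]
  ;; builds on user and post logic; myapp.user and myapp.post never require
  ;; myapp.like back, so no circular namespace dependency can appear
  )

Use-case namespaces sit above the entities they touch, so the require graph stays acyclic.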

Better managing Coldfusion Component (CFC) functions

I rely heavily on CFCs. Within an application I may have multiple CFCs, each containing dozens of functions. So over time it's easy to forget about, or miss, functions that have already been created.
So my question is how do you guys manage all these functions? Do you keep a separate document listing all the functions and indexing them that way? Is there an automated feature built in that we can use?
What I've been doing is naming functions more meaningfully but it's very tedious. There has to be a better way to do this. Just looking for your thoughts.
Thank you in advance.
I don't think there's a magic bullet here. Programmers with a bit more OCD than I will likely respond and give you an ironclad solution. For me (or my team), I keep a library of common components in a folder that I reuse for various sites and applications. Then I add them as a /util or /lib folder for a given project and use them (or extend them) as needed. Good planning and good documentation (a wiki is a great choice for a team) are a must.
Planning carefully whether to extend a CFC is especially important. Otherwise you have to chase down nested functions that are part of some superclass way down in the weeds (as in, this works, but I really have no idea why it works).
This is where frameworks can provide much needed structure. For common functions and events they generally provide a location and a convention for creating such things. That makes them easy to decipher (as long as you've been indoctrinated into the framework). They have some downsides but they make life a lot easier :)
- You should follow proper naming conventions for each and every CFC.
- Each CFC should be meant for a particular purpose, i.e. a login CFC should only contain the login-related functions.
- All common functions should be kept together in one CFC that can be extended by the other CFCs.
- You can use a generic CFC for random functions.
Now, if you want to add a new function for some functionality, you only have to scan three CFCs: the one dedicated to that functionality, the common one, and the generic one. Then you add the new function wherever it fits best.

Is it possible to use policy based design together with automated testing?

I am developing a numerical simulations library which is centred around a single collection of data operated on by different computational algorithms. The algorithms are complex, they have different states involving multiple parameters, and are interchangeable (under some semantic restrictions).
To avoid bloated interface of the collection and to enable different implementations etc, I'm thinking about using policy based design. This gives the collection a wide combination of choices between storage structures, algorithms, parameters, internal stuff.
If I imagine that I redesigned my existing generic/object-oriented design using policies, how can I choose the optimal algorithms and data structures? Conceptually I need to define the set of policies and a set of verification test cases and execute a parametric study.
This is easy when object-oriented programming is used, since I can determine all necessary types and their parameters at run time using, e.g., a string-based Abstract Factory with type names stored in the input file, which is then changed by an external script that executes the client application on a family of test cases.
How do I do that with policies, where a combination of N policies ends up being N different client applications?
How is automated testing done together with policy based design in a professional way?
If you're representing algorithms as policies, you /should/ have a pretty uniform interface already thought up. You could imagine an "AlgorithmPolicy" processing some data from your data store and returning some representation of the results.
"If I imagine that I redesigned my generic / object oriented existing design usign policies, how can I choose the optimal algorithms and data structures?"
If your object-oriented design currently makes use of the strategy pattern (see also: the Gang of Four book), your policies will simply replace every place where you've used a strategy. Choosing "optimal algorithms" for the different policies you design will simply be a matter of nailing the right conceptual structure / interface for those policies. (If you're going to use many different data stores, make sure that the interface for adding / removing / getting data from them is uniform, for example. Here, it can be helpful to think of three examples and find commonalities... then think of another example and make sure it fits the schema. Iterate until things feel correct.)
You'll still have adequate type checking, it'll just feel a bit different (and you may run into some nasty compile errors occasionally. ;)
Testing will simply be a matter of writing some unit tests for each of the configurations / policy combinations you'd like to cover. You probably should already be writing these tests anyway; the primary difference is that you'll want to try to hit the interfaces you designate rather than targeting specifics.
You can validate different storage methods based on validations of your algorithm policies. (So, if I have some algorithm that can be stored in different ways, I can run the algorithm on some test data for each storage mechanism and expect the same results.) Assuming that you've spec'd out the interface correctly, you should only need to write a single test for each additional storage mechanism you add.
Again: It'd be nice to have more details about the structure of the program, what different parameters and such you'd need to pass in. (Is any of this code open source / going to be open sourced?)
From what you've said, in my mind, your complicated-policy process may have an interface like so:
FancyDataStore.Process()
For testing it, I'd write:
MockAlgorithmPolicy - A very simple algorithm that's trivial to validate.
MockInternalStuffPolicy - A very simple internal stuff policy that causes no integrations / reports nothing new.
MockStoragePolicy - A very simple storage policy that meets your interface for storage / doesn't cause many issues.
Write a test that validates the mocks put together...
For each StoragePolicy you create, write an automated test to validate it:
void testSomeStoragePolicy() {
    // the host class is parameterized by its policies:
    FancyDataStore<MockAlgorithmPolicy, SomeStoragePolicy, MockInternalStuffPolicy> store;
    store.Process();
    // validate the results...
}
That should prove that the SomeStoragePolicy works as expected.
Then, for your algorithms, you could write:
void testSomeAlgorithmPolicy() {
    FancyDataStore<SomeAlgorithmPolicy, MockStoragePolicy, MockInternalStuffPolicy> store;
    store.Process();
    // validate the results...
}
etc.
This way, you write basically one test per policy you end up writing (which seems feasible and not too ridiculous). Additionally, you can always add unit tests to cover other subtle integrations that may spin up over time.
If you're looking for good books on this subject, I'd suggest reading Andrei Alexandrescu's "Modern C++ Design"; it provides a great primer on policy-driven design in C++.

services based architecture should not necessarily imply distribution?

In my workplace (and a lot of other places), there is a lot of emphasis on building architecture around services. (I am working at an e-commerce startup.) However, I think services are implicitly considered to be distributed. I am a believer in the first law of distribution: "don't distribute". So I believe that we should not unnecessarily complicate the architecture. It should be an architecture which can evolve. So, one way to approach the problem would be to create well-defined namespaces and build code around them, but keep the communication via a Java API (this keeps the monitoring requirement low, and reliability/availability problems low). This can easily be evolved into a distributed architecture by wrapping modules in web services as and when the scale requirements kick in. So the question is: what are the cons of writing the code as a single application and evolving into distributed services, rather than jumping straight into a web-services-based architecture? Am I right in assuming that services should imply the basic principles of design (abstraction, encapsulation, etc.), rather than distribution over a network?
Distribution requires modularity. However, it requires more than just modularity: it also requires coarse-grained interaction between the modules.
For example, in a single-process ecommerce system, you might have separate modules for managing the user's shopping cart and calculating prices. They might interact by the cart asking the calculator to price an item, then another item, etc. That would be perfectly fine.
However, in a distributed system, that would require a torrent of small method calls, which is inefficient; you might get away with it if you used CORBA for distribution, but with SOAP, you'd be in trouble. Rather, you would want to have the cart ask the calculator to price the whole order in one go. That might be worse from a separation of concerns point of view (why should the calculator have to know about the idea of carts?), but it would be required to make the system perform adequately.
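To make the granularity point concrete, a tiny sketch in Clojure (the protocol and function names are invented; the same contrast holds for a Java API or a SOAP service):

;; Fine-grained: perfectly fine in one process, a torrent of round trips over a network.
(defprotocol ItemPricer
  (price-item [this item]))

;; Coarse-grained: one call per order, which is what a remote boundary wants.
(defprotocol OrderPricer
  (price-order [this order]))   ; the whole order travels by value as plain data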
Related to granularity, there's also the problem of modules interacting via interfaces or implementations. With a single process, you can define a set of interfaces through which modules will interact; modules can pass each other objects implementing those interfaces without having to tell each other about the implementations (e.g. a scheduler module could be passed anything implementing interface Job { void run(); }). Across a network, the requirement for coarse grain means that any objects passed must be passed by value (because passing by reference would entail fine-grained calls back to the passing module - unless you were using mobile code, which you aren't, because nobody is), which means that both modules must know about and agree on the implementations of the objects.
So, while building a single-process system in a modular way makes it easier to implement SOA later, it doesn't make it as simple as wrapping each module in a SOAP interface. At least, not unless you build your system in a coarse-grained manner from the start, which means throwing away a number of sound and helpful software engineering practices.

What are the benefits of using a database abstraction layer?

I've been using some code that implements the phpBB DBAL for some time. Recently I had to build a fuller package around it and decided to use the DBAL throughout. In the main, it's been OK. But occasionally there are circumstances where I can't see the logic in using it. It seems to make the simple much more complicated.
What benefits does a DBAL offer over writing SQL statements directly?
From Wikipedia (http://en.wikipedia.org/wiki/Database_abstraction_layer):
API level abstraction
Libraries like OpenDBX unify access to databases by providing a single low-level programming interface to the application developer. Their advantages are most often speed and flexibility because they are not tied to a specific query language (subset) and only have to implement a thin layer to reach their goal. The application developer can choose from all language features but has to provide configurable statements for querying or changing tables. Otherwise his application would also be tied to one database.
When cooking a dish, you do not want several chefs having access to the pot. They could all be adding spices unaware that another chef had already added a spice. Ideally, you want a single chef that would serve as a single point of access to avoid spoiling the soup.
The same with databases. A single point of access can avoid problems of multiple services accessing the data in different ways.