So, I'm diving deeper and deeper into clojure.spec.
One thing I stumbled upon is where to put my specs. I see three options:
Global Spec File
In most examples I found online, there is one big spec.clj file that gets required in the main namespace. It has all the (s/def) and (s/fdef) forms for all the "data types" and functions.
Pro:
One file to rule them all
Contra:
This file can be big
Single Responsibility Principle violated?
Specs in production namespaces
You could put your (s/def) and (s/fdef) forms right next to your production code, so that the implementation and the spec co-exist in the same namespace.
Pro:
co-location of implementation and spec
one namespace - one concern?
Contra:
production code could get messy
one namespace - two concerns?
Dedicated spec namespace structure
Then I thought, maybe Specs are a third kind of code (next to production and test). So maybe they deserve their own structure of namespaces, like this:
├─ src
│  └─ package
│     ├─ a.clj
│     └─ b.clj
├─ test
│  └─ package
│     ├─ a_test.clj
│     └─ b_test.clj
└─ spec
   └─ package
      ├─ a_spec.clj
      └─ b_spec.clj
Pro:
dedicated (but related) namespaces for specs
Contra:
you have to source and require the correct namespaces
Who has experience with one of the approaches?
Is there another option?
What do you think about the different options?
I usually put specs in their own namespace, alongside the namespace that they are describing. It doesn't particularly matter what they're named, as long as they use some consistent naming convention. For example, if my code is in my.app.foo, I'll put specs in my.app.foo.specs.
It is preferable for spec key names to be in the namespace of the code, however, not the namespace of the spec. This is still easy to do by using a namespace alias on the keyword:
(ns my.app.foo.specs
  (:require [clojure.spec.alpha :as s]
            [my.app.foo :as f]))

(s/def ::f/name string?)
I'd stay away from trying to put everything in one giant spec namespace (what a nightmare.) While I certainly could put them right alongside the spec'd code in the same file, that hurts readability IMO.
You could put all your spec namespaces in a separate source path, but there's no real benefit to doing so unless you're in a situation where you want to distribute the code but not the specs or vice versa... hard to imagine what that'd be though.
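As a rough sketch of the layout described above: only my.app.foo, my.app.foo.specs and ::f/name come from the answer, while the greet function and its s/fdef are made up for illustration. Both namespaces are shown in one snippet so it can be pasted into a REPL; in a real project each ns form would live in its own file. It assumes Clojure 1.9 or later, where spec lives in clojure.spec.alpha.

```clojure
;; Code namespace (would normally be src/my/app/foo.clj).
(ns my.app.foo)

(defn greet
  "Hypothetical function used only to have something to spec."
  [name]
  (str "Hello, " name))

;; Spec namespace (would normally be src/my/app/foo/specs.clj).
(ns my.app.foo.specs
  (:require [clojure.spec.alpha :as s]
            [my.app.foo :as f]))

;; ::f/name expands through the alias, so the key lives in the
;; namespace of the code, not in the namespace of the spec.
(s/def ::f/name string?)

;; Function spec for the hypothetical greet function.
(s/fdef f/greet
  :args (s/cat :name ::f/name)
  :ret string?)

;; The aliased keyword and the fully qualified one are the same key.
(def same-key? (= ::f/name :my.app.foo/name))
```

Callers that only need the data specs can then refer to :my.app.foo/name without knowing which file registered it, as long as the specs namespace has been loaded.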
In my opinion the specs should be in the same ns as the code.
My main consideration is my personal philosophy, and technically it works out well. My philosophy is that the spec of a function is an integral part of it. It isn't an extra thing. It's definitely not "two concerns" as put in the OP. Actually, it's what the function is, because in terms of correctness: who cares about the implementation? who cares what you wrote in your defn? who cares about the identifier? who cares about anything but the spec?
I think it feels weird because clojure.spec came later, and because most languages will not let you have specs as an integral thing even if you wanted them to be. Anything close (tests in the code, perhaps) is usually frowned upon, so of course it seems weird. But give it some thought, and you might reach the same conclusion I did (or you may not; this part is opinionated).
The only good reasons I can think of as to why you wouldn't want specs in the same ns are these two reasons:
It clutters your code.
You want to support Clojure versions pre Clojure 1.9.0.
As for the first reason, well, again, I think it's an integral part of your functions. If you find that it really is too much, my advice would be the same as if your ns would be too cluttered up regardless of spec: see if you can split it up.
As for the second reason, if you truly care, you can check in code whether the clojure.spec namespace is available and, if it is not, shadow the names with functions/macros that are no-ops. Another option is to use clojure-future-spec, but I haven't tried it myself so I don't know how well it works.
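One way the availability check could look, as a minimal sketch and not a tested compatibility layer. The namespace my.app.compat and the def-spec macro are invented for this example, and it assumes the spec namespace is clojure.spec.alpha (its name since the Clojure 1.9 release).

```clojure
(ns my.app.compat)

;; true when clojure.spec.alpha can be loaded, false on older Clojure
;; versions where requiring it fails with FileNotFoundException.
(def spec-available?
  (try
    (require 'clojure.spec.alpha)
    true
    (catch java.io.FileNotFoundException _
      false)))

(defmacro def-spec
  "Hypothetical wrapper: registers a spec when clojure.spec.alpha is
  present, and expands to nil (a no-op) when it is not."
  [k spec-form]
  (when spec-available?
    `(clojure.spec.alpha/def ~k ~spec-form)))

;; Registers :my.app.compat/name where spec is available,
;; does nothing elsewhere.
(def-spec ::name string?)
```

The check runs once when the namespace is loaded, so the macro decides at expansion time whether any spec code is emitted at all.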
Another way this works out well technically is that sometimes there's a cyclic dependency that you cannot handle with different namespaces, e.g. when your specs depend on your code (for function specs) and your code depends on your specs for parsing (see this question).
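To make the cyclic case concrete, here is a small sketch in which the code uses a spec for parsing while the same function also has a function spec. Everything in it (my.app.point, ::point, parse-point) is hypothetical. Keeping both in one namespace sidesteps the question of which namespace has to load first.

```clojure
(ns my.app.point
  (:require [clojure.spec.alpha :as s]))

;; Data spec, used by the code below for parsing.
(s/def ::point (s/cat :x number? :y number?))

(defn parse-point
  "Returns {:x .. :y ..} for a two-number sequence, or nil if the
  input does not conform to ::point."
  [v]
  (let [result (s/conform ::point v)]
    (when-not (s/invalid? result)
      result)))

;; Function spec, which in turn refers to the code above.
(s/fdef parse-point
  :args (s/cat :v sequential?)
  :ret (s/nilable map?))
```

For example, (parse-point [1 2]) gives {:x 1, :y 2}, while (parse-point [1 "a"]) gives nil.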
One other consideration depending on your use case: putting the specs alongside your main code limits use of your code to Clojure 1.9 clients, which may or may not be what you want. Like @levand, I would recommend a parallel namespace for each of your code namespaces.
I've been learning clojure for some weeks and recently I began reading some open source code: clojure and clojurescript compilers and some libraries like om, boot, figwheel.
I noticed some clojure files are very long, some of them more than a thousand LOC. Given that clojure code is very terse and low ceremony, a file that long holds much more logic than a file of the same length in some other languages.
Coming from an OO background where you usually have one class per file and you try to keep your classes short (SRP) I found that a little weird.
I know that clojure code is mostly composed of pure functions and they're way easier to reason about than some mutable class where you need to keep the current state in mind, and I find that I can read and understand most of the functions one at a time. But most of those functions are very well designed so that they don't depend on each other: even though you can use (filter odd?) it doesn't mean that filter and odd? are related. But for "every day" code (LOB apps, web apps, etc) it is very hard to keep the functions as self contained as those (at least that's my experience with OO programming).
I've also seen some demos of clojurescript applications (om, reagent, etc) where they declare all components in the same file. I don't know if that's because it's just a demo and in a real life application you'd have a product.clj and a category.clj or that's just the clojure way: to have one file per namespace/module/bounded-context.
I think that if I open a folder and I see product.clj, category.clj, order.clj, etc I can get the idea at a glance what's that folder about, better than just having a components.clj or core.clj.
So, my questions are:
Is it common for "every day" clojure code to have these very long files? Or is it just because I'm reading library code, and "normal" code is more "modular", meaning more files that are each shorter?
Does having long files like those actually make it harder to comprehend at a glance what the application is about, as in my product/category/order example above? Or is that not an issue thanks to some clojuresque property?
In case long files are the "clojure way", how do you handle conflicts, refactorings, programming in a team... if everybody is touching the same file?
1: I looked at the reasonably large non-library clojure project I'm working on right now and ran this:
ls **/*.clj | xargs wc -l | awk '{print $1}' | head -n -1 > counts
and opened a repl and ran
user> (def counts (read-string (str "[" (slurp "counts") "]")))
user> (float (/ (reduce + counts) (count counts)))
208.76471
I see that on a project with 17k LOC our average clojure file has 200 lines in it. I found one with 1k LOC.
2: Yes, I'll get started breaking that long one down as soon as I have free time. Some very long ones, such as clojure.core, are long because of clojure's one-pass design and the need to self-bootstrap: they need to build the ability to have many namespaces, for instance, before they can use it. For other fancy libraries it may very well be that they had some other design reason for a large file, though usually it's a case of "pull request welcome" in my experience.
3: I do work in a large team with a few large files, we handle merge conflicts with git, though because changes tend to be within a function these come up, for me, much less often than in other languages. I find that it's simply not a problem.
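The measurement from point 1 can also be done entirely from the REPL. The following is only a sketch of that idea, with made-up helper names (clj-line-counts, average); it walks a directory, counts the lines of every .clj file and averages them.

```clojure
(require '[clojure.java.io :as io])

(defn clj-line-counts
  "Line counts of all .clj files below root (a directory path)."
  [root]
  (doall
   (for [^java.io.File f (file-seq (io/file root))
         :when (and (.isFile f)
                    (.endsWith (.getName f) ".clj"))]
     (with-open [r (io/reader f)]
       (count (line-seq r))))))

(defn average
  "Arithmetic mean as a double, or nil for an empty collection."
  [xs]
  (when (seq xs)
    (double (/ (reduce + xs) (count xs)))))

;; Usage, assuming the sources live under "src":
;; (average (clj-line-counts "src"))
```

Sorting the counts instead of averaging them is a quick way to find the outliers, such as the 1k LOC file mentioned above.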
They tend to get long as you develop them. Say you need a function foo to do procedures [a b ...] on data structure K. You first (def) the signature of the function and then go on to implement the helper functions a b ... Since they're likely all pure functions and the functionality you need from foo is complex, the namespace tends to get long.
Sometimes, but the repl is a really useful tool. To understand a new library's main functionality I often use clojure.repl/source on the function and work my way backwards through its helper functions. I find that a lot of the time the documentation of Clojure libraries is either cryptic or non-existent, but as many in the community like to say, the source of Clojure functions is self-documenting.
I have no experience working in a large team, but Arthur Ulfeldt is right: most changes happen in a single function. I gather this from reading the diffs of pull requests with GitHub's Blame feature.
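For readers who have not used it, this is roughly what the clojure.repl/source workflow mentioned above looks like. filter is used here only because it ships with Clojure; any function whose source file is on the classpath works the same way.

```clojure
(require '[clojure.repl :refer [source source-fn]])

;; Prints the source of the function to the REPL.
(source filter)

;; source-fn returns the same text as a string
;; (or nil when the source cannot be found).
(def filter-src (source-fn 'clojure.core/filter))

;; From here, pick a helper that appears in the printed source and
;; call (source that-helper) to work backwards through the library.
```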
It is pragmatic (clojure or not) to avoid dependencies. Naming and classifying abstract things makes our intellect feel good, but it sort of gives up when having to stitch back all the parts together. Why make three files when one will do?
Making your mind up on what an app / lib is all about, just by reading the code? There's the "what" and there's the "how". Better have a clue about the former if you want to dive into the latter. If you're reading the code to get a clue about the purpose of an app, I'm not sure having it split amongst more files will make it easier. Think twice about your example: none of these things can exist without the others.
The difficulty with large teams is sharing up-to-date knowledge, not files or lines, thanks git. Maybe having everyone on the same single file would be a damn good thing after all?
No, large files are not a problem in clj or other tongues. Unit<->file is a totally javartificial concept that helps compilers, not men. Split the fg buffer.
In addition to the answers others have given, here are two more.
1. It may be that some files are long because in Clojure it's most straightforward to use one file per namespace, so that if you want all of those definitions in the same namespace, it's easier to put them in one file. One reason to want definitions to reside in the same namespace is given in point 2.
2. The Clojure compiler won't allow certain kinds of cyclic dependencies between namespaces (other cyclic dependencies between namespaces are fine). One way to avoid an illegal cyclic dependency is to put the interdependent definitions into the same namespace. If you do that, it might make sense to pull other definitions that belong with the problematic ones into the single namespace as well. See point 1 for the rest of this answer.
(My own taste is for several smaller files, although not as small as many Java class files. Also: Code is usually not as self-documenting as its author thinks. This may hold even when the author and the person reading the code later are the same person.)
I'm relatively new to Clojure but I've noticed many projects implement a "core" namespace; e.g. foobar.core seems to be a very common pattern. What's the history behind this and why is it the de facto standard?
It's just a generic name that represents the "core" functionality of something. The core functions of Clojure itself define a general purpose programming language, with very generic functions you might need in many different problem domains. Functions which are only applicable to a specific problem domain go into domain-specific namespaces, such as clojure.string or clojure.set.
Tools like Leiningen can't know a priori what sort of a program you're trying to write, so they set you up with a foo/core namespace by default, modelled after clojure.core -- clojure.core is the core functionality of Clojure, so foo.core is the core functionality of foo.
If you can give your core namespace a less generic name, or just use foo/core to kick off bootstrapping (such as by holding your -main function), that's encouraged, so that the bulk of your code will reside in more semantically meaningful namespaces instead. This helps you find the specific code you want to work on later.
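A small sketch of that advice, with invented names (foo.greeting and its greeting function are not from any real project): foo.core only holds -main and delegates to a more meaningfully named namespace. The two ns forms are shown together for brevity; normally each lives in its own file.

```clojure
;; The real functionality lives in a semantically named namespace.
(ns foo.greeting)

(defn greeting
  "Builds the greeting text for the given name."
  [name]
  (str "Hello, " name "!"))

;; foo.core is only the entry point that kicks things off.
(ns foo.core
  (:require [foo.greeting :as greeting]))

(defn -main
  "Entry point: greets the first command-line argument, or the world."
  [& args]
  (println (greeting/greeting (or (first args) "world"))))
```

With this split, someone looking for the greeting logic goes straight to foo.greeting and never has to open core.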
I'm working on a web app using Clojure, Ring, Compojure, and Fleet. I like the flexibility of Fleet and I find the syntax of its template files easy to read and intuitive. However, the documentation is sparse and I'm having difficulty understanding the use of the macro "fleet-ns", which produces namespaced functions for .fleet files in a directory tree.
In particular, the README.md file makes this statement about the production of these functions:
— Several functions will be created for each file. E.g. file posts.html.fleet will
produce 3 functions: posts, posts-html and posts-html-fleet.
I can't find any explanation of why there are three functions, what each of them is used for, or what their differences are.
The examples I've found by search have been fragmentary, incomplete or obscured by other issues.
Overall my feeling is that the adoption of this excellently conceived package is being hampered by the lack of documentation. I am inclined to improve it if I can figure out a bit more about the way Fleet works.
Any help, pointers, or canonical examples appreciated.
Indeed documentation is scarce. Maybe you can use enlive instead. There are plenty of examples available on the web.
You can also read (if you haven't already) the following:
http://cleancode.se/2011/01/04/getting-started-with-moustache-and-enlive.html
and a very nice paper by Glenn Vanderburg:
http://steve.vinoski.net/pdf/IC-Clojure_Templating_Libraries_Fleet_and_Enlive.pdf
Which of the following makes sense when dividing up my Clojure application into mutable and immutable parts?
Separate the mutable/immutable parts into different namespaces
Add prefixes to defns which have side effects
Use the Clojure "doc" to explain this
Mix and match as you wish
I need to know this as I have a Clojure application which talks to databases, application servers and a stateful web framework, so I want my application to be as easy to maintain and read as possible.
Some techniques that have worked for me:
Divide your namespaces and files by module/purpose rather than anything else. This makes more logical sense and helps you keep your design and dependencies clean.
Use "!" to indicate functions that have side effects, e.g. "swap!". Usually you should avoid side effects as much as possible, so it's a bit of a design smell if you see this happening too often.
Try to avoid any mutable state in your library / utility functions. Not only does this usually give you a better API design, it's also much easier to test.
Keep application-specific mutable state to a small number of top-level defines. It's possible for example to use just a single top-level ref to an immutable map to store all your mutable data.
It's helpful to document with examples that you can cut and paste into the REPL so that you can test things quickly or customise to a more complex use case. Again this is much easier if everything is pure.
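A minimal sketch of how these techniques can fit together. All names (my.app.state, app-state, add-user, add-user!) are invented, and an atom is used as the single top-level reference where the answer mentions a ref, since no coordinated transaction is needed here.

```clojure
(ns my.app.state)

;; The only mutable thing: one top-level reference to an immutable map.
(defonce app-state (atom {:users {}}))

(defn add-user
  "Pure: returns a new state map with the user added."
  [state id user]
  (assoc-in state [:users id] user))

(defn add-user!
  "Side-effecting wrapper, marked with a trailing !."
  [id user]
  (swap! app-state add-user id user))

;; Cut-and-paste REPL examples, as suggested above:
;; (add-user {:users {}} 1 {:name "Ada"})
;; (add-user! 1 {:name "Ada"})
;; (get-in @app-state [:users 1])
```

The pure function carries the logic and can be tested without any setup; the "!" function is a one-line shell around it.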
Here is what my approach would be:
Don't divide the namespaces according to mutability/immutability unless you are writing a collections library or something similar. Use namespaces to indicate the logical partitions of your code, like ui, core, util etc.
By default keep all functions pure and hence do not use any prefix by default. State should be generally stored in refs and atoms defined as defs. Use names that indicate the statefulness, like userNameStore.
Document everything, all functions and vars. Or at least the public ones.
Mix and match but do not do so on an ad-hoc basis. Clearly structure your code so that the mutable state is limited and is well focussed.
I can understand the use for one level of namespaces. But three levels of namespaces looks insane. Is there any practical use for that? Or is it just a misconception?
Hierarchical namespaces do have a use in that they allow progressively more refined definitions. Certainly a single provider may produce two classes with the same name. Often the first level is occupied by the company name, the second specifies the product, and the third (and possibly more) may provide the domain.
There are also other uses of namespace segregation. One popular situation is placing the base classes for a factory pattern in their own namespace and then the derived factories in their own namespaces by provider. E.g. System.Data, System.Data.SqlClient and System.Data.OleDb.
Obviously it's a matter of opinion. But it really boils down to organization. For example, I have a project which has a plugin api that has functions/objects which look something like this:
plugins::v1::function
When 2.0 is rolled out they will be put into the v2 sub-namespace. I plan to only deprecate but never remove v1 members which should nicely support backwards compatibility in the future. This is just one example of "sane" usage. I imagine some people will differ, but like I said, it's a matter of opinion.
Big codebases will need it. Look at boost for an example. I don't think anyone would call the boost code 'insane'.
If you consider the fact that at any one level of a hierarchy, people can only comprehend somewhere very roughly on the order of 10 items, then two levels only gives you 100 maximum. A sufficiently big project is going to need more, so can easily end up 3 levels deep.
I work on XXX application in my company yyy, and I am writing a GUI subsystem. So I use yyy::xxx::gui as my namespace.
You can easily find yourself in a situation when you need more than one level. For example, your company has a giant namespace for all of its code to separate it from third party code, and you are writing a library which you want to put in its own namespace. Generally, whenever you have a very large and complex system, which is broken down hierarchically, it is reasonable to use several namespace levels.
It depends on your needs and programming style. But one of the benefits of namespaces is to help partition the name space (hence the name). With a single namespace, as your project increases in size and complexity, so does the likelihood of name collisions.
If you're writing code that's meant to be shared or reused, this becomes even more important.
I agree for applications. Most people that use multiple levels of namespaces (in my experience) come from a Java or .NET background where the noise is significantly less. I find that good class prefixes can take the place of multiple levels of namespaces.
But I have seen good use of multiple namespace levels in boost (and other libraries). Everything is in the boost namespace, but libraries are allowed (encouraged?) to be in their own namespace. For example - boost::this_thread namespace. It allows things like...
boost::this_thread::get_id()
boost::this_thread::interruption_requested()
"this_thread" is just a namespace for a collection of free functions. You could do the same thing with a class and static functions (i.e. the Java way of defining a free function), but why do something unnatural when the language has a natural way of doing it?
Just look at the .Net base class library to see a namespace hierarchy put to good use. It goes four or five levels deep in a few places, but mostly it's just two or three, and the organization is very nice for finding things.
The bigger the codebase the bigger the need for hierarchical namespaces. As your project gets bigger and bigger you find you need to break it out in ways to make it easier to find stuff.
For instance, we currently use a 2 level hierarchy. However, we are now talking about breaking some of the bigger portions out into 3 levels.