Context:
(A) "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." —Alan Perlis
(B) Clojure has defprotocol, defrecord, deftype
Question:
Is there some style of programming in Clojure that gets the benefits of both?
(B) has the advantage of avoiding type errors.
(A) has the advantage of avoiding duplicate code.
Thanks
PS: I would love to hear constructive criticism on why I'm being downvoted + how to restructure the question to make it productive.
I am not sure how you can correlate (A) and (B).
(A) is about consistency: if you use the same data structure to represent your data (for example, user info stored in a map) across the various layers of your application, then things stay consistent. If you use many data structures to represent the same info, you will have to write code to transform the structure from one form into another, and the various functions that work on the different structures will not be composable, as they expect different data structures.
(B) is about the various constructs in Clojure.
defprotocol: This is not about data structures; rather, it is about a contract/interface, i.e. a particular type implements a contract, and the type can then be used in any context where the consumer function requires the passed type to implement that contract. Example: any type that can be printed to the console (or another writable stream) will implement the print contract/protocol.
defrecord: Creates map-like types, but with some additional interfaces implemented in a default way.
deftype: A low-level construct to create types, so you will have to write a lot of code to use it. 99% of the time you won't need it.
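For instance, here is a minimal sketch of the defrecord point (the names are hypothetical) - the record supports the standard map operations while still having a real type:

;; A record behaves like a map but has a real type, so it can
;; also satisfy protocols and interfaces.
(defrecord User [name email])

(def u (->User "Rich" "rich@example.com"))
(:name u)            ;; => "Rich" - keyword lookup, just like a map
(assoc u :name "R")  ;; => a new User record; the original is unchanged
(map? u)             ;; => true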
The way to reconcile this is to think "abstractions" rather than "data types". Or to paraphrase Alan Perlis:
"It is better to have 100 functions operate on one abstraction than
10 functions on 10 abstractions."
So the Clojure way is to:
Define your abstractions in a simple, minimal way (using defprotocol)
Write functions against this abstraction
Define concrete types that implement the abstraction using defrecord, deftype etc. (or use extend-protocol to extend the protocol to existing Java classes if you like) - see the sketch below
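A minimal sketch of these three steps (all names are hypothetical, not from the question):

;; 1. Define the abstraction in a simple, minimal way
(defprotocol Shape
  (area [this]))

;; 2. Write functions against the abstraction only
(defn total-area [shapes]
  (reduce + (map area shapes)))

;; 3. Define concrete types that implement it
(defrecord Circle [r]
  Shape
  (area [_] (* Math/PI r r)))

(defrecord Square [side]
  Shape
  (area [_] (* side side)))

(total-area [(->Circle 1.0) (->Square 2.0)]) ;; => ~7.14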
Related
Is the "programming to abstractions" principle in Clojure the same as duck typing? If not, what are the differences?
Here is a quote from http://www.braveclojure.com/core-functions-in-depth/:
The reason is that Clojure defines map and reduce functions in terms of the sequence abstraction, not in terms of specific data structures. As long as a data structure responds to the core sequence operations (the functions first, rest, and cons, which we'll look at more closely in a moment), it will work with map, reduce, and oodles of other sequence functions for free. This is what Clojurists mean by programming to abstractions, and it's a central tenet of Clojure philosophy.

I think of abstractions as named collections of operations. If you can perform all of an abstraction's operations on an object, then that object is an instance of the abstraction. I think this way even outside of programming. For example, the battery abstraction includes the operation "connect a conducting medium to its anode and cathode," and the operation's output is electrical current. It doesn't matter if the battery is made out of lithium or out of potatoes. It's a battery as long as it responds to the set of operations that define battery.
Data types are identified to be part of the abstract class by behaviour ("responds to"). Isn't this the essence of duck typing? Thanks for input.
Data types are identified as instances of the abstraction by their behaviour ("responds to").
They're not, though. On the JVM, types can only be part of an interface if they explicitly state that they implement the interface, and then implement its methods. Merely implementing appropriately-named methods is not enough, as it is in, say, Python, a typical duck-typing language.
What's written is not exactly wrong, but it requires a bit of a specific viewpoint to interpret it as correct: you must realize that when the author writes,
As long as a data structure responds to the core sequence operations...
what is meant is that the type must implement the core sequence interfaces and their methods. In a way, merely exposing a function named first is not enough to "respond" to the same-named core sequence operation: the type must also implement the right interface in order to "respond". It's a strange way to describe things on a VM that isn't framed in terms of responding to messages, and it takes some expertise and squinting to find a correct meaning in it, but it's a reasonable simplification for beginners, who don't need to know about the details yet... unless they are inclined to ask Stack Overflow questions about duck typing!
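A small sketch of the difference, with hypothetical type names - a method that merely happens to share a name with a seq operation does nothing, while implementing the interface is what makes the type "respond":

;; Merely looking seq-ish does not make this a sequence:
(deftype NotASeq []
  Object
  (toString [_] "looks seq-ish, isn't"))

;; (seq (NotASeq.)) ;; => IllegalArgumentException:
;;                  ;;    Don't know how to create ISeq from: NotASeq

;; Implementing the interface is what Clojure actually checks for:
(deftype Countdown [n]
  clojure.lang.Seqable
  (seq [_] (when (pos? n)
             (cons n (.seq (Countdown. (dec n)))))))

(map inc (Countdown. 3)) ;; => (4 3 2)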
Is the "programming to abstractions" principle in Clojure the same as
duck typing?
No.
Clojure is defined in terms of Extensible Abstractions. These are Java interfaces, which are used most prominently to define the core data structures.
For example, the sequence abstraction is defined by clojure.lang.ISeq:
public interface ISeq extends IPersistentCollection {
    Object first();
    ISeq next();
    ISeq more();
    ISeq cons(Object o);
}
Any class that implements ISeq is accepted by Clojure as a sequence (whether it behaves properly is another matter). For example, lists and lazy sequences do so, and are treated impartially as sequences. Contrast this with classic Lisp, where a different set of functions applies to each.
And we have several different implementations of vectors:
the core Clojure vector type;
RRB vectors;
fast small vectors.
I could go on. (Actually, I can't. I don't know enough!)
I'm looking for a data structure which behaves similarly to boost::property_tree but (optionally) leaves the get/set implementation for each value item to the developer.
You should be able to do something like this:
std::function<int(void)> f_foo = ...;
my_property_tree tree;
tree.register_handler<int>("some.path.to.key", f_foo); // plain "register" is a reserved word in C++
auto v1 = tree.get<int>("some.path.to.key"); // <-- calls f_foo
auto v2 = tree.get<int>("some.other.path"); // <-- some fallback or throws exception
I guess you could abuse property_tree for this, but I haven't looked into the implementation yet, and I would have a bad feeling about it unless I knew this was an intended use case.
Writing a class that handles requests like val = tree.get("some.path.to.key") by calling a provided function doesn't look too hard at first glance, but I can imagine a lot of special cases that would make this quite a bulky library.
Some extra features might be:
subtree handling: not only handle terminal keys but also forward certain subtrees to separate implementations. E.g.
tree.register("some.path.config", some_handler);
// calls some_handler.get<int>("network.hostname")
v = tree.get<int>("some.path.config.network.hostname");
search among values / keys
automatic type casting (like in boost::property_tree)
"path overloading", e.g. defaulting to a property_tree-implementation for paths without registered callback.
Is there a library that comes close to what I'm looking for? Does anyone have experience using boost::property_tree for this purpose (e.g. by subclassing or by putting special objects into the tree as described here)?
After years of coding my own container classes, I ended up just adopting QVariantMap. This way it behaves pretty much like (and is as flexible as) Python. Just one interface. Not for performance-critical code, though.
If you care to know, I really caved in to Qt as my de facto STL because:
It is an industry standard - used even in avionics and satellite software
It has been around for decades with little interface change (think long-term support)
It has excellent performance, awesome documentation and an enormous user base
It has an extensive feature set, way beyond the STL
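For illustration, a minimal sketch of what the QVariantMap approach looks like (the names are mine, not from a real project) - nesting maps gives you the "paths":

#include <QVariant>
#include <QString>
#include <QDebug>

int main() {
    QVariantMap network;
    network["hostname"] = QString("example.org");
    network["port"] = 8080;

    QVariantMap tree;
    tree["network"] = network;  // a subtree is just another QVariant

    // Typed access back out of the tree:
    int port = tree["network"].toMap().value("port").toInt();
    qDebug() << port;  // 8080
}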
Would an std::map do the job you are interested in?
Have you tried this approach?
I don't quite understand what you are trying to do, so please provide a domain example.
Cheers.
I have some home-cooked code on GitHub that lets you register custom callbacks for each type. It is quite basic and still missing most of the features you would like to have. I'm working on the second version, though: I'm finishing a helper structure that will do most of the job of making callbacks. Tell me if you're interested. Also, you could implement some of those features yourself, since the code to register callbacks is already done. It shouldn't be so difficult.
Using only provided data structures:
First, getters and setters are not native features of C++; you need to call a method one way or another. To make such behaviour occur, you can overload the assignment operator. I assume you also want to store POD data in your data structure.
So without knowing the type of the data you're "get"ting, the only option I can think of is to use boost::variant. But you still have some overloading to do, and you need at least one assignment.
You can check out the documentation. It's pretty straightforward and easy to understand.
http://www.boost.org/doc/libs/1_61_0/doc/html/variant/tutorial.html
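To make that concrete, a minimal sketch of the boost::variant idea (all names here are hypothetical) - one variant type for the values, and a typed getter on top:

#include <boost/variant.hpp>
#include <iostream>
#include <map>
#include <string>
#include <utility>

// One closed set of value types, stored per dotted path.
using property_value = boost::variant<int, double, std::string>;

class property_map {
public:
    void set(const std::string& path, property_value v) {
        values_[path] = std::move(v);
    }

    // Typed getter; boost::get throws boost::bad_get on a type mismatch.
    template <typename T>
    T get(const std::string& path) const {
        return boost::get<T>(values_.at(path));
    }

private:
    std::map<std::string, property_value> values_;
};

int main() {
    property_map props;
    props.set("network.port", 8080);
    props.set("network.hostname", std::string("example.org"));
    std::cout << props.get<std::string>("network.hostname") << ":"
              << props.get<int>("network.port") << "\n";
}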
Making your own data structures:
Alternatively, as Dani mentioned, you can come up with your own implementation and keep a register of overloaded methods and so on.
Best
For the purposes of interoperability with Java, I need a class that has a nullary constructor that performs initialization.
Objects of this class need to have something resembling mutable Java fields (namely, the object represents the backend of a game and needs to keep game state).
deftype does everything I want to do except provide a nullary constructor (since I'm creating a class with fields).
I don't need the fields to be publicly readable, so I can think of 4 solutions:
Use gen-class; I don't want to do this if I can avoid it.
Somehow encoding private member variables outside of the knowledge of deftype; I've been told this can't be done.
Writing a modified deftype that also creates a nullary constructor; frankly, I don't know Clojure well enough for this.
Taking the class created by deftype and somehow adding a new constructor to it.
At the end of this, I need to have a Java class, since I will be handing it off to Java code that will be making a new object from the class.
Are any of the solutions I suggested (or any that I haven't thought of) other than using gen-class viable?
There's absolutely no shame in, where appropriate, writing a dash of Java if your Java interop requirements are simultaneously specific and unshakable. You could write a Java class with a single static factory method that returns an instance of the deftype class and that does whatever initialization/setup you need.
Alternatively, you can write a nullary factory function in Clojure, and call that directly from Java all day long.
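For example, a hedged sketch of the factory-function route (the namespace and names are hypothetical):

(ns game.core)

(deftype GameState [^:volatile-mutable score])

;; Nullary factory function that performs the initialization:
(defn make-game-state []
  (GameState. 0))

;; From Java, via the public API in clojure.java.api.Clojure:
;;   IFn require = Clojure.var("clojure.core", "require");
;;   require.invoke(Clojure.read("game.core"));
;;   IFn make = Clojure.var("game.core", "make-game-state");
;;   Object state = make.invoke();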
In any case, neither deftype nor defrecord is intended to be (nor will either ever be) a fully-featured interop facility. gen-class certainly comes the closest, which is why it's been recommended.
I'd suggest just writing the object in Java - for Java-like objects with mutable fields it will probably be more elegant, understandable and practical.
I've generally had pretty good results mixing Java and Clojure code in projects. This seems like one of those cases where this might be appropriate. The interoperability is so good that you barely have any extra complexity.
BTW - I'm assuming that you need a nullary constructor to meet the requirements of some persistence library or something similar? It seems like an odd requirement otherwise. If this is the case, then you may find it makes sense to rethink your persistence strategy... arbitrary restrictions like this always seem like a bit of a code smell to me.
In a project I am working on, we have several "disposable" classes. What I mean by disposable is that they are classes where you call some methods to set up the info, then call what equates to a doit function. You doit once and throw the instance away. If you want to doit again, you have to create another instance of the class. The reason they're not reduced to single functions is that they must store state after the doit call so the user can get information about what happened, and it doesn't seem very clean to return a bunch of things through reference parameters. It's not a singleton, but it's not a normal class either.
Is this a bad way to do things? Is there a better design pattern for this sort of thing? Or should I just give in and make the user pass in a boatload of reference parameters to return a bunch of things through?
What you describe is not a class (state + methods to alter it), but an algorithm (map input data to output data):
result_t do_it(parameters_t);
Why do you think you need a class for that?
Sounds like your class is basically a parameter block in a thin disguise.
There's nothing wrong with that IMO, and it's certainly better than a function with so many parameters it's hard to keep track of which is which.
It can also be a good idea when there's a lot of input parameters - several setup methods can set up a few of those at a time, so that the names of the setup functions give more clue as to which parameter is which. Also, you can cover different ways of setting up the same parameters using alternative setter functions - either overloads or with different names. You might even use a simple state-machine or flag system to ensure the correct setups are done.
However, it should really be possible to recycle your instances without having to delete and recreate. A "reset" method, perhaps.
As Konrad suggests, this is perhaps misleading. The reset method shouldn't be seen as a replacement for the constructor - it's the constructor's job to put the object into a self-consistent initialised state, not the reset method's. Objects should be self-consistent at all times.
Unless there's a reason for making cumulative-running-total-style do-it calls, the caller should never have to call reset explicitly - it should be built into the do-it call as the first step.
I still decided, on reflection, to strike that out - not so much because of jalf's comment, but because of the hairs I had to split to argue the point ;-) - Basically, I figure I almost always have a reset method for this style of class, partly because my "tools" usually have multiple related kinds of "do it" (e.g. "insert", "search" and "delete" for a tree tool), and a shared mode. The mode is just some input fields, in parameter-block terms, but that doesn't mean I want to keep re-initialising them. But just because this pattern happens a lot for me doesn't mean it should be a point of principle.
I even have a name for these things (not limited to the single-operation case) - "tool" classes. A "tree_searching_tool" will be a class that searches (but doesn't contain) a tree, for example, though in practice I'd have a "tree_tool" that implements several tree-related operations.
Basically, even parameter blocks in C should ideally provide a kind of abstraction that gives it some order beyond being just a bunch of parameters. "Tool" is a (vague) abstraction. Classes are a major means of handling abstraction in C++.
I have used a similar design and wondered about this too. A fictional, simplified example could look like this:
FileDownloader downloader(url);
downloader.download();
downloader.result(); // get the path to the downloaded file
To make it reusable I store it in a boost::scoped_ptr:
boost::scoped_ptr<FileDownloader> downloader;
// Download first file
downloader.reset(new FileDownloader(url1));
downloader->download();
// Download second file
downloader.reset(new FileDownloader(url2));
downloader->download();
To answer your question: I think it's ok. I have not found any problems with this design.
As far as I can tell you are describing a class that represents an algorithm. You configure the algorithm, then you run the algorithm and then you get the result of the algorithm. I see nothing wrong with putting those steps together in a class if the alternative is a function that takes 7 configuration parameters and 5 output references.
This structuring of code also has the advantage that you can split your algorithm into several steps and put them in separate private member functions. You can do that without a class too, but that can lead to the sub-functions having many parameters if the algorithm has a lot of state. In a class you can conveniently represent that state through member variables.
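A minimal sketch of that shape (the algorithm and all names are invented purely for illustration):

#include <cstddef>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

class TokenCounter {
public:
    explicit TokenCounter(std::string text) : text_(std::move(text)) {}

    void run() {
        split();   // step 1: private helper, leaves intermediate state
        count();   // step 2: consumes that state
    }

    std::size_t token_count() const { return count_; }

private:
    void split() {
        std::size_t pos = 0;
        while (pos < text_.size()) {
            std::size_t end = text_.find(' ', pos);
            if (end == std::string::npos) end = text_.size();
            if (end > pos) tokens_.push_back(text_.substr(pos, end - pos));
            pos = end + 1;
        }
    }
    void count() { count_ = tokens_.size(); }

    std::string text_;                 // input
    std::vector<std::string> tokens_;  // intermediate state shared by steps
    std::size_t count_ = 0;            // result
};

int main() {
    TokenCounter counter("one two three");
    counter.run();
    std::cout << counter.token_count() << "\n";  // 3
}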
One thing you might want to look out for is that structuring your code like this could easily tempt you to use inheritance to share code among similar algorithms. If algorithm A defines a private helper function that algorithm B needs, it's easy to make that member function protected and then access that helper function by having class B derive from class A. It could also feel natural to define a third class C that contains the common code and then have A and B derive from C. As a rule of thumb, inheritance used only to share code in non-virtual methods is not the best way - it's inflexible, you end up having to take on the data members of the super class and you break the encapsulation of the super class. As a rule of thumb for that situation, prefer factoring the common code out of both classes without using inheritance. You can factor that code into a non-member function or you might factor it into a utility class that you then use without deriving from it.
YMMV - what is best depends on the specific situation. Factoring code into a common super class is the basis for the template method pattern, so when using virtual methods inheritance might be what you want.
Nothing especially wrong with the concept. You should try to set it up so that the objects in question can generally be stack-allocated rather than having to be newed - a significant performance saving in most cases. And you probably shouldn't use the technique for highly performance-sensitive code unless you know your compiler generates it efficiently.
I disagree that the class you're describing "is not a normal class". It has state and it has behavior. You've pointed out that it has a relatively short lifespan, but that doesn't make it any less of a class.
Short-lived classes vs. functions with out-params:
I agree that your short-lived classes are probably a little more intuitive and easier to maintain than a function which takes many out-params (or 1 complex out-param). However, I suspect a function will perform slightly better, because you won't be taking the time to instantiate a new short-lived object. If it's a simple class, that performance difference is probably negligible. However, if you're talking about an extremely performance-intensive environment, it might be a consideration for you.
Short-lived classes: creating new vs. re-using instances:
There are plenty of examples where instances of classes are re-used: thread pools, DB-connection pools (probably darn near any software construct ending in 'pool' :). In my experience, they seem to be used when instantiating the object is an expensive operation. Your small, short-lived classes don't sound like they're expensive to instantiate, so I wouldn't bother trying to re-use them. You may find that whatever pooling mechanism you implement actually costs MORE (performance-wise) than simply instantiating new objects whenever needed.
Whether as members, perhaps static, in separate namespaces, via friends, or even via overloads - or any other C++ language feature...
When facing the problem of supporting multiple/varying formats, maybe protocols or any other kind of targets for your types, what was the most flexible and maintainable approach?
Were there any conventions or clear cut winners?
A brief note why a particular approach helped would be great.
Thanks.
[ ProtoBuf-like suggestions will not cut it for an upvote, no matter how flexible a particular implementation might be :) ]
Reading through the already posted responses, I can only agree with a middle-tier approach.
Basically, in your original problem you have 2 distinct hierarchies:
n classes
m protocols
The naive use of a Visitor pattern (as much as I like it) will only lead to n*m methods... which is really gross and a gateway to maintenance nightmares. I suppose you already noticed this, otherwise you would not ask!
The "obvious" target approach is to go for a n+m solution, where the 2 hierarchies are clearly separated. This of course introduces a middle-tier.
The idea is thus ObjectA -> MiddleTier -> Protocol1.
Basically, that's what Protocol Buffers does, though its problem is different (going from one language to another via a protocol).
It may be quite difficult to work out the middle tier:
Performance issues: a "translation" phase adds some overhead, and here you go from one translation to two; this can be mitigated, but you will have to work on it.
Compatibility issues: some protocols do not support recursion, for example (XML and JSON do, EDIFACT does not), so you may have to settle for a least-common-denominator approach or work out ways of emulating such behaviours.
Personally, I would go for "reimplementing" the JSON language (which is extremely simple) as a C++ hierarchy:
ints
strings
lists
dictionaries
Applying the Composite pattern to combine them.
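A bare-bones sketch of that hierarchy under the Composite pattern (the names are illustrative only):

#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Value {
public:
    virtual ~Value() = default;  // common base of the middle tier
};

class Int : public Value {
public:
    explicit Int(int v) : value(v) {}
    int value;
};

class String : public Value {
public:
    explicit String(std::string v) : value(std::move(v)) {}
    std::string value;
};

class List : public Value {
public:
    std::vector<std::unique_ptr<Value>> items;  // composite: owns Values
};

class Dict : public Value {
public:
    std::map<std::string, std::unique_ptr<Value>> entries;  // composite
};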
Of course, that is the first step only. Now you have a framework, but you don't have your messages.
You should be able to specify a message in terms of primitives (and really think about versioning right now; it's too late once you need another version). Note that both approaches are valid:
In-code specification: your message is composed of primitives / other messages
Using a code-generation script: this seems overkill here, but... for the sake of completeness I thought I would mention it, as I don't know how many messages you really need :)
On to the implementation:
Herb Sutter and Andrei Alexandrescu say in their C++ Coding Standards:
Prefer writing nonmember nonfriend functions
This applies really well to the MiddleTier -> Protocol step: create a Protocol1 class and then you can have:
Protocol1 myProtocol;
myProtocol << myMiddleTierMessage;
The use of operator<< for this kind of operation is well-known and very common. Furthermore, it gives you a very flexible approach: not all messages are required to implement all protocols.
The drawback is that it won't work for a dynamic choice of the output protocol. In this case, you might want to use a more flexible approach. After having tried various solutions, I settled for using a Strategy pattern with compile-time registration.
The idea is that I use a singleton which holds a number of functor objects. Each functor is registered (in this case) for a particular Message-Protocol combination. This works pretty well in this situation.
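In outline, that registry might look like this (a sketch under my own naming, not a drop-in implementation):

#include <functional>
#include <map>
#include <typeindex>
#include <typeinfo>
#include <utility>

class SerializerRegistry {
public:
    // Key: the (message type, protocol type) combination.
    using Key = std::pair<std::type_index, std::type_index>;
    using Fn  = std::function<void(const void* message, void* protocol)>;

    static SerializerRegistry& instance() {
        static SerializerRegistry registry;  // Meyers singleton
        return registry;
    }

    void add(Key key, Fn fn) { table_[key] = std::move(fn); }

    void serialize(Key key, const void* message, void* protocol) const {
        table_.at(key)(message, protocol);  // throws if unregistered
    }

private:
    std::map<Key, Fn> table_;
};

// "Compile-time" registration: a file-scope object whose constructor
// runs before main() and registers the functor.
struct RegisterGetUp2SuperProt {
    RegisterGetUp2SuperProt() {
        SerializerRegistry::instance().add(
            // stand-in types; a real registration would use the
            // Message and Protocol classes
            {std::type_index(typeid(int)), std::type_index(typeid(long))},
            [](const void*, void*) { /* encode the message here */ });
    }
} const registerGetUp2SuperProt;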
Finally, for the BOM -> MiddleTier step, I would say that a particular instance of a Message should know how to build itself and should require the necessary objects as part of its constructor.
That of course only works if your messages are quite simple and may be built from only a few combinations of objects. If not, you might want a relatively empty constructor and various setters, but the first approach is usually sufficient.
Putting it all together.
// 1 - Your BOM
class Foo {};
class Bar {};

// 2 - Message class: GetUp
class GetUp
{
public:
    GetUp(const Foo& foo, const Bar& bar); // builds itself from the BOM
private:
    typedef enum {} State;
    State m_state;
};

// 3 - Protocol class: SuperProt
class SuperProt : public Protocol
{
};

// 4 - GetUp to SuperProt serializer
class GetUp2SuperProt : public Serializer
{
};

// 5 - Let's use it
Foo foo;
Bar bar;
SuperProt sp;
GetUp getUp = GetUp(foo, bar);
MyMessage2ProtBase.serialize(sp, getUp); // uses GetUp2SuperProt inside
If you need many output formats for many classes, I would try to make it an n + m problem instead of an n * m problem. The first way that comes to mind is to make the classes reducible to some kind of dictionary, and then have a method to serialize those dictionaries to each output format.
Assuming you have full access to the classes that must be serialized, you need to add some form of reflection to them (probably including an abstract factory). There are two ways to do this: 1) a common base class or 2) a "traits" struct. Then you can write your encoders/decoders against the base class/traits struct.
Alternatively, you could require that the class provide a function to export itself to a container of boost::any and provide a constructor that takes a boost::any container as its only parameter. It should be simple to write a serialization function to many different formats if your source data is stored in a map of boost::any objects.
That's two ways I might approach this. It would depend highly on the similarity of the classes to be serialized and on the diversity of target formats which of the above methods I would choose.
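A minimal sketch of the second (boost::any) idea, with a purely hypothetical class - nothing here comes from the question itself:

#include <boost/any.hpp>
#include <map>
#include <string>
#include <utility>

class Person {
public:
    Person(std::string name, int age) : name_(std::move(name)), age_(age) {}

    // Rebuild from the generic container.
    explicit Person(const std::map<std::string, boost::any>& fields)
        : name_(boost::any_cast<std::string>(fields.at("name"))),
          age_(boost::any_cast<int>(fields.at("age"))) {}

    // Export to the generic container.
    std::map<std::string, boost::any> to_any() const {
        return {{"name", name_}, {"age", age_}};
    }

private:
    std::string name_;
    int age_;
};

// Serializers then target only the map, never Person itself, e.g.
// std::string to_json(const std::map<std::string, boost::any>& fields);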
I used the OpenH323 library (famous enough among VoIP developers) for long enough to build a number of VoIP-related applications, from a low-density answering machine up to a 32xE1 border controller. Of course it went through major rework, so in those days I knew almost everything about this library.
Inside this library was a tool (ASNparser) which converted ASN.1 definitions into container classes. There was also a framework which allowed serialization/de-serialization of these containers using higher-layer abstractions. Note that they are auto-generated. They supported several encoding protocols (BER, PER, XER) for ASN.1 with very complex ASN syntax and good-enough performance.
What was nice?
Auto-generated container classes which were suitable enough for implementing clear logic.
I managed to rework the whole container layer under the ASN object hierarchy with almost no modification to the upper layers.
It was relatively easy to refactor the debug features of those ASN classes for performance (understandably, the authors didn't expect 20xE1 call signalling to be logged online).
What was not suitable?
It is a non-STL library with lazy copying underneath. It was refactored for speed, but I would have liked STL compatibility there (at least at that time).
You can find the wiki page for the whole project here. You should focus only on the PTlib component, the ASN parser sources, and the ASN class hierarchy / encoding / decoding policy hierarchies.
By the way, look at the "Bridge" design pattern; it might be useful.
Feel free to comment with questions if something seems strange, insufficient, or not what you actually requested.