Type system in Clojure - clojure

Is the "programming to abstractions" principle in Clojure the same as duck typing? If not, what are the differences?
Here is a quote from http://www.braveclojure.com/core-functions-in-depth/:
The reason is that Clojure defines map and reduce functions in terms of the
sequence abstraction, not in terms of specific data structures. As
long as a data structure responds to the core sequence operations (the
functions first, rest, and cons, which we’ll look at more closely in a
moment), it will work with map, reduce, and oodles of other sequence
functions for free. This is what Clojurists mean by programming to
abstractions, and it’s a central tenet of Clojure philosophy.
I think of abstractions as named collections of operations. If you can
perform all of an abstraction’s operations on an object, then that
object is an instance of the abstraction. I think this way even
outside of programming. For example, the battery abstraction includes
the operation “connect a conducting medium to its anode and cathode,”
and the operation’s output is electrical current. It doesn’t matter if
the battery is made out of lithium or out of potatoes. It’s a battery
as long as it responds to the set of operations that define battery.
Data types are identified as being part of the abstraction by their behaviour ("responds to"). Isn't this the essence of duck typing? Thanks for input.

Data types are identified as being part of the abstraction by their behaviour ("responds to").
They're not, though. On the JVM, types can only be part of an interface if they explicitly state that they implement the interface, and then implement its methods. Merely implementing appropriately-named methods is not enough, as it is in, say, Python, a typical duck-typing language.
What's written is not exactly wrong, but it requires a bit of a specific viewpoint to interpret it as correct: you must realize that when the author writes,
As long as a data structure responds to the core sequence operations...
What is meant is that the type must implement the core sequence interfaces and their methods. Merely exposing a function named first is not enough to "respond" to the same-named core sequence operation: the type must also implement the right interface in order to "respond". It's an odd way to phrase things for a VM that isn't framed in terms of responding to messages, and it takes some expertise and squinting to read it as correct, but it's a reasonable simplification for beginners, who don't need to know the details yet...unless they are inclined to ask Stack Overflow questions about duck typing!

Is the "programming to abstractions" principle in Clojure the same as
duck typing?
No.
Clojure is defined in terms of Extensible Abstractions.
These are Java interfaces ...
... which are used most prominently to define the core data
structures.
For example, the sequence abstraction is defined by clojure.lang.ISeq:
public interface ISeq extends IPersistentCollection {
Object first();
ISeq next();
ISeq more();
ISeq cons(Object o);
}
Any class that implements ISeq is accepted by Clojure as a sequence (whether it behaves properly is another matter). For example, lists and lazy sequences do so, and are treated uniformly as sequences. Contrast this with classic Lisp, where a different set of functions applies to each.
And we have several different implementations of vectors:
the core clojure vector type;
rrb vectors;
fast small vectors.
I could go on. (Actually, I can't. I don't know enough!)

Related

What are "abstractions"?

I've been reading Stroustrup's "The C++ Programming Language" and he mentions "abstractions" a lot:
Many of the most flexible, efficient, and useful abstractions involve the parameterization of types (classes) and algorithms (functions) with other types and algorithms
and
C++ is a language for developing and using elegant and efficient abstractions.
Is this in any way related to abstract classes in C++? Or with using polymorphism, inheritance, or templates?
Could someone give an example please?
abstraction (n) - the quality of dealing with ideas rather than events
— source: Oxford English Dictionary
Stroustrup is not referring to abstract classes or other specific ideas in programming. Rather, he is referring to the word abstraction itself.
Abstractions are mental helpers. They help us think in "theory" rather than direct application. Mathematics is the art of abstraction. Programming is the art of applied abstractions.
Abstractions help us form mental models, such as hierarchies, to help us think of things. Polymorphism is possible because of abstractions. Let's take a look at an example.
Example
I have an Oleksiy Dobrodum. I refer to it as an Oleksiy Dobrodum, I treat it like an Oleksiy Dobrodum, all it will ever be is an Oleksiy Dobrodum. Everything I do to this Oleksiy Dobrodum is specifically for it. We're now on the 1st level of abstraction, or the most specific we'll ever be when working with this Oleksiy Dobrodum.
Recently I acquired a Zach Latta, so now I have both an Oleksiy Dobrodum and a Zach Latta.
I could refer to them both individually, as an Oleksiy Dobrodum and as a Zach Latta, but that would quickly grow redundant and prove inflexible. Instead, we can simply group Oleksiy Dobrodum and Zach Latta together and call them Humans. We have now achieved abstraction level 2. Instead of dealing with each person individually, we can refer to them as Humans. By doing this, we have abstracted away the "implementation", or the specific details of each person, and have started focusing on the ideas; therefore we are now thinking in the abstract.
Of course we can abstract this further, but hopefully you're starting to get the idea behind abstractions. The key takeaway from this is that an abstraction hides the details (or implementation). By hiding the details in our Humans abstraction, we allow ourselves to speak in generalities. We'll talk briefly on how this applies in programming in the next section.
Applying Abstractions
Now that we've touched briefly on what an abstraction is, let's apply it. Polymorphism is possible because of abstractions. Following the model of the previous example, say we have the following two classes:
class OleksiyDobrodum
name = "Oleksiy Dobrodum"
smarts = :mega-smart
mood = :happy
favorite_candy = :chocolate
end
class ZachLatta
name = "Zach Latta"
smarts = :so-so
mood = :indifferent
hair_color = :brown
end
If I want to interact with an instance of ZachLatta I must refer to it specifically. The same goes for OleksiyDobrodum instances.
zach = new ZachLatta
print zach.name
oleksiy = new OleksiyDobrodum
print oleksiy.favorite_candy
If I create an abstract class called Human and have both OleksiyDobrodum and ZachLatta inherit from it, then I can abstract away the implementation of both classes and simply refer to both instances of them as Human.
class Human
name
smarts
mood
end
class OleksiyDobrodum < Human
name = "Oleksiy Dobrodum"
smarts = :mega-smart
mood = :happy
favorite_candy = :chocolate
end
class ZachLatta < Human
name = "Zach Latta"
smarts = :so-so
mood = :indifferent
hair_color = :brown
end
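In C++ terms (the language the question is about), a minimal sketch of the same idea might look like the following; the member types, values, and the greet function are illustrative choices rather than part of the original example, and Human is left as a plain (non-abstract) base for brevity:
#include <iostream>
#include <string>

// Base class capturing only what every Human has.
class Human {
public:
    virtual ~Human() = default;
    std::string name;
    std::string smarts;
    std::string mood;
};

class OleksiyDobrodum : public Human {
public:
    OleksiyDobrodum() { name = "Oleksiy Dobrodum"; smarts = "mega-smart"; mood = "happy"; }
    std::string favorite_candy = "chocolate";
};

class ZachLatta : public Human {
public:
    ZachLatta() { name = "Zach Latta"; smarts = "so-so"; mood = "indifferent"; }
    std::string hair_color = "brown";
};

// Code written against the abstraction works for any Human:
void greet(const Human& h) { std::cout << "Hello, " << h.name << '\n'; }

int main() {
    OleksiyDobrodum oleksiy;
    ZachLatta zach;
    greet(oleksiy);
    greet(zach);
}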
Our class diagram now shows Human at the top, with OleksiyDobrodum and ZachLatta inheriting from it.
I could ramble on about implementation forever, but let's move on to our key takeaways.
Key Takeaways
abstractions are ideas, not specific events
to abstract something is to move away from its implementation and think about big ideas
abstractions can be used to organize code (and many other things) effectively
object-oriented programming is entirely dependent on abstractions; see the above bullet point.
In generic programming, abstractions have a precise meaning, and are called "concepts". A concept is defined as follows:
A concept is a set of requirements consisting of valid expressions, associated types, invariants, and complexity guarantees. A type that satisfies the requirements is said to model the concept. A concept can extend the requirements of another concept, which is called refinement.
Valid Expressions are C++ expressions which must compile successfully for the objects involved in the expression to be considered models of the concept.
Associated Types are types that are related to the modeling type in that they participate in one or more of the valid expressions. Typically associated types can be accessed either through typedefs nested within a class definition for the modeling type, or they are accessed through a traits class.
Invariants are run-time characteristics of the objects that must always be true, that is, the functions involving the objects must preserve these characteristics. The invariants often take the form of pre-conditions and post-conditions.
Complexity Guarantees are maximum limits on how long the execution of one of the valid expressions will take, or how much of various resources its computation will use.
The concepts used in the C++ Standard Library are documented at the SGI STL site.
Implementing a concept in real code can be done in several ways. The classical OOP approach is to write an abstract base class providing the valid expressions and associated types. The concrete derived classes then provide the invariants and the complexity guarantees. For templates, the valid expressions are more implicit and are only checked after instantiation. Templates implementing concepts are a form of duck typing: if it looks like a duck, quacks like a duck, ....
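As a rough sketch of that duck-typing flavour of templates (the Duck, Robot, and make_noise names are invented for illustration):
#include <iostream>

// Two unrelated types: no common base class, no declared interface.
struct Duck  { void quack() const { std::cout << "Quack\n"; } };
struct Robot { void quack() const { std::cout << "Beep\n"; } };

// The template's only requirement is that the expression t.quack() is valid:
// any type for which it compiles "models" the informal concept.
template <typename T>
void make_noise(const T& t) { t.quack(); }

int main() {
    make_noise(Duck{});
    make_noise(Robot{});
    // make_noise(42);  // fails, but only at instantiation time
}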
The C++0x development process devoted a lot of effort to making concepts directly expressible in code, but the feature was not incorporated into the C++11 Standard. However, a Concepts Lite version was expected to appear in the C++14 Standard (in the end, language-level concepts arrived, in revised form, in C++20).
Yes, it is related to abstract classes in C++, but it is not limited to that context; he is speaking generally, saying that C++ has full support for abstraction.
For example, in C++ we can pass class types or functions to other code: a function call can take a class type or a function as a parameter. Both the function and the class are forms of abstraction (here, abstraction refers to hiding the definition of the function or class from the user).
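As a small, hedged illustration of passing behaviour as a parameter (the by_length function and the data are made up):
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// A function passed as a parameter: the sorting algorithm is abstracted
// over what "less than" means.
bool by_length(const std::string& a, const std::string& b) {
    return a.size() < b.size();
}

int main() {
    std::vector<std::string> words{"pear", "fig", "banana"};
    std::sort(words.begin(), words.end(), by_length);
    for (const auto& w : words) std::cout << w << ' ';  // fig pear banana
}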

Clojure: Perlis vs Protocols/Records [soft, philosophical]

Context:
(A) "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." —Alan Perlis
(B) Clojure has defprotocol, defrecord, deftype
Question:
is there some style of programming Clojure that gets the benefits of both?
(B) has the advantage of avoiding type errors.
(A) has the advantage of avoiding duplicate code.
Thanks
PS: I would love to hear constructive criticism on why I'm being downvoted + how to restructure the question to make it productive.
I am not sure how you can correlate (A) and (B).
(A) is about consistency: if you use the same data structure to represent your data (for example, user info stored in a map) across the various layers of your application, then things stay consistent. If you use many data structures to represent the same information, you will have to write code to transform one form into another, and the various functions which work on different structures will not be composable, as they expect different data structures.
(B) is about the various constructs in Clojure.
defprotocol : This is not about a data structure; rather it is about a contract/interface, i.e. a particular type implements a contract, and the type can then be used in any context where the consumer function requires the passed value to implement that contract. Example: any type that can be printed to the console (or to another writable stream) will implement the print contract/protocol.
defrecord : Creates map-like types, but with some additional interfaces implemented in a default way.
deftype : A low-level construct for creating types, so you will have to write a lot of the code yourself. 99% of the time you won't need it.
The way to reconcile this is to think "abstractions" rather than "data types". Or to paraphrase Alan Perlis:
"It is better to have 100 functions operate on one abstraction than
10 functions on 10 abstractions."
So the Clojure way is to:
Define your abstractions in a simple, minimal way (using defprotocol)
Write functions against this abstraction
Define concrete types that implement the abstraction using defrecord, deftype, etc. (or use extend-protocol to extend the protocol to existing Java classes if you like)

Inheritance & virtual functions Vs Generic Programming

I need to understand whether inheritance and virtual functions are really unnecessary in C++, and whether one can achieve everything using generic programming. This came from Alexander Stepanov; the lecture I was watching is Alexander Stepanov: STL and Its Design Principles.
I always like to think of templates and inheritance as two orthogonal concepts, in the very literal sense: To me, inheritance goes "vertically", starting with a base class at the top and going "down" to more and more derived classes. Every (publicly) derived class is a base class in terms of its interface: A poodle is a dog is an animal.
On the other hand, templates go "horizontal": Each instance of a template has the same formal code content, but two distinct instances are entirely separate, unrelated pieces that run in "parallel" and don't see each other. Sorting an array of integers is formally the same as sorting an array of floats, but an array of integers is not at all related to an array of floats.
Since these two concepts are entirely orthogonal, their application is, too. Sure, you can contrive situations in which you could replace one by another, but when done idiomatically, both template (generic) programming and inheritance (polymorphic) programming are independent techniques that both have their place.
Inheritance is about making an abstract concept more and more concrete by adding details. Generic programming is essentially code generation.
As my favourite example, let me mention how the two technologies come together beautifully in a popular implementation of type erasure: A single handler class holds a private polymorphic pointer-to-base of an abstract container class, and the concrete, derived container class is determined by a templated type-deducing constructor. We use template code generation to create an arbitrary family of derived classes:
// internal helper base
class TEBase { /* ... */ };
// internal helper derived TEMPLATE class (unbounded family!)
template <typename T> class TEImpl : public TEBase { /* ... */ };
// single public interface class
class TE
{
TEBase * impl;
public:
// "infinitely many" constructors:
template <typename T> TE(const T & x) : impl(new TEImpl<T>(x)) { }
// ...
};
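For completeness, here is a self-contained sketch of that pattern with the elided parts filled in; the Printable name and the print() operation are illustrative, not from the original answer:
#include <iostream>
#include <memory>
#include <string>

class Printable {
    // internal helper base: the runtime-polymorphic part
    struct Base {
        virtual ~Base() = default;
        virtual void print() const = 0;
    };
    // internal helper derived template: one instantiation per wrapped type
    template <typename T>
    struct Impl : Base {
        T value;
        explicit Impl(T v) : value(std::move(v)) {}
        void print() const override { std::cout << value << '\n'; }
    };
    std::unique_ptr<Base> impl;
public:
    // "infinitely many" constructors via the type-deducing template
    template <typename T>
    Printable(T x) : impl(std::make_unique<Impl<T>>(std::move(x))) {}
    void print() const { impl->print(); }
};

int main() {
    Printable a = 42;                    // erases int
    Printable b = std::string("hello");  // erases std::string
    a.print();
    b.print();
}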
They serve different purposes. Generic programming (at least in C++) is about compile-time polymorphism, and virtual functions are about run-time polymorphism.
If the choice of the concrete type depends on the user's input, you really need runtime polymorphism - templates won't help you.
Polymorphism (i.e. dynamic binding) is crucial for decisions that are based on runtime data. Generic data structures are great but they are limited.
Example: Consider an event handler for a discrete event simulator: It is very cheap (in terms of programming effort) to implement this with a pure virtual function, but is verbose and quite inflexible if done purely with templated classes.
As a rule of thumb: If you find yourself switching (or if-else-ing) on the value of some input object and performing different actions depending on its value, there might exist a better (in the sense of maintainability) solution with dynamic binding.
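A tiny sketch of that rule of thumb, with invented event types; the point is that the per-type decision is made once by virtual dispatch rather than in every handler:
#include <iostream>
#include <memory>
#include <vector>

// Instead of if-else-ing on a "kind" field in every place that handles events,
// the decision is made once, at virtual dispatch.
struct Event {
    virtual ~Event() = default;
    virtual void handle() const = 0;
};
struct TimerEvent : Event { void handle() const override { std::cout << "timer fired\n"; } };
struct IoEvent    : Event { void handle() const override { std::cout << "io ready\n"; } };

int main() {
    std::vector<std::unique_ptr<Event>> queue;
    queue.push_back(std::make_unique<TimerEvent>());  // which concrete type ends up
    queue.push_back(std::make_unique<IoEvent>());     // here could depend on runtime input
    for (const auto& e : queue) e->handle();          // no switch needed
}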
Some time ago I thought about a similar question and I can only dream about giving you such a great answer I received. Perhaps this is helpful: interface paradigm performance (dynamic binding vs. generic programming)
It seems like a very academic question. As with most things in life there are lots of ways to do things, and in the case of C++ you have a number of ways to solve a given problem. There is no need to take an XOR attitude to it.
In the ideal world, you would use templates for static polymorphism to give you the best possible performance in instances where the type is not determined by user input.
The reality is that templates force most of your code into headers and this has the consequence of exploding your compile times.
I have done some heavy generic programming leveraging static polymorphism to implement a generic RPC library (https://github.com/bytemaster/mace (rpc_static_poly branch) ). In this instance the protocol (JSON-RPC), the transport (TCP/UDP/Stream/etc), and the types are all known at compile time, so there is no reason to do a vtable dispatch... or is there?
When I run the code through the pre-processor for a single .cpp it results in 250,000 lines and takes 30+ seconds to compile a single object file. I implemented 'identical' functionality in Java and C# and it compiles in about a second.
Almost every STL or Boost header you include adds thousands or tens of thousands of lines of code that must be processed per object file, most of it redundant.
Do compile times matter? In most cases they have a more significant impact on the final product than 'maximally optimized vtable elimination'. The reason is that every 'bug' requires a 'try fix, compile, test' cycle, and if each cycle takes 30+ seconds development slows to a crawl (note the motivation for Google's Go language).
After spending a few days with Java and C# I decided that I needed to 're-think' my approach to C++. There is no reason a C++ program should compile much more slowly than the underlying C that would implement the same functionality.
I now opt for runtime polymorphism unless profiling shows that the bottleneck is in vtable dispatches. I now use templates to provide 'just-in-time' polymorphism and a type-safe interface on top of the underlying object, which deals with 'void*' or an abstract base class. In this way users need not derive from my 'interfaces' and still have the 'feel' of generic programming, but they get the benefit of fast compile times. If performance becomes an issue then the generic code can be replaced with static polymorphism.
The results are dramatic: compile times have fallen from 30+ seconds to about a second, and the post-preprocessor source is now a couple of thousand lines instead of 250,000.
On the other side of the discussion, I was developing a library of 'drivers' for a set of similar but slightly different embedded devices. In this instance the embedded device had little room for 'extra code' and no use for 'vtable' dispatch. With C, our only options were separate object files or runtime 'polymorphism' via function pointers. Using generic programming and static polymorphism we were able to create maintainable software that ran faster than anything we could produce in C.
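A rough sketch of what that kind of static polymorphism can look like; the device, registers, and values are made up:
// The driver logic is written once against a compile-time "Device" parameter;
// each instantiation is inlined per device, with no vtable and no function pointers.
template <typename Device>
class Driver {
public:
    void reset() { Device::write_register(0x00, 0x01); }
    int  read_temperature() { return Device::read_register(0x10); }
};

// Each device supplies the same static interface with its own addresses/quirks.
struct DeviceA {
    static void write_register(int /*reg*/, int /*value*/) { /* poke memory-mapped IO */ }
    static int  read_register(int /*reg*/) { return 42; }
};

int main() {
    Driver<DeviceA> driver;   // a separate, fully specialized driver per device type
    driver.reset();
    return driver.read_temperature() > 0 ? 0 : 1;
}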

What is a fat Interface?

Ciao, I work in the movie industry simulating and applying studio effects. May I ask what a fat interface is? I heard someone around here mention it.
Edit: it is mentioned here by Nicol Bolas (a very good pointer, I believe):
fat interface - an interface with more member functions and friends than are logically necessary. TC++PL 24.4.3
source
A very simple explanation is here:
The Fat Interface approach [...]: in addition to the core services (that are part of the thin interface) it also offers a rich set of services that satisfy common needs of client code. Clearly, with such classes the amount of client code that needs to be written is smaller.
When should we use fat interfaces? If a class is expected to have a long life span or if a class is expected to have many clients it should offer a fat interface.
Maxim quotes Stroustrup's glossary:
fat interface - an interface with more member functions and friends than are logically necessary. TC++PL 24.4.3
Maxim provides no explanation, and other existing answers to this question misinterpret the above - or, sans the Stroustrup quote, the term itself - as meaning an interface with an arguably excessive number of members. It's not.
It's actually not about the number of members, but whether the members make sense for all the implementations.
That subtle aspect doesn't come through very clearly in Stroustrup's glossary, but - at least in the old version of TC++PL I have - it is clear where the term is used in the text. Once you understand the difference, the glossary entry is clearly consistent with it, but "more member functions and friends than are logically necessary" is a test that should be applied from the perspective of each of the implementations of a logical interface. (My understanding is also supported by Wikipedia, for whatever that's worth ;-o.)
Specifically, when you have an interface over several implementations, and some of the interface actions are only meaningful for some of the implementations, then you have a fat interface: you can ask the active implementation to do something that it has no hope of doing, and you have to complicate the interface with some "not supported" discovery or reporting, which soon adds up to make it harder to write reliable client code.
For example, if you have a Shape base class and derived Circle and Square classes, and you contemplate adding a double get_radius() const member: you could do so and have it throw or return some sentinel value like NaN or -1 when called on a Square - you'd then have a fat interface.
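A short sketch of that example (illustrative code, not from the book):
#include <stdexcept>

// "Fat" version: get_radius() is only meaningful for Circle, but every
// implementation of Shape is forced to provide it.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
    virtual double get_radius() const = 0;   // more than is logically necessary
};

struct Circle : Shape {
    double r;
    explicit Circle(double r) : r(r) {}
    double area() const override { return 3.141592653589793 * r * r; }
    double get_radius() const override { return r; }
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double get_radius() const override {       // no hope of doing this sensibly
        throw std::logic_error("a square has no radius");
    }
};

// Leaner alternative: keep get_radius() off Shape, and let the clients that
// genuinely need a radius work with Circle (or a separate interface) directly.
int main() {
    Circle c{1.0};
    Square s{2.0};
    return c.area() + s.area() > 0 ? 0 : 1;
}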
"Uncle Bob" puts a different emphasis on it below (boldfacing mine) in the context of the Interface Segregation Principle (ISP) (a SOLID principle that says to avoid fat interfaces):
[ISP] deals with the disadvantages of “fat” interfaces. Classes that have “fat” interfaces are classes whose interfaces are not cohesive. In other words, the interfaces of the class can be broken up into groups of member functions. Each group serves a different set of clients. Thus some clients use one group of member functions, and other clients use the other groups.
This implies you could have, for example, virtual functions that all derived classes implement with non-no-op behaviours, but still consider the interface "fat" if any given client using that interface would typically only be interested in one group of its functions. For example: if a string class provided regexp functions and 95% of client code never used any of those, and especially if the 5% that did tended not to use the non-regexp string functions, then you should probably separate the regexp functionality from the normal textual string functionality. In that case, though, there's a clear distinction in member-function functionality that forms two groups, and when you were writing your code you'd have a clear idea whether you wanted regexp functionality or normal text-handling functionality. With the actual std::string class, although it has a lot of functions, I'd argue that there's no clear grouping of functions where it would be weird to evolve a need to use some functions (e.g. begin/end) after having initially needed only, say, insert/erase. I don't personally consider the interface "fat", even though it's huge.
Of course, such an evocative term will have been picked up by other people to mean whatever they think it should mean, so it's no surprise that the web contains examples of the simpler larger-than-necessary-interface usage, as evidenced by the link in relaxxx's answer, but I suspect that's more people guessing at a meaning than "educated" about prior usage in Computing Science literature....
An interface with more methods or friends than is really necessary.

Design methods for multiple serialization targets/formats (not versions)

Whether as members, perhaps static, in separate namespaces, via friends, even via overloads, or using any other C++ language feature...
When facing the problem of supporting multiple/varying formats, maybe protocols or any other kind of targets for your types, what was the most flexible and maintainable approach?
Were there any conventions or clear cut winners?
A brief note why a particular approach helped would be great.
Thanks.
[ ProtoBufs-like suggestions should not cut it for an upvote, no matter how flexible that particular implementation might be :) ]
Reading through the already posted responses, I can only agree with a middle-tier approach.
Basically, in your original problem you have 2 distinct hierarchies:
n classes
m protocols
The naive use of a Visitor pattern (as much as I like it) will only lead to n*m methods... which is really gross and a gateway to a maintenance nightmare. I suppose you have already noticed this, otherwise you would not ask!
The "obvious" target approach is to go for an n+m solution, where the 2 hierarchies are clearly separated. This of course introduces a middle tier.
The idea is thus ObjectA -> MiddleTier -> Protocol1.
Basically, that's what Protocol Buffers does, though its problem is different (going from one language to another via a protocol).
It may be quite difficult to work out the middle-tier:
Performance issues: a "translation" phase adds some overhead, and here you go from one transformation to two. This can be mitigated, but you will have to work on it.
Compatibility issues: some protocols do not support recursion, for example (XML and JSON do, EDIFACT does not), so you may have to settle for a least-common-denominator approach or work out ways of emulating such behaviours.
Personally, I would go for "reimplementing" the JSON language (which is extremely simple) into a C++ hierarchy:
int
strings
lists
dictionaries
Applying the Composite pattern to combine them.
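A minimal sketch of such a JSON-like middle tier using the Composite pattern; the class names are illustrative only:
#include <map>
#include <memory>
#include <string>
#include <vector>

// Composite value hierarchy: every node is a Value, containers hold Values.
struct Value {
    virtual ~Value() = default;
};
struct Int    : Value { long long v = 0; };
struct String : Value { std::string v; };
struct List   : Value { std::vector<std::unique_ptr<Value>> items; };
struct Dict   : Value { std::map<std::string, std::unique_ptr<Value>> fields; };

int main() {
    // Each of the n business classes converts itself to one Value tree;
    // each of the m protocols walks a Value tree: n + m instead of n * m.
    auto msg  = std::make_unique<Dict>();
    auto name = std::make_unique<String>();
    name->v = "GetUp";
    msg->fields["type"] = std::move(name);
}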
Of course, that is the first step only. Now you have a framework, but you don't have your messages.
You should be able to specify a message in terms of primitives (and really think about versioning right now; it's too late once you need another version). Note that both approaches are valid:
In-code specification: your message is composed of primitives / other messages
Using a code generation script: this seems overkill here, but... for the sake of completeness I thought I would mention it, as I don't know how many messages you really need :)
On to the implementation:
Herb Sutter and Andrei Alexandrescu say in their C++ Coding Standards:
Prefer non-member non-friend functions
This applies really well to the MiddleTier -> Protocol step: create a Protocol1 class and then you can have:
Protocol1 myProtocol;
myProtocol << myMiddleTierMessage;
The use of operator<< for this kind of operation is well-known and very common. Furthermore, it gives you a very flexible approach: not all messages are required to implement all protocols.
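A hedged sketch of that operator<< approach; the message and protocol types below are invented for illustration:
#include <iostream>
#include <string>

// Illustrative middle-tier message and protocol.
struct MiddleTierMessage { std::string type; int id; };

struct Protocol1 {
    std::ostream& out;
};

// Non-member, non-friend: the serializer uses only the message's public interface,
// and messages that never need Protocol1 simply never get such an overload.
Protocol1& operator<<(Protocol1& proto, const MiddleTierMessage& msg) {
    proto.out << "{\"type\":\"" << msg.type << "\",\"id\":" << msg.id << "}\n";
    return proto;
}

int main() {
    Protocol1 myProtocol{std::cout};
    MiddleTierMessage myMiddleTierMessage{"GetUp", 7};
    myProtocol << myMiddleTierMessage;
}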
The drawback is that it won't work for a dynamic choice of the output protocol. In that case, you might want to use a more flexible approach. After having tried various solutions, I settled on using a Strategy pattern with compile-time registration.
The idea is that I use a Singleton which holds a number of Functor objects. Each object is registered (in this case) for a particular Message - Protocol combination. This works pretty well in this situation.
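A rough sketch of that registry-of-functors idea; all names are made up, and a real implementation (e.g. with true compile-time registration) would differ:
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <typeindex>
#include <utility>

// Singleton holding serializer functors keyed on (message type, protocol name),
// so the output protocol can be chosen at runtime.
class SerializerRegistry {
public:
    static SerializerRegistry& instance() {
        static SerializerRegistry r;
        return r;
    }
    template <typename Message>
    void add(const std::string& protocol, std::function<std::string(const Message&)> fn) {
        table_[{std::type_index(typeid(Message)), protocol}] =
            [fn](const void* m) { return fn(*static_cast<const Message*>(m)); };
    }
    template <typename Message>
    std::string serialize(const std::string& protocol, const Message& msg) const {
        return table_.at({std::type_index(typeid(Message)), protocol})(&msg);
    }
private:
    std::map<std::pair<std::type_index, std::string>,
             std::function<std::string(const void*)>> table_;
};

struct GetUp { int state = 0; };   // illustrative message

int main() {
    SerializerRegistry::instance().add<GetUp>("SuperProt",
        [](const GetUp& m) { return "GetUp:" + std::to_string(m.state); });
    GetUp msg;
    msg.state = 3;
    std::cout << SerializerRegistry::instance().serialize("SuperProt", msg) << '\n';
}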
Finally, for the BOM -> MiddleTier step, I would say that a particular instance of a Message should know how to build itself and should require the necessary objects as part of its constructor.
That of course only works if your messages are quite simple and can be built from a few combinations of objects. If not, you might want a relatively empty constructor and various setters, but the first approach is usually sufficient.
Putting it all together.
// 1 - Your BOM
class Foo {};
class Bar {};
// 2 - Message class: GetUp
class GetUp
{
typedef enum {} State;
State m_state;
};
// 3 - Protocol class: SuperProt
class SuperProt: public Protocol
{
};
// 4 - GetUp to SuperProt serializer
class GetUp2SuperProt: public Serializer
{
};
// 5 - Let's use it
Foo foo;
Bar bar;
SuperProt sp;
GetUp getUp = GetUp(foo,bar);
MyMessage2ProtBase.serialize(sp, getUp); // use GetUp2SuperProt inside
If you need many output formats for many classes, I would try to make it an n + m problem instead of an n * m problem. The first way that comes to mind is to make the classes reducible to some kind of dictionary, and then have a method to serialize those dictionaries to each output format.
Assuming you have full access to the classes that must be serialized, you need to add some form of reflection to the classes (probably including an abstract factory). There are two ways to do this: 1) a common base class or 2) a "traits" struct. Then you can write your encoders/decoders against the base class/traits struct.
Alternatively, you could require that the class provide a function to export itself to a container of boost::any and a constructor that takes a boost::any container as its only parameter. It should be simple to write serialization functions for many different formats if your source data is stored in a map of boost::any objects.
That's two ways I might approach this. It would depend highly on the similarity of the classes to be serialized and on the diversity of target formats which of the above methods I would choose.
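A rough illustration of the any-container idea, using std::any (C++17), the standard counterpart of boost::any; the User class and its fields are invented:
#include <any>
#include <iostream>
#include <map>
#include <string>

using AnyMap = std::map<std::string, std::any>;

// Illustrative class that exports itself to, and rebuilds itself from, an AnyMap.
struct User {
    std::string name;
    int age = 0;

    explicit User(const AnyMap& m)
        : name(std::any_cast<std::string>(m.at("name"))),
          age(std::any_cast<int>(m.at("age"))) {}

    AnyMap to_map() const { return {{"name", name}, {"age", age}}; }
};

// One writer per target format handles every class that can export an AnyMap.
void write_as_json(const AnyMap& m, std::ostream& out) {
    out << "{\"name\":\"" << std::any_cast<std::string>(m.at("name"))
        << "\",\"age\":" << std::any_cast<int>(m.at("age")) << "}\n";
}

int main() {
    AnyMap data{{"name", std::string("Ada")}, {"age", 36}};
    User u(data);                          // reconstruct from the container
    write_as_json(u.to_map(), std::cout);  // serialize via the container
}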
I used the OpenH323 library (famous enough among VoIP developers) for long enough to build a number of VoIP applications, ranging from a low-density answering machine up to a 32xE1 border controller. Of course it had major rework along the way, so in those days I knew almost everything about this library.
Inside this library was a tool (ASNparser) which converted ASN.1 definitions into container classes. There was also a framework which allowed serialization / de-serialization of these containers using higher-layer abstractions. Note that they are auto-generated. They supported several encoding rules (BER, PER, XER) for ASN.1, coping with very complex ASN syntax and offering good-enough performance.
What was nice?
Auto-generated container classes which were suitable enough for implementing clear logic.
I managed to rework the whole container layer underneath the ASN object hierarchy with almost no modification to the upper layers.
It was relatively easy to refactor (for performance) the debug features of those ASN classes (I understand the authors didn't expect 20xE1 call signalling to be logged online).
What was not suitable?
A non-STL library with lazy copying underneath. It was refactored for speed, but I would have liked STL compatibility there (at least at that time).
You can find the Wiki page of the whole project here. You should focus only on the PTlib component, the ASN parser sources, and the ASN class hierarchy / encoding / decoding policy hierarchies.
By the way, look at the "Bridge" design pattern; it might be useful.
Feel free to ask questions in the comments if something seems strange / insufficient / not what you actually requested.