Unit-testing in Procedural or Functional Programming Languages - unit-testing

I have asked a related question, but I did not get a satisfactory answer. So, perhaps I should ask it a different way.
How do large-scale C projects, like Perl or Ruby or even the Linux kernel, handle unit-testings? Or even in any functional language?
I am familiar with Dependency Injection and Abstract Factory for testings in OOP, but I don't see a scalable and manageable equivalence in non OOP. In C or Haskell, for example, there would be layers over layers of functions, higher ones implicitly calling the lower ones. How do I find seams to test just an unit of code instead of all its dependencies?
One way to avoid the need for seams all together is to keep the depths of call-dependency graph very low. Code horizontally rather than vertically, so to speak. Keep as much of the application logic as possible in the "leaf" functions; and make sure the "node" functions do no work other than plumbing the data to other node/leaf functions. Then, test only the "leaf" functions; leave the "node" functions out to integration tests. Is that approach effective?
The largest software today are still written in procedural languages. There must be some methodologies being employed that work. Could someone with experience with large-scale software in procedural languages with good unit-testings comment?

Functional languages have other constructs (than objects) for modularity, such as ML functors. "Dependency injection" is basically a glorified name for "abstracting over things" and has been used for ages in functional languages.
Testing, in all paradigms, should follow the specification boundaries. If you have an idea what a given piece of code (function, method, object...) is supposed to do, you should test against this specification. For leaf functions, this will be unit testing, for "node" function this can be considered "integration testing" if you like, but it's really the same activity.
I think you will find that the same methodologies essentially applies to functional programming, with essentially the same results; in particular, (re)designing code to be easy to test also improves its modularity and maintainability.

designing code to be easy to test also improves its modularity and maintainability.
Not true. Designing code to be easy to test improves the ease of writing test code.
Natural divisions of functional modules usually aren't the most convenient for unit testing. That's why so much Java code these days is broken into tiny, scattered pieces, with each piece mostly not useful by itself.

Related

Immutable Data Structure - Application maintenance

I have been reading about Immutable data structures and understood that change detection has been made easy . And quite often, I hear that it makes the application maintenance simpler and provides an easy to understand programming model.
I need help to understand the way it simplifies the job.
The Clojure community has embraced immutability and it is an eye opener. The best I can do is send you to the source: Rich Hickey's essay on State and his talk The Value of Values. Rich explains how separating the concept of a variable into three distinct concepts: identity, state, and value helps you model your system and reason about it.
It boils down to this: in your programming model, you should only allow things to change if they change in the system you are trying to model. Otherwise you are adding moving parts (mutable variables and objects) to a model that doesn't need them. This makes it harder to understand the model (specially as time evolves) but has little or no benefit.
Even though reading helps, the only way to grok this is to program in a language that takes immutability as a default until you realize how most of the systems you model actually have only a handful of things that change instead of pages and pages of mutable variables.
Immutability is certainly more embraced in functional languages than in imperative ones, even if you can have a Java programming style that limits mutability (see this for immutability in Java). That said, I will just comment on [functional/immutability] and [object/mutability].
I'm Clojure fan and find functional programming really powerful, but...
May be I spent too much time with C++ & Java and not enough with Lisp & Clojure, but I reckon that the simpler maintenance argument has yet to be proven by facts. I'm not sure there are reliable surveys on the actual cost of maintenance in big production systems with data on the technology used and associated costs.
Certainly, in terms of LOC, language like Clojure are really more focused and concise than Java. Hence you can say that less code leads to less maintenance, but I think functional style gives really more compact code that needs a very focused attention to fully understand what a function is doing comparing to imperative style which is more verbose but kind of straightforward. One big advantage of functional programming associated with immutability, is the ability to isolate a function and experiment with it without the need to drag a heavy context of satellite objects or build a bunch of mocks, which is very often the case with OO languages. Putting aside the experimentation, a pure function won't modify its arguments, which ease the fear to break unintentionally some piece of code outside the scope of the function.
But, putting aside the merits of functional/immutability over oop/mutability, in terms of maintenance, my experience leads me to think that it's not the technology which is the main issue, but the design, code quality and evolution of this code over time even when the initial one was of good quality. By "good", I mean that the code is respectful of style conventions (like basic naming), managed complexity, and has a sensible test harness, in a continuous (or at least automated) build environment.
Then, the question becomes: is there a paradigm (functional/immutability, object-oriented/mutability) that enforced a better design and better code. My feeling is that functional languages are the land of computer science passionates, OTOH OOP is more mainstream. Isn't it because OOP is easier to apprehend or is ot just a matter of education? but then, in order to maintain a system in the long run, should one go for a "clever" functional environment with few people able to tackle it, or some mainstream OO technology - with its unsafeness or permissiveness - but lots of people having some knowledge in it?
Certainly the solution is to choose the right technologies (plural) with the right, motivated people...

Which techniques for unit tests with poor functional requirements and no design specifications?

In my understanding, design specifications help to formulate unit test cases which make use of internal knowledge (white-box techniques), while if we only have functional requirements, the black-box techniques are more suited.
What happens if we don't have design specifications, and the requirements are often vague or do not have defined boundaries? How will it affect the unit test process? And how do you compensate for it? Do you use your experience, or a specific practice/technique, to fill in the gaps?
Would developing first the functionality be better suited? As you keep developing and gain more knowledge of how it works internally, you can at some point use either white or black-box techniques. You use your previous experience on similar situations to complete the functionality, which means you may have as well written the functional requirement yourself. At this point, do you go for white-box? black-box? Or it depends on the risk/importance of the new functionality?
I'll guess that you're talking about TDD
Would developing first the functionality be better suited? As you keep developing and gain more knowledge of how it works internally, you can at some point use either white or black-box techniques.
or it'll be difficult to create unit tests for functionality that is missing. If we assume that their main idea is to exercise the
the smallest testable part of an application
and mainly are created by the developers. If the case is
don't have design specifications, and the requirements are often vague or do not have defined boundaries
than the code design and implementation are very likely to be with the same characteristics. And the unit tests wont be so much of a help. IMHO it's crucial to know what are building before testing it, even for TDD.

How do you design complex systems with TDD?

Similar to Does TDD mean not thinking about class design?, I am having trouble thinking about where the traditional 'design' stage fits into TDD.
According to the Bowling Game Kata (the 'conversation' version, whose link escapes me at the moment) TDD appears to ignore design decisions made early on (discard the frame object, roll object, etc). I can see in that example it being a good idea to follow the tests and ignore your initial design thoughts, but in bigger projects or ones where you want to leave an opening for expansion / customisation, wouldn't it be better to put things in that you don't have a test for or don't have a need for immediately in order to avoid time-consuming rewrites later?
In short - how much design is too much when doing TDD, and how much should I be following that design as I write tests and the code to pass them (ignoring my design to only worry about passing tests)?
Or am I worrying about nothing, and code written simply to follow tests is not (in practice) difficult to rewrite or refactor if you're painted into a corner?
Alternatively, am I missing the point and that I should be expecting to rewrite portions of the code when I come to test a new section of functionality?
I would base your tests on your initial design. In many ways TDD is a discovery process. You can expect to either confirm your early design choices or find that there are better choices you can make. Do as much upfront design as you are comfortable with. Some like to fly by the seat of the chairs doing high level design and using TDD to flesh the design out. While others like to have everything on paper first.
Part of TDD is refactoring.
There is something to be said about 'Designing Big Complex Systems' that should not be associated with TDD - especially when TDD is interpreted as 'Test Driven Design' and not 'Test Driven Development'.
In the context 'Development', using TDD will ensure you are writing testable code which give all the benefits cited about TDD ( detect bugs early, high code:test coverage ratio, easier future refactoring etc. etc.)
But in 'Designing' large complex systems, TDD does not particularly address the following concerns that are inherent in the architecture of the system
(Engineering for) Performance
Security
Scalability
Availability
(and all other 'ilities')
(i.e. all of the concerns above do not magically 'emerge' through the "write a failing test case first, followed by the working implementation, Refactor - lather, rinse, repeat..." recipe).
For these, you will need to approach the problem by white-boarding the high-level and then low-level details of a system with respect to the constraints imposed by the requirements and the problem space.
Some of the above considerations compete with each other and require careful trade-offs that just don't 'emerge' through writing lots of unit tests.
Once key components and their responsibilities are defined and
understood, TDD can be used in the implementation of these
components. The process of Refactoring and continually
reviewing/improving your code will ensure the low-level design
details of these components are well-crafted.
I am yet to come across a significantly complex piece of software (e.g. compiler, database, operating system) that was done in a Test Driven Design style. The following blog article talks about this point extremely well (Compilers, TDD, Mastery)
Also, check the following videos on Architecture which adds a lot of common sense to the thought process.
Start with a rough design idea, pick a first test and start coding, going green test after test, letting the design emerge, similar or not to the initial design. How much initial design depends on the problem complexity.
One must be attentive and listen to and sniff the code, to detect refactoring opportunities and code smells.
Strictly following TDD and the SOLID principles will bring code clean, testable and flexible, so that it can be easily refactored, leveraging on the unit tests as scaffolding to prevent regression.
I've found three ways of doing design with TDD:
Allow the design to emerge naturally as duplication and complexity is removed
Create a perfect design up-front, using mocks combined with the single responsibility principle
Be pragmatic about it.
Pragmatism seems to be the best choice most times, so here's what I do. If I know that a particular pattern will suit my problem very well (for instance, MVC) I'll go straight for the mocks and assume it works. Otherwise, if the design is less clear, I'll allow it to emerge.
The cross-over point at which I feel the need to refactor an emergent design is the point at which it stops being easy to change. If a piece of code isn't perfectly designed, but another dev coming across it could easily refactor it themselves, it's good enough. If the code is becoming so complex that it stops being obvious to another dev, it's time to refactor it.
I like Real Options, and refactoring something to perfection feels to me like committing to the design without any real need to do so. I refactor to "good enough" instead; that way if my design proves itself to be wrong I've not wasted the time. Assume that your design will be wrong if you've never used it before in a similar context.
This also lets me get my code out much more quickly than if it were perfect. Having said that, it was my attempts to make the code perfect that taught me where the line was!

How does TDD compare with Functional Programming Languages?

How does TDD compare with Functional Programming Languages like F# and Erlang?
I haven't actually worked directly with a functional programming language yet, but from what I've seen of it, you have two sides of an equation and they have to balance like in algebra or accounting; this seems somewhat reminiscent of TDD where you define your expected outputs as Assert statements (one side of the equation) and the rest of the functionality goes into a class decoupled from the test (the other side of the equation), except that functional programming IMHO seems a bit cleaner.
Do the two actually have similarities, or am I just overthinking this a bit?
Software Design v Development Methodology
They're orthogonal.
TDD is an approach to developing software which focuses on ensuring correctness by developing tests against specifications before the production code is written. Functional programming is a paradigm for designing and implementing software.
I think TDD and functional programming (FP) are different in that TDD is a methodology and FP is programming paradigm.
I would say that FP help when practicing TDD as FP encourages you to make things deterministic when possible. Deterministic functions are much easier to test than non-deterministic ones.
Chris is correct in saying that they are orthogonal. However, there are some aspects of functional programming that make testing of functional programs a lot easier.
Functional programs are composed from functions and guarantee that the function will behave the same in all contexts. This means that when you test a function in unit test, you know that it will always work this way. You don't have to test whether it works if you plug it into some other environment.
Functions take arguments and return results and that's the only thing they do. This means that you can usually avoid mocking and similar tricks, because you don't need to verify whether a function does some call on some object. You only need to verify that it returns the expected result for given arguments.
Finally, there are some nice automatic tools for testing functional programs. For F#, we have FsCheck (which is based on QuickCheck known from Haskell). These benefit from various properties of functional programs.
So, they both have different purpose and are essentially a different thing, but there are some nice relations (perhaps like a tea and a teapot :-) they are completely different things, but work very well together!)
You're correct that when writing a functional program you might use equational reasoning to derive the definition of a function. However, that reasoning typically doesn't exist in some reified form (such as tests), so it is not generally the case that a function is proven correct in a way that is machine- or human-checkable. It is certainly possible to use TDD with functional languages (e.g. using any .NET compatible TDD library with F#) to verify that functions have been derived correctly, but there are also other testing strategies which might be more unique to functional languages, such as using QuickCheck for randomized specification checking.
I think the similar feel between the two stems from the fact that, with both, functions are supposed to be deterministic. FP functions shouldn't have side effects and side effects in test functions for object orientated code should be removed by injecting stubs.

For C/C++, When is it beneficial not to use Object Oriented Programming?

I find myself always trying to fit everything into the OOP methodology, when I'm coding in C/C++. But I realize that I don't always have to force everything into this mold. What are some pros/cons for using the OOP methodology versus not? I'm more interested in the pros/cons of NOT using OOP (for example, are there optimization benefits to not using OOP?). Thanks, let me know.
Of course it's very easy to explain a million reasons why OOP is a good thing. These include: design patterns, abstraction, encapsulation, modularity, polymorphism, and inheritance.
When not to use OOP:
Putting square pegs in round holes: Don't wrap everything in classes when they don't need to be. Sometimes there is no need and the extra overhead just makes your code slower and more complex.
Object state can get very complex: There is a really good quote from Joe Armstrong who invented Erlang:
The problem with object-oriented
languages is they’ve got all this
implicit environment that they carry
around with them. You wanted a banana
but what you got was a gorilla holding
the banana and the entire jungle.
Your code is already not OOP: It's not worth porting your code if your old code is not OOP. There is a quote from Richard Stallman in 1995
Adding OOP to Emacs is not clearly an
improvement; I used OOP when working
on the Lisp Machine window systems,
and I disagree with the usual view
that it is a superior way to program.
Portability with C: You may need to export a set of functions to C. Although you can simulate OOP in C by making a struct and a set of functions who's first parameter takes a pointer to that struct, it isn't always natural.
You may find more reasons in this paper entitled Bad Engineering Properties
of Object-Oriented Languages.
Wikipedia's Object Oriented Programming page also discusses some pros and cons.
One school of thought with object-oriented programming is that you should have all of the functions that operate on a class as methods on the class.
Scott Meyers, one of the C++ gurus, actually argues against this in this article:
How Non-Member Functions Improve Encapsulation.
He basically says, unless there's a real compelling reason to, you should keep the function SEPARATE from the class. Otherwise the class can turn into this big bloated unmanageable mess.
Based on experiences in a previous large project, I totally agree with him.
A benefit of non-oop functionality is that it often makes exporting your functionality to different languages easier. For example a simple DLL containing only functions is much easier to use in C#, you can use the P/Invoke to simply call the C++ functions. So in this sense it can be useful for writing extremely time critical algorithms that fit nicely into single/few function calls.
OOP is used a lot in GUI code, computer games, and simulations. Windows should be polymorphic - you can click on them, resize them, and so on. Computer game objects should be polymorphic - they probably have a location, a path to follow, they might have health, and they might have some AI behavior. Simulation objects also have behavior that is similar, but breaks down into classes.
For most things though, OOP is a bit of a waste of time. State usually just causes trouble, unless you have put it safely in the database where it belongs.
I suggest you read Bjarne's Paper about Why C++ is not just an Object-Oriented Programming Language
If we consider, for a moment, not object-orienatation itself but one
of the keystones of object-orientation: encapsulation.
It can be shown that change-propagation probability cannot increase
with distance from the change: if A depends on B and B depends on C,
and we change C, then the probability that A will change
cannot be larger than the proabability that B will
change. If B is a direct dependency on C and A is an indirect
dependency on C, then, more generally, to minimise the potential cost
of any change in a system we must miminimise the potential number of
direct dependencies.
The ISO defines encapsulation as the property that the information
contained in an object is accessible only through interactions at the
interfaces supported by the object.
We use encapsulation to minimise the number of potential dependencies
with the highest change-propagation probability. Basically,
encapsulation mitigates the ripple effect.
Thus one reason not to use encapsulation is when the system is so
small or so unchanging that the cost of potential ripple effects is
negligible. This is also, therefore, a case when OO might not be used
without potentially costly consequences.
Well, there are several alternatives. Non-OOP code in C++ may instead be:
C-style procedural code, or
C++-style generic programming
The only advantages to the first are the simplicity and backwards-compatibility. If you're writing a small trivial app, then messing around with classes is just a waste of time. If you're trying to write a "Hello World", just call printf already. Don't bother wrapping it in a class. And if you're working with an existing C codebase, it's probably not object-oriented, and trying to force it into a different paradigm than it already uses is just a recipe for pain.
For the latter, the situation is different, in that this approach is often superior to "traditional OOP".
Generic programming gives you greater performance (among other things because you often avoid the overhead of vtables, and because with less indirection, the compiler is better able to inline), better type safety (because the exact type is known, rather than hiding it behind an interface), and often cleaner and more concise code as well (STL iterators and algorithms enable much of this, without using a single instance of runtime polymorphism or virtual functions.
OOP is little more than an aging buzzword. A methodology that everyone misunderstood (The version supported by C++ and Java has little to do with what OOP originally meant, as in SmallTalk), and then pretended was the holy grail. There are aspects to it that are useful, certainly, but it is often not the best approach for designing an application.
Rather, express the overall logic by other means, for example generic programming, and when you need a class to encapsulate some simple concept, by all means design it according to OOP principles.
OOP is just a tool among many. The goal is not to write OOP code, but to write good code. Sometimes, the way to do this is by using OOP principles, but often, you can get better code using generic programmming principles, or functional programming.
It is a very project dependent decision. My general feel of OOP is that its useful for organizing large projects that involve multiple components. One area I find that OOP is especially pointless is school assignments. Excepting those specifically designed to teach OOP concepts, or large software design concepts, many of my assignments, specifically those in more algorithmy type classes are best suited to non-OOP design.
So specifically, smaller projects, that are not likely to grow large, and projects that center around a single algorithm seem to be non-OOP candidates in my books. Also, if you can write the specification as a linear set of steps, e.g., with no interactive GUI or state to maintain, this would also be an opportunity.
Of course, if you're required to use an OOP design, or an OOP toolkit, or if you have well defined 'objects' in you're spec, or if you need the features of polymorphism, etc. etc. etc...there are plenty of reasons to use it, the above seem to be indicators of when it would be simple not to.
Just my $0.02.
Having an Ada background, I develop in C in terms of packages containing data and their associated functions. This gives a code very modular with pieces of code that can be taken apart and reused on other projects. I don't feel the need to use OOP.
When I develop in Objective-C, objects are the natural container for data and code. I still develop with more or less the package concept in mind with some new cool features.
I'm used to be an OOP fanboy... Then realized using functions, generics and callbacks can often make a more elegant and change-friendly solution in C++ than classes and virtual functions.
Other big names realized it too: http://harmful.cat-v.org/software/OO_programming/
IMHO, I have a feeling that the OOP concept is not really suits the needs of the Big Data, as OOP assume all the stuff to be kept in memory (concept of Objects and member variables). This always result in memory demanding and heavy applications when OOP is used for example for big images processing. Instead, the simplicity of C maybe used with intensive parallel I/O making apps more efficient and easy to implement. It is the year 2019 I am writing this message...Everything may change in a year! :)
In my mind it comes down to what kind of model suits the problem at hand. It seems to me that OOP is best suited to coding GUI programs, in that the data and functionality for a graphical object is easily bundled together. Other problems- (such as a webserver, as an example off the top of my head), might be more easily modeled with a data centric approach, where there's no strong advantage to having a method and its data near each-other.
tl;dr depends on the problem.
I'd say the greatest benefit of C++ OOP is inheritance and polymorphism (Virtual function etc...) .
This allows for code reuse and extendibility
C++, use OOP - - - C, no, with certain exceptions
In C++ you should use OOP. It's a nice abstraction and it's the tool you are given. You either use it or leave it in the box where it can't help. You don't use the power saw for everything but I would read the manual and have it ready for the right job.
In C, it's a more difficult call. While you can certainly write arbitrarily object-oriented code in C, it's enough of a pain that you immediately find yourself fighting the language in order to use it. You may be more productive dropping the doesn't-fit-so-well design pattern and programming as C was intended to be used.
Furthermore, every time you make an array of function pointers or something in an OOP-in-C design pattern, you sever almost completely all visible links in the inheritance chain, making the code hard to maintain. In real OOP languages, there is an obvious chain of derived classes, often analyzed and documented for you. (mmm, javadoc.) Not so in OOP-in-C, and the tools available won't be able to see it.
So, I would argue in general against OOP in C. For a really complex program, you may well need the abstraction, and then you will have to do it despite needing to fight the language in the process and despite making the program quite hard to follow by anyone other than the original author.
But if you knew the program was going to become that complicated, you shouldn't have written it in C in the first place...
In C, there are some times when I 'emulate' the object oriented approach, by defining some sort of constructor with granular control over things like callbacks, when running several instances of it.
For instance, lets say I have some spiffy event handler library and I know that down the road I'm going to need many allocated copies:
So I would have (in C)
MyEvent *ev1 = new_eventhandler();
set_event_callback_func(ev1, callback_one);
ev1->setfd(fd1);
MyEvent *ev2 = new_eventhandler();
set_event_callback_func(ev2, callback_two);
ev2->setfd(fd2);
destroy_eventhandler(ev1);
destroy_eventhandler(ev2);
Obviously, I would later do something useful with that like handle received events in the two respective callback functions. I'm not going to really elaborate on the method of typing function pointers and structures to hold them, nor what would go on in the 'constructor' because its pretty obvious.
I think, this approach works for more advanced interfaces where its desirable to allow the user to define their own callbacks (and change them on the fly), or when working on complex non-blocking I/O services.
Otherwise, I much prefer a more procedural / functional approach.
Probably an unpopular idea but I think you should stick with non-OOP unless it adds something useful. In most practical problems OOP is useful but if I'm just playing with an idea I start writing non-object code and put functions and data into classes if it becomes useful.
Of course I still use other objects in my code (std::vector et al) and I use namespaces to help organise my functions but why put code into objects until it is useful? Equally don't shy away from free functions in an OO solution.
The question is tricky because OOP encompasses several concepts: object encapsulation, polymorphism, inheritance, etc. It's easy to take those ideas too far. Here's a concrete example:
When C++ first caught on, zillions of string classes sprung into being. Everything you could possibly imagine doing to a string (upcasing, downcasing, trimming, tokenizing, parsing, etc.) was a member function of some string class.
Notice, though, that std::strings from the STL don't have all these methods. STL is object-oriented--the state and implementation details of a string object are well encapsulated, only a small, orthogonal interface is exposed to the world. All the crazy manipulations that people used to include as member functions are now delegated to non-member functions.
This is powerful, because these functions can now work on any string class that exposes the same interface. If you use STL strings for most things and a specialty version tuned to your program's idiosyncracies, you don't have to duplicate member functions. You just have to implement the basic string interface and then you can re-use all those crazy manipulations.
Some people call this hybrid approach generic programming. It's still object-oriented programming, but it moves away from the "everything is a member-function" mentality that a lot of people associate with OOP.