unit testing accuracy of function composition - unit-testing

I'm writing tests for an object that takes in an input, composes some functions together, runs the input through the composed function, and returns the result.
Here's a greatly-simplified set of objects and functions that mirrors my design:
type Result =
| Success of string
let internal add5 x = x + 5
let internal mapResult number =
Success (number.ToString())
type public InteropGuy internal (add, map) =
member this.Add5AndMap number =
number |> (add >> map)
type InteropGuyFactory() =
member this.CreateInteropGuy () =
new InteropGuy(add5, mapResult)
The class is designed to be used for C# interop which explains the structure, but this problem still can apply to any function under test that composes function parameters.
I'm having trouble finding an elegant way to keep the implementation details of the internal functions from creeping in to the test conditions when testing the composing function, or in other words, isolating one link in the chain instead of inspecting the output once the input is piped entirely through. If I simply inspect the output then tests for each function are going to be dependent on downstream functions working properly, and if the one at the end of the chain stops working, all of the tests will fail. The best I've been able to do is stub out a function to return a certain value, then stub out its downstream function, storing the input of the downstream function and then asserting the stored value is equal to the output of the stubbed function:
[<TestClass>]
type InteropGuyTests() =
[<TestMethod>]
member this.``Add5AndMap passes add5 result into map function``() =
let add5 _ = 13
let tempResult = ref 0
let mapResult result =
tempResult := result
Success "unused result"
let guy = new InteropGuy(add5, mapResult)
guy.Add5AndMap 8 |> ignore
Assert.AreEqual(13, !tempResult)
Is there a better way to do this or is this generally how to test composition in isolation? Design comments also appreciated.

The first question we should ask when encountering something like this is: why do we want to test this piece of code?
When the potential System Under Test (SUT) is literally a single statement, then which value does the test add?
AFAICT, there's only two ways to test a one-liner.
Triangulation
Duplication of implementation
Both are possible, but comes with drawbacks, so I think it's worth asking if such a method/function should be tested at all.
Still, assuming that you want to test the function, e.g. to prevent regressions, you can use either of these options.
Triangulation
With triangulation, you simply throw enough example values at the SUT to demonstrate that it works as the black box it's supposed to be:
open Xunit
open Swensen.Unquote
[<Theory>]
[<InlineData(0, "5")>]
[<InlineData(1, "6")>]
[<InlineData(42, "47")>]
[<InlineData(1337, "1342")>]
let ``Add5AndMap returns expected result`` (number : int, expected : string) =
let actual = InteropGuyFactory().CreateInteropGuy().Add5AndMap number
Success expected =! actual
The advantage of this example is that it treats the SUT as a black box, but the disadvantage is that it doesn't demonstrate that the SUT is a result of any particular composition.
Duplication of implementation
You can use Property-Based Testing to demonstrate (or, at least make very likely) that the SUT is composed of the desired functions, but it requires duplicating the implementation.
Since the functions are assumed to be referentially transparent, you can simply throw enough example values at both the composition and the SUT, and verify that they return the same value:
open FsCheck.Xunit
open Swensen.Unquote
[<Property>]
let ``Add5AndMap returns composed result`` (number : int) =
let actual = InteropGuyFactory().CreateInteropGuy().Add5AndMap number
let expected = number |> add5 |> mapResult
expected =! actual
Is it ever interesting to duplicate the implementation in the test?
Often, it's not, but if the purpose of the test is to prevent regressions, it may be worthwhile as a sort of double-entry bookkeeping.

Related

JUnit avoid duplicate assertions

I am writing simple test cases for converting Entity to DTO and vice versa. The question is more about design. Is it acceptable to leave duplicates like in code below or is it better to create external method for this assert? Since I'm Java newbie can someone give me hint about hmm? any Generic method? I don't want to use and inheritance or any other abstraction for such simple Entity and its DTO because there will be much more code than just few duplicate line of code.
Here is how it looks like now:
#Test
void addressToAddressDTO() {
Address address = getAddress();
AddressDTO addressDTO = addressMapper.addressToAddressDTO(address);
assertAll("Check if values were properly bound",
() -> {
assertEquals(address.getCity(), addressDTO.getCity());
assertEquals(address.getUserDetails().getFirstName(), addressDTO.getUserDetails().getFirstName());
assertEquals(address.getUserDetails().getUser().getUsername(), addressDTO.getUserDetails().getUser().getUsername());
assertEquals(address.getUserDetails().getContact().getEmail(), addressDTO.getUserDetails().getContact().getEmail());
assertEquals(address.getUserDetails().getProfileImage().getImageUrl(), addressDTO.getUserDetails().getProfileImage().getImageUrl());
});
}
#Test
void addressDTOtoAddress() {
AddressDTO addressDTO = getAddressDTO();
Address address = addressMapper.addressDTOtoAddress(addressDTO);
assertAll("Check if values were properly bound",
() -> {
assertEquals(addressDTO.getCity(), address.getCity());
assertEquals(addressDTO.getUserDetails().getFirstName(), address.getUserDetails().getFirstName());
assertEquals(addressDTO.getUserDetails().getUser().getUsername(), address.getUserDetails().getUser().getUsername());
assertEquals(addressDTO.getUserDetails().getContact().getEmail(), address.getUserDetails().getContact().getEmail());
assertEquals(addressDTO.getUserDetails().getProfileImage().getImageUrl(), address.getUserDetails().getProfileImage().getImageUrl());
});
}
My idea was to create something more generic like:
private<T, S> void assertObject(T expected, S actual) {
assertAll("Check if values were properly bound",
() -> {
assertEquals(expected.getCity(), actual.getCity());
assertEquals(expected.getUserDetails().getFirstName(), actual.getUserDetails().getFirstName());
assertEquals(expected.getUserDetails().getUser().getUsername(), actual.getUserDetails().getUser().getUsername());
assertEquals(expected.getUserDetails().getContact().getEmail(), actual.getUserDetails().getContact().getEmail());
assertEquals(expected.getUserDetails().getProfileImage().getImageUrl(), actual.getUserDetails().getProfileImage().getImageUrl());
});
}
but since even they are the same objects they have nothing in common. How to achieve something hmm interchangable that Address and AddressDTO can be both actual or expected?
EDIT
According to Aaron Digulla answer I've made some changes, hope it will help someone with the same doubts.If someone know any other option please post in in the comment section.
#Test
void addressToAddressDTO() {
Address expected = getAddress();
AddressDTO actual = addressMapper.addressToAddressDTO(expected);
assertEquals(
mergeAddressDataToString(expected),
actual.getCity() + "," +
actual.getUserDetails().getFirstName() + "," +
actual.getUserDetails().getUser().getUsername() + "," +
actual.getUserDetails().getContact().getEmail() + "," +
actual.getUserDetails().getProfileImage().getImageUrl()
);
}
#Test
void addressDTOtoAddress() {
AddressDTO expected = getAddressDTO();
Address actual = addressMapper.addressDTOtoAddress(expected);
assertEquals(
expected.getCity() + "," +
expected.getUserDetails().getFirstName() + "," +
expected.getUserDetails().getUser().getUsername() + "," +
expected.getUserDetails().getContact().getEmail() + "," +
expected.getUserDetails().getProfileImage().getImageUrl(),
mergeAddressDataToString(actual)
);
}
private String mergeAddressDataToString(Address address) {
StringJoiner stringJoiner = new StringJoiner(",");
stringJoiner.add(address.getCity());
stringJoiner.add(address.getUserDetails().getFirstName());
stringJoiner.add(address.getUserDetails().getUser().getUsername());
stringJoiner.add(address.getUserDetails().getContact().getEmail());
stringJoiner.add(address.getUserDetails().getProfileImage().getImageUrl());
return stringJoiner.toString();
}
I usually write a custom toString() method for the object in the test or a test utility class when several tests need to share it. That way, I can check all values with a single assertEquals().
I can use newlines to make the assert more readable in an IDE:
assertEquals(
"city=Foo\n" +
"firstName=John\n" +
"user=doe\n" +
....
, toString(actual));
which also works nicely when you have to check a list of values:
...
, list.stream().map(this::toString).collect(Collectors.joining("\n---\n"));
The big advantage here is that you get all mismatches at once. You can also tweak the toString() method to handle corner cases (like comparing only the number of elements for huge lists or rounding decimals).
It also makes the code easy to understand. In many cases, it also saves you the time to write the code to fill all those expected objects which you would need. Tests can even share expected strings when several tests should yield the same result.
When the test breaks because the output has changed, I can just select the whole string and replace it with the new output. The IDE will do the formatting for me.
Your initial approach, aside of the duplication, uses one antipattern: You call assertAll, but still put all assertions into one block. Thus, after the first failing assertion, the block's execution will be terminated. If instead you put each individual check in one Executable, in case of a failure, all checks will be performed and you will get more details about what failed and what not. Certainly, this is no longer the problem with the string comparison approach.
Regarding the duplication, there is another idea how you could avoid it in this particular case, that is, for the tests of the two conversion functions: You could make use of the fact that a conversion back and forth between the two types is the identity function:
Address address = getAddress();
AddressDTO addressDTO = addressMapper.addressToAddressDTO(address);
Address actual = addressMapper.addressDTOtoAddress(addressDTO);
assertEquals(address, actual);
This eliminates the comparison of individual elements. It may even be advantageous if the representation of an entity and the DTO change in a way that the attributes are no longer strictly equal, but instead only have to be convertible back and forth.
But, each test now also relies on other methods of the class under test. Which is not a problem in general: Many tests, for example, rely on the constructor to work - which is OK if there are also tests for the constructor. Here, however, if the test fails, there are two possible locations that are responsible, and so finding the culprit requires more analysis.
Regarding the second option you have tried, namely creating strings and comparing them: In some scenarios such an approach may be helpful, but generally I am hesitant about going from structured data to strings. Assume you create a lot of tests using that pattern. Later, however, you realize that in the DTO some of the attributes have to be encoded differently than in the entity (I mentioned that scenario above). Suddenly, the conversion to strings does not work any more or becomes awkward.
With or without the strings approach: If you have more assertions where complete entity objects and DTOs have to be compared, I'd recommend writing your own assertion helper methods, like, assertEntityMatchesDto and assertDtoMatchesEntity.
One final remark: There may be reasons not to compare all attributes of the objects in one test: This way, the tests may not be focused enough. Taking the example about encoding changes again: Imagine that you will have to change the representation of the email address in the DTO at some point in time. Then, if you always look at all attributes in your tests, all your tests will fail. If, instead, you have focused tests for individual attributes, such a change will - if done correctly - only affect tests focusing on the email attribute, but leave other tests intact.

Should function composition and piping be tested?

In F# (and most of the functional languages) some codes are extremely short as follows:
let f = getNames
>> Observable.flatmap ObservableJson.jsonArrayToObservableObjects<string>
or :
let jsonArrayToObservableObjects<'t> =
JsonConvert.DeserializeObject<'t[]>
>> Observable.ToObservable
And the simplest property-based test I ended up for the latter function is :
testList "ObservableJson" [
testProperty "Should convert an Observable of `json` array to Observable of single F# objects" <| fun _ ->
//--Arrange--
let (array , json) = createAJsonArrayOfString stringArray
//--Act--
let actual = jsonArray
|> ObservableJson.jsonArrayToObservableObjects<string>
|> Observable.ToArray
|> Observable.Wait
//--Assert--
Expect.sequenceEqual actual sArray
]
Regardless of the arrange part, the test is more than the function under test, so it's harder to read than the function under test!
What would be the value of testing when it's harder to read than the production code?
On the other hand:
I wonder whether the functions which are a composition of multiple functions are safe to not to be tested?
Should they be tested at integration and acceptance level?
And what if they are short but do complex operations?
Depends upon your definition of what 'functional programming' is. Or even more precise - upon how close you wanna stay to the origin of functional programming - math with both broad and narrow meanings.
Let's take something relevant to the programming. Say, mappings theory. Your question could be translated in such a way: having a bijection from A to B, and a bijection from B to C, should I prove that composition of those two is a bijection as well? The answer is twofold: you definitely should, and you do it only once: your prove is generic enough to cover all possible cases.
Falling back into programming, it means that pipe-lining has to be tested (proved) only once - and I guess it was before deploy to the production. Since that your job as a programmer to create functions (mappings) of such a quality, that, being composed with a pipeline operator or whatever else, the desired properties are preserved. Once again, it's better to stick with generic arguments rather than write tons of similar tests.
So, finally, we come down to a much more valuable question: how one can guarantee that some operation preserve some property? It turns out that the easiest way to acknowledge such a fact is to deal with types like Monoid from the great Haskell: for example, Monoind is there to represent any associative binary operation A -> A -> A together with some identity-element of type A. Having such a generic containers is extremely profitable and the best known way of being explicit in what and how exactly your code is designed to do.
Personally I would NOT test it.
In fact having less need for testing and instead relying more on stricter compiler rules, side effect free functions, immutability etc. is one major reason why I prefer F# over C#.
of course I continue (unit)testing of "custom logic" ... e.g. algorithmic code

How to detect list changes without comparing the complete list

I have a function which will fail if there has being any change on the term/list it is using since the generation of this term/list. I would like to avoid to check that each parameter still the same. So I had thought about each time I generate the term/list to perform a CRC or something similar. Before making use of it I would generate again the CRC so I can be 99,9999% sure the term/list still the same.
Going to a specfic answer, I am programming in Erlang, I am thinking on using a function of the following type:
-spec(list_crc32(List :: [term()]) -> CRC32 :: integer()).
I use term, because it is a list of terms, (erlang has already a default fast CRC libraries but for binary values). I have consider to use "erlang:crc32(term_to_binary(Term))", but not sure if there could be a better approach.
What do you think?
Regards, Borja.
Without more context it is a little bit difficult to understand why you would have this problem, particularly since Erlang terms are immutable -- once assigned no other operation can change the value of a variable, not even in the same function.
So if your question is "How do I quickly assert that true = A == A?" then consider this code:
A = generate_list()
% other things in this function happen
A = A.
The above snippet will always assert that A is still A, because it is not possible to change A like you might do in, say, Python.
If your question is "How do I assert that the value of a new list generated exactly the same value as a different known list?" then using either matching or an actual assertion is the fastest way:
start() ->
A = generate_list(),
assert_loop(A).
assert_loop(A) ->
ok = do_stuff(),
A = generate_list(),
assert_loop(A).
The assert_loop/1 function above is forcing an assertion that the output of generate_list/0 is still exactly A. There is no telling what other things in the system might be happening which may have affected the result of that function, but the line A = generate_list() will crash if the list returned is not exactly the same value as A.
In fact, there is no way to change the A in this example, no matter how many times we execute assert_loop/1 above.
Now consider a different style:
compare_loop(A) ->
ok = do_stuff(),
case A =:= generate_list() of
true -> compare_loop(A);
false -> terminate_gracefully()
end.
Here we have given ourselves the option to do something other than crash, but the effect is ultimately the same, as the =:= is not merely a test of equality, it is a match test meaning that the two do not evaluate to the same values, but that they actually match.
Consider:
1> 1 == 1.0.
true
2> 1 =:= 1.0.
false
The fastest way to compare two terms will depend partly on the sizes of the lists involved but especially on whether or not you expect the assertion to pass or fail more often.
If the check is expected to fail more often then the fastest check is to use an assertion with =, an equivalence test with == or a match test with =:= instead of using erlang:phash2/1. Why? Because these tests can return false as soon as a non-matching element is encountered -- and if this non-match occurs near the beginning of the list then a full traverse of both lists is avoided entirely.
If the check is expected to pass more often then something like erlang:phash2/1 will be faster, but only if the lists are long, because only one list will be fully traversed each iteration (the hash of the original list is already stored). It is possible, though, on a short list that a simple comparison will still be faster than computing a hash, storing it, computing another hash, and then comparing the hashes (obviously). So, as always, benchmark.
A phash2 version could look like:
start() ->
A = generate_list(),
Hash = erlang:phash2(A),
assert_loop(Hash).
assert_loop(Hash) ->
ok = do_stuff(),
Hash = erlang:phash2(generate_list()),
loop(Hash).
Again, this is an assertive loop that will crash instead of exit cleanly, so it would need to be adapted to your needs.
The basic mystery still remains, though: in a language with immutable variables why is it that you don't know whether something will have changed? This is almost certainly a symptom of an underlying architectural problem elsewhere in the program -- either that or simply a misunderstanding of immutability in Erlang.

Does property based testing make you duplicate code?

I'm trying to replace some old unit tests with property based testing (PBT), concreteley with scala and scalatest - scalacheck but I think the problem is more general. The simplified situation is , if I have a method I want to test:
def upcaseReverse(s:String) = s.toUpperCase.reverse
Normally, I would have written unit tests like:
assertEquals("GNIRTS", upcaseReverse("string"))
assertEquals("", upcaseReverse(""))
// ... corner cases I could think of
So, for each test, I write the output I expect, no problem. Now, with PBT, it'd be like :
property("strings are reversed and upper-cased") {
forAll { (s: String) =>
assert ( upcaseReverse(s) == ???) //this is the problem right here!
}
}
As I try to write a test that will be true for all String inputs, I find my self having to write the logic of the method again in the tests. In this case the test would look like :
assert ( upcaseReverse(s) == s.toUpperCase.reverse)
That is, I had to write the implementation in the test to make sure the output is correct.
Is there a way out of this? Am I misunderstanding PBT, and should I be testing other properties instead, like :
"strings should have the same length as the original"
"strings should contain all the characters of the original"
"strings should not contain lower case characters"
...
That is also plausible but sounds like much contrived and less clear. Can anybody with more experience in PBT shed some light here?
EDIT : following #Eric's sources I got to this post, and there's exactly an example of what I mean (at Applying the categories one more time): to test the method times in (F#):
type Dollar(amount:int) =
member val Amount = amount
member this.Add add =
Dollar (amount + add)
member this.Times multiplier =
Dollar (amount * multiplier)
static member Create amount =
Dollar amount
the author ends up writing a test that goes like:
let ``create then times should be same as times then create`` start multiplier =
let d0 = Dollar.Create start
let d1 = d0.Times(multiplier)
let d2 = Dollar.Create (start * multiplier) // This ones duplicates the code of Times!
d1 = d2
So, in order to test that a method, the code of the method is duplicated in the test. In this case something as trivial as multiplying, but I think it extrapolates to more complex cases.
This presentation gives some clues about the kind of properties you can write for your code without duplicating it.
In general it is useful to think about what happens when you compose the method you want to test with other methods on that class:
size
++
reverse
toUpperCase
contains
For example:
upcaseReverse(y) ++ upcaseReverse(x) == upcaseReverse(x ++ y)
Then think about what would break if the implementation was broken. Would the property fail if:
size was not preserved?
not all characters were uppercased?
the string was not properly reversed?
1. is actually implied by 3. and I think that the property above would break for 3. However it would not break for 2 (if there was no uppercasing at all for example). Can we enhance it? What about:
upcaseReverse(y) ++ x.reverse.toUpper == upcaseReverse(x ++ y)
I think this one is ok but don't believe me and run the tests!
Anyway I hope you get the idea:
compose with other methods
see if there are equalities which seem to hold (things like "round-tripping" or "idempotency" or "model-checking" in the presentation)
check if your property will break when the code is wrong
Note that 1. and 2. are implemented by a library named QuickSpec and 3. is "mutation testing".
Addendum
About your Edit: the Times operation is just a wrapper around * so there's not much to test. However in a more complex case you might want to check that the operation:
has a unit element
is associative
is commutative
is distributive with the addition
If any of these properties fails, this would be a big surprise. If you encode those properties as generic properties for any binary relation T x T -> T you should be able to reuse them very easily in all sorts of contexts (see the Scalaz Monoid "laws").
Coming back to your upperCaseReverse example I would actually write 2 separate properties:
"upperCaseReverse must uppercase the string" >> forAll { s: String =>
upperCaseReverse(s).forall(_.isUpper)
}
"upperCaseReverse reverses the string regardless of case" >> forAll { s: String =>
upperCaseReverse(s).toLowerCase === s.reverse.toLowerCase
}
This doesn't duplicate the code and states 2 different things which can break if your code is wrong.
In conclusion, I had the same question as you before and felt pretty frustrated about it but after a while I found more and more cases where I was not duplicating my code in properties, especially when I starting thinking about
combining the tested function with other functions (.isUpper in the first property)
comparing the tested function with a simpler "model" of computation ("reverse regardless of case" in the second property)
I have called this problem "convergent testing" but I can't figure out why or where there term comes from so take it with a grain of salt.
For any test you run the risk of the complexity of the test code approaching the complexity of the code under test.
In your case, the the code winds up being basically the same which is just writing the same code twice. Sometimes there is value in that. For example, if you are writing code to keep someone in intensive care alive, you could write it twice to be safe. I wouldn't fault you for the abundance of caution.
For other cases there comes a point where the likelihood of the test breaking invalidates the benefit of the test catching real issues. For that reason, even if it is against best practice in other ways (enumerating things that should be calculated, not writing DRY code) I try to write test code that is in some way simpler than the production code, so it is less likely to fail.
If I cannot find a way to write code simpler than the test code, that is also maintainable(read: "that I also like"), I move that test to a "higher" level(for example unit test -> functional test)
I just started playing with property based testing but from what I can tell it is hard to make it work with many unit tests. For complex units, it can work, but I find it more helpful at functional testing so far.
For functional testing you can often write the rule a function has to satisfy much more simply than you can write a function that satisfies the rule. This feels to me a lot like the P vs NP problem. Where you can write a program to VALIDATE a solution in linear time, but all known programs to FIND a solution take much longer. That seems like a wonderful case for property testing.

Programming without if-statements? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I remember some time (years, probably) ago I read on Stackoverflow about the charms of programming with as few if-tests as possible. This question is somewhat relevant but I think the stress was on using many small functions that returned values determined by tests depending on the parameter they receive. A very simple example would be using this:
int i = 5;
bool iIsSmall = isSmall(i);
with isSmall() looking like this:
private bool isSmall(int number)
{
return (i < 10);
}
instead of just doing this:
int i = 5;
bool isSmall;
if (i < 10) {
isSmall = true;
} else {
isSmall = false;
}
(Logically this code is just sample code. It is not part of a program I am making.)
The reason for doing this, I believe, was because it looks nicer and makes a programmer less prone to logical errors. If this coding convention is applied correctly, you would see virtually no if-tests anywhere, except in functions whose only purpose is to do that test.
Now, my question is: is there any documentation about this convention? Is there anyplace where you can see wild arguments between supporters and opposers of this style? I tried searching for the Stackoverflow post that introduced me to this, but I can't find it anymore.
Lastly, I hope this question doesn't get shot down because I am not asking for a solution to a problem. I am simply hoping to hear more about this coding style and maybe increase the quality of all coding I will do in the future.
This whole "if" vs "no if" thing makes me think of the Expression Problem1. Basically, it's an observation that programming with if statements or without if statements is a matter of encapsulation and extensibility and that sometimes it's better to use if statements2 and sometimes it's better to use dynamic dispatching with methods / function pointers.
When we want to model something, there are two axes to worry about:
The different cases (or types) of the inputs we need to deal with.
The different operations we want to perform over these inputs.
One way to implement this sort of thing is with if statements / pattern matching / the visitor pattern:
data List = Nil | Cons Int List
length xs = case xs of
Nil -> 0
Cons a as -> 1 + length x
concat xs ys = case ii of
Nil -> jj
Cons a as -> Cons a (concat as ys)
The other way is to use object orientation:
data List = {
length :: Int
concat :: (List -> List)
}
nil = List {
length = 0,
concat = (\ys -> ys)
}
cons x xs = List {
length = 1 + length xs,
concat = (\ys -> cons x (concat xs ys))
}
It's not hard to see that the first version using if statements makes it easy to add new operations on our data type: just create a new function and do a case analysis inside it. On the other hand, this makes it hard to add new cases to our data type since that would mean going back through the program and modifying all the branching statements.
The second version is kind of the opposite. It's very easy to add new cases to the datatype: just create a new "class" and tell what to do for each of the methods we need to implement. However, it's now hard to add new operations to the interface since this means adding a new method for all the old classes that implemented the interface.
There are many different approaches that languages use to try to solve the Expression Problem and make it easy to add both new cases and new operations to a model. However, there are pros and cons to these solutions3 so in general I think it's a good rule of thumb to choose between OO and if statements depending on what axis you want to make it easier to extend stuff.
Anyway, going back to your question there are couple of things I would like to point out:
The first one is that I think the OO "mantra" of getting rid of all if statements and replacing them with method dispatching has more to do with how most OO languages don't have typesafe Algebraic Data Types than it has to do with "if statemsnts" being bad for encapsulation. Since the only way to be type safe is to use method calls you are encouraged to convert programs using if statements into programs using the Visitor Pattern4 or worse: convert programs that should be using the visitor pattern into programs using simple method dispatch, therefore making extensibility easy in the wrong direction.
The second thing is that I'm not a big fan of breaking things into functions just because you can. In particular, I find that style where all the functions have just 5 lines and call tons of other functions is pretty hard to read.
Finally, I think your example doesn't really get rid of if statements. Essentially, what you are doing is having a function from Integers to a new datatype (with two cases, one for Big and one for Small) and then you still need to use if statements when working with the datatype:
data Size = Big | Small
toSize :: Int -> Size
toSize n = if n < 10 then Small else Big
someOp :: Size -> String
someOp Small = "Wow, its small"
someOp Big = "Wow, its big"
Going back to the expression problem point of view, the advantage of defining our toSize / isSmall function is that we put the logic of choosing what case our number fits in a single place and that our functions can only operate on the case after that. However, this does not mean that we have removed if statements from our code! If we have the toSize being a factory function and we have Big and Small be classes sharing an interface then yes, we will have removed if statements from our code. However, if our isSmall just returns a boolean or enum then there will be just as many if statements as there were before. (and you should choose what implementation to use depending if you want to make it easier to add new methods or new cases - say Medium - in the future)
1 - The name of the problem comes from the problem where you have an "expression" datatype (numbers, variables, addition/multiplication of subexpressions, etc) and want to implement things like evaluation functions and other things.
2 - Or pattern matching over Algebraic Data Types, if you want to be more type safe...
3 - For example, you might have to define all multimethods on the "top level" where the "dispatcher" can see them. This is a limitation compared to the general case since you can use if statements (and lambdas) nested deeply inside other code.
4 - Essentially a "church encoding" of an algebraic data type
I've never heard of such a convection. I don't see how it works, anyway. Surely the only point of having a iIsSmall is to later branch on it (possibly in combination with other values)?
What I have heard of is an argument to avoid having variables like iIsSmall at all. iIsSmall is just storing the result of a test you made, so that you can later use that result to make some decision. So why not just test the value of i at the point where you need to make the decision? i.e., instead of:
int i = 5;
bool iIsSmall = isSmall(i);
...
<code>
...
if (iIsSmall) {
<do something because i is small>
} else {
<do something different because i is not small>
}
just write:
int i = 5
...
<code>
...
if (isSmall(i)) {
<do something because i is small>
} else {
<do something different because i is not small>
}
That way you can tell at the branch point what you're actually branching on because it's right there. That's not hard in this example anyway, but if the test was complicated you're probably not going to be able to encode the whole thing in the variable name.
It's also safer. There's no danger that the name iIsSmall is misleading because you changed the code so that it was testing something else, or because i was actually altered after you called isSmall so that it is not necessarily small anymore, or because someone just picked a dumb variable name, etc, etc.
Obviously this doesn't always work. If the isSmall test is expensive and you need to branch on its result many times, you don't want to execute it many times. You also might not want to duplicate the code of that call many times, unless it's trivial. Or you might want to return the flag to be used by a caller who doesn't know about i (though then you could just return isSmall(i), rather than store it in a variable and then return the variable).
Btw, the separate function saves nothing in your example. You can include (i < 10) in an assignment to a bool variable just as easily as in a return statement in a bool function. i.e. you could just as easily write bool isSmall = i < 10; - it's this that avoids the if statement, not the separate function. Code of the form if (test) { x = true; } else { x = false; } or if (test) { return true; } else { return false; } is always silly; just use x = test or return test.
Is it really a convention? Should one just kill minimal if-constructs just because there could be frustration over it?
OK, if statements tend to grow out of control, especially if many special cases are added over time. Branch after branch is added and at the end no one is able to comprehend what everything does without spending hours of time and some cups of coffee into this grown instance of spaghetti-code.
But is it really a good idea to put everything in seperate functions? Code should be reusable. Code should be readable. But a function call just creates the need to look it up further up in the source file. If all ifs are put away in this way, you just skip around in the source file all the time. Does this support readability?
Or consider an if-statement which is not reused anywhere. Should it really go into a separate function, just for the sake of convention? there is some overhead involved here, too. Performance issues could be relevant in this context, too.
What I am trying to say: following coding conventions is good. Style is important. But there are exceptions. Just try to write good code that fits into your project and keep the future in mind. In the end, coding conventions are just guidelines which try to help us to produce good code without enforcing anything on us.