I would like to try out test-driven development, but the project I am working on involves a lot of randomness and I am very unsure about how I can test it. Here is a toy example of the kind of algorithm I may want to write:
Write a function taking no argument and returning a list of random integers satisfying the following properties
Each integer is between 0 and 10
The same number doesn’t appear twice
The list is of length 3 90% of the time, and of length 4 10% of the time
There is a 50% chance for the number 3 to appear
I do not need to test exact statistical distribution, but obviously I would like tests that will fail if someone completely removes the corresponding code.
I am using an external RNG that you can assume is correct, and I am pretty free in how to structure the code, so I can use dependency injection to have tests use a fake RNG instead, but I still don’t really see how that would help. For instance even if I always use the same seed for the tests, as soon as I refactor the algorithm to pick random numbers in a different order, all the tests become meaningless.
I guess that the first two points could be tested by generating many cases and checking that the constraints are satisfied, but that doesn’t really feel like TDD.
For the last two points, I’m thinking of having tests with different configurations, where for instance the 90% is either 100% or 0%, and then I can test if the length of the list is indeed 3 or 4. I guess it would work, but it seems maybe a bit weak.
Are there any guidelines or other techniques to use when using TDD to test algorithms involving randomness?
There are several ways you can go about a problem like this, and I may add another answer in the future, but the approach that I immediately found most compelling would be to combine test-driven development (TDD) with property-based testing.
You can do this in many languages, with various frameworks. Here, I'm going to use the original property-based testing library, QuickCheck.
The first two requirements translate directly to predicates that QuickCheck can exercise. The latter two translate into distribution tests - a more advanced feature of QuickCheck that John Hughes explains in this presentation.
Each one in turn.
Before writing the first test, you're going to set up tests and import the appropriate libraries:
module RintsProperties where
import Test.Framework (Test)
import Test.Framework.Providers.QuickCheck2
import Test.QuickCheck
import Q72168364
where the System Under Test (SUT) is defined in the Q72168364 library. The SUT itself is an action called rints (for Random INTS):
rints :: IO [Int]
Since it's going to generate random numbers, it'll have to run in IO.
The first requirement says something about the image of the SUT. This is easily expressed as a property:
testProperty "Each integer is between 0 and 10" $ \() -> ioProperty $ do
  actual <- rints
  return $
    counterexample ("actual: " ++ show actual) $
    all (\i -> 0 <= i && i <= 10) actual
If you ignore some of the ceremony involved with producing a useful assertion message and such, the central assertion is this:
all (\i -> 0 <= i && i <= 10) actual
which verifies that all integers i in actual are between 0 and 10.
In true TDD fashion, the simplest implementation that passes the test is this:
rints :: IO [Int]
rints = return []
Always return an empty list. While degenerate, it fulfils the requirement.
No duplicates
The next requirement also translates easily to a predicate:
testProperty "The same number does not appear twice" $ \() -> ioProperty $ do
  actual <- rints
  return $ nub actual === actual
nub removes duplicates, so this assertion states that nub actual (actual where duplicates are removed) should be equal to actual. This is only going to be the case if there are no duplicates in actual.
In TDD fashion, the implementation unfortunately doesn't change:
rints :: IO [Int]
rints = return []
In fact, when I wrote this property, it passed right away. If you follow the red-green-refactor checklist, this isn't allowed. You should start each cycle by writing a red test, but this one was immediately green.
The proper reaction should be to discard (or stash) that test and instead write another one - perhaps taking a cue from the Transformation Priority Premise to pick a good next test.
For instructional reasons, however, I will stick with the order of requirements as they are stated in the OP. Instead of following the red-green-refactor checklist, I modified rints in various ways to assure myself that the assertion works as intended.
Length distribution
The next requirement isn't a simple predicate, but rather a statement about the distribution of outcomes. QuickCheck's cover function enables that - a feature that I haven't seen in other property-based testing libraries:
testProperty "Length is and distribution is correct" $ \() -> ioProperty $ do
  actual <- rints
  let l = length actual
  return $
    checkCoverage $
    cover 90 (l == 3) "Length 3" $
    cover 10 (l == 4) "Length 4"
    True -- Base property, but really, the distribution is the test
The way that cover works, it needs to have a 'base property', but here I simply return True - the base property always passes, meaning that the distribution is the actual test.
The two instances of cover state the percentage with which each predicate (l == 3 and l == 4) should appear.
Running the tests with the degenerate implementation produces this test failure:
Length is and distribution is correct: [Failed]
*** Failed! Insufficient coverage (after 100 tests):
Only 0% Length 3, but expected 90%
As the message states, it expected 90% of Length 3 cases, but got 0%.
Again, following TDD, one can attempt to address the immediate error:
rints :: IO [Int]
rints = return [1,2,3]
This, however, now produces this test failure:
Length is and distribution is correct: [Failed]
*** Failed! Insufficient coverage (after 400 tests):
100.0% Length 3
Only 0.0% Length 4, but expected 10.0%
The property expects 10% Length 4 cases, but got 0%.
Perhaps the following is the simplest thing that could possibly work?
import System.Random.Stateful
rints :: IO [Int]
rints = do
  p <- uniformRM (1 :: Int, 100) globalStdGen
  if 10 < p then return [1,2,3] else return [1,2,3,4]
Perhaps not quite as random as you'd expect, but it passes all tests.
More threes
The final (explicit) requirement is that 3 should appear 50% of the time. That's another distribution property:
testProperty "3 appears 50% of the times" $ \() -> ioProperty $ do
  actual <- rints
  return $
    checkCoverage $
    cover 50 (3 `elem` actual) "3 present" $
    cover 50 (3 `notElem` actual) "3 absent"
    True -- Base property, but really, the distribution is the test
Running all tests produces this test failure:
3 appears 50% of the times: [Failed]
*** Failed! Insufficient coverage (after 100 tests):
100% 3 present
Only 0% 3 absent, but expected 50%
Not surprisingly, it says that the 3 present case happens 100% of the time.
In the spirit of TDD (perhaps a little undisciplined, but it illustrates what's going on), you may attempt to modify rints like this:
rints :: IO [Int]
rints = do
  p <- uniformRM (1 :: Int, 100) globalStdGen
  if 10 < p then return [1,2,3] else return [1,2,4,5]
This, however, doesn't work because the distribution is still wrong:
3 appears 50% of the times: [Failed]
*** Failed! Insufficient coverage (after 100 tests):
89% 3 present
11% 3 absent
Only 11% 3 absent, but expected 50%
Perhaps the following is the simplest thing that works. It's what I went with, at least:
rints :: IO [Int]
rints = do
  p <- uniformRM (1 :: Int, 100) globalStdGen
  includeThree <- uniformM globalStdGen
  if 10 < p
    then if includeThree then return [1,2,3] else return [1,2,4]
    else if includeThree then return [1,2,3,4] else return [1,2,4,5]
Not elegant, and it still doesn't produce random numbers, but it passes all tests.
Random numbers
While the above covers all explicitly stated requirements, it's clearly unsatisfactory, as it doesn't really produce random numbers between 0 and 10.
This is typical of the TDD process. As you write tests and SUT and let the two interact, you discover that more tests are required than you originally thought.
To be honest, I wasn't sure what the best approach would be to 'force' generation of all numbers between 0 and 10. Now that I had the hammer of distribution tests, I wrote the following:
testProperty "All numbers are represented" $ \() -> ioProperty $ do
  actual <- rints
  return $
    checkCoverage $
    cover 5 ( 0 `elem` actual) " 0 present" $
    cover 5 ( 1 `elem` actual) " 1 present" $
    cover 5 ( 2 `elem` actual) " 2 present" $
    cover 5 ( 3 `elem` actual) " 3 present" $
    cover 5 ( 4 `elem` actual) " 4 present" $
    cover 5 ( 5 `elem` actual) " 5 present" $
    cover 5 ( 6 `elem` actual) " 6 present" $
    cover 5 ( 7 `elem` actual) " 7 present" $
    cover 5 ( 8 `elem` actual) " 8 present" $
    cover 5 ( 9 `elem` actual) " 9 present" $
    cover 5 (10 `elem` actual) "10 present"
    True -- Base property, but really, the distribution is the test
I admit that I'm not entirely happy with this, as it doesn't seem to 'scale' to problems where the function image is much larger. I'm open to better alternatives.
I also didn't want to be too specific about the exact distribution of each number. After all, 3 is going to appear more frequently than the others. For that reason, I just picked a small percentage (5%) to indicate that each number should appear not too rarely.
The implementation of rints so far failed this new test in the same manner as the other distribution tests.
Crudely, I changed the implementation to this:
rints :: IO [Int]
rints = do
  p <- uniformRM (1 :: Int, 100) globalStdGen
  let l = if 10 < p then 3 else 4
  ns <- shuffle $ [0..2] ++ [4..10]
  includeThree <- uniformM globalStdGen
  if includeThree
    then do
      let ns' = take (l - 1) ns
      shuffle $ 3 : ns'
    else
      return $ take l ns
While I feel that there's room for improvement, it passes all tests and actually produces random numbers:
ghci> rints
ghci> rints
ghci> rints
ghci> rints
ghci> rints
This example used QuickCheck with Haskell, but most of the ideas translate to other languages. QuickCheck's cover function may be an exception to that rule, since I'm not aware that it's been ported to common language implementations, but perhaps I'm just behind the curve.
In situations where something like cover isn't available, you'd have to write a test that loops through enough randomly generated test cases to verify that the distribution is as required. A little more work, but not impossible.
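As a rough illustration of such a loop-based test, here is a minimal Python sketch. The `rints` below is a hypothetical Python port written only for this example (it is not the Haskell code above), and the sample size and tolerances are assumptions you'd tune yourself.

```python
import random
from collections import Counter


def rints(rng: random.Random) -> list[int]:
    """Hypothetical Python port of the final rints; an assumption for this sketch."""
    length = 3 if rng.random() < 0.9 else 4
    others = [n for n in range(11) if n != 3]
    if rng.random() < 0.5:
        picked = [3] + rng.sample(others, length - 1)
        rng.shuffle(picked)
        return picked
    return rng.sample(others, length)


def observed_frequencies(runs: int, seed: int = 0) -> dict[str, float]:
    """Run the generator `runs` times and report how often each label holds."""
    rng = random.Random(seed)  # fixed seed keeps the test repeatable
    counts = Counter()
    for _ in range(runs):
        xs = rints(rng)
        # The plain predicates are checked on every sample.
        assert all(0 <= x <= 10 for x in xs)
        assert len(set(xs)) == len(xs)
        counts[f"length {len(xs)}"] += 1
        counts["3 present"] += 3 in xs
    return {label: n / runs for label, n in counts.items()}


def test_distribution() -> None:
    freq = observed_frequencies(20_000)
    # Tolerances are a judgment call: wide enough to avoid flaky failures,
    # narrow enough to fail if the corresponding code is removed.
    assert abs(freq["length 3"] - 0.9) < 0.02
    assert abs(freq["length 4"] - 0.1) < 0.02
    assert abs(freq["3 present"] - 0.5) < 0.02
```

Unlike cover with checkCoverage, this doesn't adapt the number of samples to the evidence; it just picks a fixed count and a fixed tolerance.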
Since Nikos Baxevanis asked, here's the shuffle implementation:
import Control.Monad (forM)
import Data.Array.IO (IOArray, newListArray, readArray, writeArray)

shuffle :: [a] -> IO [a]
shuffle xs = do
    ar <- newArray l xs
    forM [1..l] $ \i -> do
      j <- uniformRM (i, l) globalStdGen
      vi <- readArray ar i
      vj <- readArray ar j
      writeArray ar j vi
      return vj
  where
    l = length xs
    newArray :: Int -> [a] -> IO (IOArray Int a)
    newArray n = newListArray (1, n)
I lifted it from https://wiki.haskell.org/Random_shuffle and perhaps edited a bit.
I would like to try out test-driven development, but the project I am working on involves a lot of randomness
You should be aware that "randomness" hits TDD in a rather awkward spot, so it isn't the most straightforward "try-it-out" project.
There are two concerns - one that "randomness" is a very expensive assertion to make:
How would you reliably distinguish between an implementation that always returns 4 and a "real" random number generator that just happens to be emitting a finite sequence of 4s before changing to some other number?
So we get to choose between stable tests that don't actually express all of our constraints or more precise tests that occasionally report incorrect results.
One design approach here is to lean into "testability" - behind the facade of our interface will be an implementation that combines a general purpose source of random bits with a deterministic function that maps a bit sequence to some result.
def randomListOfIntegers():
    seed_bits = generalPurpose.random_bits()
    return deterministicFunction(seed_bits)
def deterministicFunction(seed_bits):
The claim being that randomListOfIntegers is "so simple that there are obviously no deficiencies", so we can establish its correctness by inspection, and concentrate our effort on the design of deterministicFunction.
Now, we run into a second problem: the mapping of seed_bits to some observable result is arbitrary. Most business domain problems (ex: a payroll system) have a single expected output for any given input, but in random systems you still have some extra degrees of freedom. If you write a function that produces an acceptable answer given any sequence of bits, then my function, which reverses the bits then calls your function, will also produce acceptable answers -- even though my answers and your answers are different.
In effect, if we want a suite of tests that alert when a code change causes a variation in behavior, then we have to invent the specification of the behavior that we want to lock in.
And unless we have a good guess as to which arbitrary behaviors will support a clean implementation, that can be pretty painful.
(Alternatively, we just lean on our pool of "acceptance" tests, which ignore code changes that switch to a different arbitrary behavior -- it's trade offs all the way down).
One of the simpler implementations we might consider is to treat the seed_bits as an index into a sequence of candidate responses
def deterministicFunction(seed_bits):
    choices = ???
    n = computeIndex(seed_bits, len(choices))
    return choices[n]
This exposes yet another concern: k seed_bits means 2^k degrees of freedom; unless len(choices) happens to be a power of 2 and not bigger than 2^k, there's going to be some bias in choosing. You can make the bias error arbitrarily small by choosing a large enough value for k, but you can't eliminate it with a finite number of bits.
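To make the bias concrete, here is a small Python sketch that enumerates every possible seed. It assumes computeIndex is a plain modulo reduction, which is only one possible choice; the helper names are made up for the example.

```python
from collections import Counter


def compute_index(seed_bits: int, size: int) -> int:
    """One simple (assumed) choice of computeIndex: reduce modulo size."""
    return seed_bits % size


def index_histogram(k: int, size: int) -> Counter:
    """Count how many of the 2**k possible seeds land on each index."""
    return Counter(compute_index(bits, size) for bits in range(2 ** k))


# 3 bits give 8 seeds; with 3 choices they cannot be split evenly:
# indexes 0 and 1 are hit 3 times each, index 2 only twice.
small = index_histogram(3, 3)

# 10 bits: the imbalance is still a single seed, but now out of 1024.
large = index_histogram(10, 3)
imbalance = max(large.values()) - min(large.values())  # 1
```

When the number of choices is a power of 2 that fits in k bits (say 4 choices with 4 bits), every index is hit equally often and the bias disappears.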
Breaking down the problem further, we can split the work into two elements, one responsible for producing the pool of candidates, another for actually choosing one of them.
def deterministicFunction(seed_bits):
    return choose(seed_bits, weighted_candidates())

def choose(seed_bits, weighted_candidates):
    choices = []
    # note: the order of elements in this enumeration
    # is still arbitrary
    for candidate, weight in weighted_candidates:
        for _ in range(weight):
            choices.append(candidate)
    # technically, this is also still arbitrary
    n = computeIndex(seed_bits, len(choices))
    return choices[n]
At this point, we can decide to use "simplest thing that could possibly work" to implement computeIndex (test first, if you like), and this new weighted_candidates() function is also easy to test, since each test of it is just "count the candidates and make sure that the problem constraints are satisfied by the population as a whole". choose can be tested using much simpler populations as inputs.
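For instance, a test of choose against a tiny population might look like the following sketch. The modulo-based compute_index is an assumption standing in for whatever "simplest thing" you settle on, and the population is invented for the example.

```python
def compute_index(seed_bits: int, size: int) -> int:
    """Assumed stand-in for computeIndex: reduce modulo size."""
    return seed_bits % size


def choose(seed_bits, weighted_candidates):
    """Expand each candidate `weight` times, then index into the result."""
    choices = []
    for candidate, weight in weighted_candidates:
        choices.extend([candidate] * weight)
    return choices[compute_index(seed_bits, len(choices))]


def test_choose_respects_weights():
    population = [("a", 1), ("b", 3)]
    # Sweeping every index exactly once must reproduce the weights.
    picks = [choose(bits, population) for bits in range(4)]
    assert picks.count("a") == 1
    assert picks.count("b") == 3
```

The test doesn't care which seed maps to which candidate, so it survives a change to the (arbitrary) enumeration order.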
This kind of an implementation could be unsatisfactory - after all, we're building this data structure of candidates, and then another of choices, only to pick a single one. That may be the best we can do. Often, however, a different implementation is possible.
The problem specification, in effect, defines for us the size of the (weighted) population of responses. In other words, len(choices) is really some constant L.
choices = [ generate(n) for n in range(L)]
n = computeIndex(seed_bits, L)
return choices[n]
which in turn can be simplified to
n = computeIndex(seed_bits, L)
return generate(n)
Which is to say, we don't need to pass around a bunch of data structures if we can calculate which response is in the nth place.
Notice that while generate(n) still has arbitrary behavior, there are definitive assertions we can make about the data structure [generate(n) for n in range(L)].
Refactoring a bit to clean things up, we might have
def randomListOfIntegers():
    seed_bits = generalPurpose.random_bits()
    n = computeIndex(seed_bits, L)
    return generateListOfIntegers(n)
Note that this skeleton hasn't "emerged" from writing out a bunch of tests and refactoring, but instead from thinking about the problem and the choices that we need to consider in order to "control the gap between decision and feedback".
It's probably fair to call this a "spike" - a sandbox exercise that we use to better understand the problem we are trying to solve.
The same number doesn’t appear twice
An awareness of combinatorics is going to help here.
Basic idea: we can compute the set of all possible arrangements of 4 unique elements of the set [0,1,2,3,4,5,6,7,8,9,10], and we can use a technique called squashed ordering to produce a specific subset of them.
Here, we'd probably want to handle the special case of 3 a bit more carefully. The rough skeleton is going to look something like
def generateListOfIntegers(n):
    other_numbers = [0,1,2,4,5,6,7,8,9,10]
    has3, hasLength3, k = decode(n)
    if has3:
        if hasLength3:
            # 45 distinct candidates
            assert 0 <= k < 45
            return [3] + choose(other_numbers, 2, k)
        # 120 distinct candidates
        assert 0 <= k < 120
        return [3] + choose(other_numbers, 3, k)
    if hasLength3:
        # 120 distinct candidates
        assert 0 <= k < 120
        return choose(other_numbers, 3, k)
    # 210 distinct candidates
    assert 0 <= k < 210
    return choose(other_numbers, 4, k)
Where choose(other_numbers, j, k) returns the kth subset of other_numbers with j total elements, and decode(n) has the logic necessary to ensure that the population weights come out right.
The behavior of choose is arbitrary, but there is a "natural" order to the progression of subsets (ie, you can "sort" them), so it's reasonable to arbitrarily use the sorted order.
It's probably also worth noticing that choose is very general purpose - the list we pass in could be just about anything, and it really doesn't care what you do with the answer. Compare that with decode, where our definition of the "right" behavior is tightly coupled to its consumption by generateListOfIntegers.
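One way the two helpers could be filled in is sketched below. Everything here is an assumption made for illustration: the name kth_subset (standing in for the subset-picking choose), the use of sorted order, and in particular the weights, which assume that "length 3" and "3 present" are independent so the four classes carry 45%, 5%, 45% and 5% of the population.

```python
from itertools import combinations, islice
from math import comb

OTHER_NUMBERS = [0, 1, 2, 4, 5, 6, 7, 8, 9, 10]


def kth_subset(items, j, k):
    """Return the k-th j-element subset of items, in sorted order."""
    assert 0 <= k < comb(len(items), j)
    return list(next(islice(combinations(items, j), k, None)))


# (has3, hasLength3, number of candidates, repeats per candidate)
# Repeats are picked so each class gets its assumed share of the total.
CLASSES = [
    (True, True, 45, 168),    # 7560 slots, 45%
    (True, False, 120, 7),    #  840 slots,  5%
    (False, True, 120, 63),   # 7560 slots, 45%
    (False, False, 210, 4),   #  840 slots,  5%
]
L = sum(count * repeats for _, _, count, repeats in CLASSES)  # 16800


def decode(n):
    """Map n in range(L) to (has3, hasLength3, k)."""
    assert 0 <= n < L
    for has3, has_length3, count, repeats in CLASSES:
        block = count * repeats
        if n < block:
            return has3, has_length3, n // repeats
        n -= block
    raise AssertionError("unreachable: n was checked against L")
```

Because decode is a pure function over a finite range, a test can enumerate all L inputs and count exactly how many have length 3 or contain a 3, with no tolerance needed.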
You may want to review Peter Seiber's Fischer Chess Exercise, to see the different approaches people were taking when TDD was new. Warning, the threading is horribly broken now, so you may need to sift through multiple threads to find all the good bits.
First of all, there is more than one approach to TDD, so there's no single right answer. But here's my take on this:
You mentioned that you don't need to test exact statistical distribution, but I think that you must. Otherwise, writing the simplest code that satisfies your tests will result in a completely deterministic, non-random solution. (If you look at your requirements without thinking about randomness, you'll find that you can satisfy them using a simple loop and a few if statements.) But apparently, that's not what you really want. Therefore in your case, you must write a test that checks the statistical distribution of your algorithm.
Such tests need to gather many results of your function under test and therefore may be slow, so some people will consider it a bad practice, but IMHO this is your only way to actually test what you really care about.
Assuming that this is not only a theoretical exercise, you may also want to save the results to a file that you can later examine manually (e.g. using Excel), check additional statistical properties of the results, and potentially add or adjust your tests accordingly.
The following video contains a mathematical card trick due to Colm Mulcahy:
The key operation in the trick is defined as follows:
COAT (Count Out And Transfer)
Given a packet of n cards, COATing k cards refers to counting out that many from the top into a pile, thus reversing their order, and transferring those as a unit to the bottom.
(Definition taken from http://graphics8.nytimes.com/packages/pdf/crossword/Mulcahy_Mathematical_Card_Magic-Sample2.pdf)
In Haskell:
coat k cards = (drop k cards) ++ (reverse . take k $ cards)
Main> take 5 $ iterate (coat 3) [1..5]
A characteristic property of the COAT operation is that after 4 iterations, the list returns to its original order, if k >= n/2.
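As an informal sanity check (not a proof), the property can be brute-forced for small decks. The Python sketch below mirrors the Haskell coat; the function names and the range of deck sizes are arbitrary choices for the example, and it only checks the direction "k >= n/2 implies the deck is restored".

```python
def coat(k, cards):
    """Count out k cards (reversing them) and transfer them to the bottom."""
    return cards[k:] + cards[:k][::-1]


def four_coats_restore(n, k):
    """True if COATing k cards four times restores a deck of n cards."""
    deck = list(range(1, n + 1))
    result = deck
    for _ in range(4):
        result = coat(k, result)
    return result == deck


# Exhaustive check for small decks and every k with n/2 <= k <= n.
for n in range(1, 13):
    for k in range(n + 1):
        if 2 * k >= n:
            assert four_coats_restore(n, k), (n, k)
```

Note that the converse doesn't hold in general: with n = 4 and k = 1 each COAT is just a rotation by one card, so four of them restore the deck as well.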
Is it practical to prove this property for the Haskell code? Would a proof require the use of dependent types to express the constraint on k? (Maybe Idris would be a better language?)
(I'm not sure how to deal with iteration in a proof. I guess in this case the four iterations could just be unrolled.)
An SMT solver can handle such a proof for any concrete deck-size. Proving for an arbitrary deck-size n would require induction, which is beyond the capability of automated provers, at least for the time being. (Also, there aren't any out-of-the-box tools that can take an arbitrary Haskell program and do such proofs for you. Liquid Haskell goes far, but I'm not sure if it would be suitable for this purpose.)
Haskell's SBV library allows you to express such problems conveniently, shipping the proof obligations to z3 (or other SMT solvers) for a push-button experience. Here's how one can code your problem using SBV.
First some preliminaries:
{-# LANGUAGE ScopedTypeVariables #-}
import Data.SBV
import Data.SBV.List
import Prelude hiding (length, take, drop, reverse, (++))
Note the hiding of common functions length/take etc. from Prelude. We'll instead use their symbolic equivalents as provided by Data.SBV.List. (You're going to have to trust SBV that they are faithful implementations of the same, extending them to the domain of symbolic inputs that you can do proofs with.)
As you noted, here's how coat can be defined:
coat :: SymVal a => SInteger -> SList a -> SList a
coat k cards = drop k cards ++ reverse (take k cards)
Aside from the "funky" signature, the definition is the same as you gave, i.e., stereotypical Haskell. The signature "generalizes" from concrete integer/lists to their symbolic counterparts, so we can do a proof on them.
Coating four times can be expressed using regular composition:
fourCoat :: SymVal a => SInteger -> SList a -> SList a
fourCoat k = coat k . coat k . coat k . coat k
Let's represent the deck as a list of integers:
type Deck = SList Integer
Think of SList Integer as [Integer], just one whose contents can be symbolic values. Note that a regular deck will have distinct cards, we do not impose this in our proof. That is, the proof works regardless whether we put duplicates into the deck, which generalizes the theorem you're trying to establish.
And here's how you can pose your theorem as an SBV query:
coatCheck :: Integer -> IO ThmResult
coatCheck n = prove $ do
    deck :: Deck <- free "deck"
    k <- free "k"
    constrain $ length deck .== literal n
    constrain $ 2*k .>= literal n
    pure $ deck .== fourCoat k deck
This part of the code is a bit involved. But we're essentially saying for any arbitrary deck (free "deck"), and any k, if you fourCoat the deck then you get it back intact (very last line). The first constraint limits the size of the deck to the given literal value, i.e., we'll be doing the proof for a deck that has exactly n cards in it. The second constraint is exactly what your theorem stipulated, written slightly rearranged to avoid division by 2. Note that k is not fixed: The proof will be valid for any k, subject to the given constraints.
How does this proof fare? Well, it depends on how large n is! Here's a demo with n = 6:
*Main> coatCheck 6
This takes about 3 seconds to run on my machine. As you increase n, the solver time will increase; but if you wait long enough, you'll get a proof. (Unless you run out of memory, depending on your hardware!)
Notice that it's also possible to make the parameter n symbolic as well. While the SMT solver is capable of expressing this, in my experiments I couldn't get any proof whatsoever. (It simply didn't return in a reasonable time.) And I wouldn't expect it to either: As alluded above, such a proof would require induction, and SMT solvers can't handle such problems quite yet.
So, the answer to your question is a qualified yes: If you're willing to squint the right way and do the proof for fixed-sized decks, then yes; there's an automated way to conduct such proofs in Haskell by using an SMT solver as the underlying engine. If you want a proof for an arbitrary n, then you'll have to use a proper theorem prover like Isabelle, Coq, Lean, etc., and of course you will have to program in their native language, and not Haskell. (Though they accept more or less the same family of functional programs, and this particular problem wouldn't be hard to code in any of these tools.)
You can find the entire code for this problem as an example in the SBV distribution.
I'm using Haskell to find a list of integers from 1 to 10000 that have a special property. I do the following
[ number | number <- [1..10000], (isSpecial number)]
However, every now and then I came up with some special properties that are
hard to be satisfied
take a long time to be verified
As a result, it hangs there after some first few examples.
I wonder how I can make the list comprehension in Haskell verbose, so I have a good update about how much Haskell has progressed.
This is more or less what Robin Zigmond meant:
import Control.Monad (filterM)

checkNumbers :: IO [Int]
checkNumbers = filterM check [1..10000]
  where
    check number = do
      putStrLn $ "Checking number " <> show number
      pure $ isSpecial number
This will print "Checking number x" before checking every number. Feel free to experiment with any other effects (or, in your words, "verbosity") within the check function.
Here is a way that requires no IO, instead relying on laziness and your programmer guess about which "side" of the condition happens more often. Just to have something to play with, here's a slightly slow function that checks if a number is a multiple of 10. The details of this function aren't important, feel free to skip it if anything doesn't make sense. I'm also going to turn on timing reporting; you'll see why later.
> isSpecial :: Int -> Bool; isSpecial n = last [1..10000000] `seq` (n `mod` 10 == 0)
> :set +s
(Add one 0 every five years.)
Now the idea will be this: instead of your list comprehension, we'll use partition to split the list into two chunks, the elements that match the predicate and the ones that don't. We'll print the one of those that has more elements, so we can keep an eye on progress; by the time it's fully printed, the other one will be fully evaluated and we can inspect it however we like.
> :m + Data.List
> (matches, nonMatches) = partition isSpecial [1..20]
(0.00 secs, 0 bytes)
> nonMatches
(12.40 secs, 14,400,099,848 bytes)
Obviously I can't portray this over StackOverflow, but when I did the above thing, the numbers in the nonMatches list slowly appeared on my terminal one-by-one, giving a pretty good indicator of where in the list it was currently thinking. And now, when you print matches, the full list is available more or less instantly, as you can see by the timing report (i.e. not another 12-second wait):
> matches
(0.01 secs, 64,112 bytes)
But beware!
It's important that matches and nonMatches have types which are not typeclass polymorphic (i.e. don't have types that start with Num a => ... or some other constraint). In the above example, I achieved this by making isSpecial monomorphic, which forces matches and nonMatches to be, too, but if your isSpecial is polymorphic, you should give a type signature for matches or nonMatches to prevent this problem.
Doing it this way will cause the entire nonMatches and matches lists to be realized in memory. This could be expensive if the original list being partitioned is very long. (But up to, say, a couple hundred thousand Ints is not particularly long for modern computers.)
You can have a look at Debug.Trace. It allows printing messages to the console. But as Haskell is lazy, controlling when printing happens is not so easy. And this is also not recommended for production:
Prelude Debug.Trace> import Debug.Trace
Prelude Debug.Trace> [x | x <- [1..10], traceShow (x, odd x) $ odd x]
We would usually want to see both the tried and the discovered numbers as the calculation goes on.
What I usually do is break up the input list into chunks of n elements, filter each chunk as you would the whole list, and convert each chunk into a pair of its head element and the filtered chunk:
chunked_result = [ (h, [n | n <- chunk, isSpecial n])
                 | chunk@(h:_) <- chunksOf n input]
Putting such result list through concatMap snd gives the original non-"verbose" option.
Adjusting the n value will influence the frequency with which the progress will be "reported" when the result list is simply printed, showing both the tried and the discovered numbers, with some inconsequential "noise" around them.
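The same chunking idea, sketched in Python with a generator playing the role of the lazy list. The function name and the toy predicate are assumptions for the example; each pair is produced (and can be printed) as soon as its chunk has been checked.

```python
from typing import Callable, Iterable, Iterator


def chunked_results(
    numbers: Iterable[int], is_special: Callable[[int], bool], n: int
) -> Iterator[tuple[int, list[int]]]:
    """Yield (first number of the chunk, matches in the chunk), chunk by chunk."""
    chunk: list[int] = []
    for number in numbers:
        chunk.append(number)
        if len(chunk) == n:
            yield chunk[0], [x for x in chunk if is_special(x)]
            chunk = []
    if chunk:  # final, possibly shorter chunk
        yield chunk[0], [x for x in chunk if is_special(x)]


# Progress appears one chunk at a time: the head shows how far we got,
# the list shows what was discovered in that chunk.
for head, found in chunked_results(range(1, 21), lambda x: x % 10 == 0, 5):
    print(head, found)
```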
Using second concat . unzip on the chunks results list is somewhat similar to Daniel Wagner's partitioning idea (with caveats),(*) but with your set value of n, not just 1.
If there is an algorithmic slowdown innate to your specific problem, apply the orders of growth run time estimation analysis.
(*) to make it compatible we need to stick some seq in the middle somewhere, like
chunked_result = [ (last s `seq` last chunk, s)
                 | chunk <- chunksOf n input
                 , let s = [n | n <- chunk, isSpecial n] ]
or something.
Say I have a unique list of length 9 of the values between 1 and 9 inclusive in a random order (think sudoku), and I want to extract the sub-list of the items that occur between the values 1 and 9 (exclusive). I.e., between1and9([1,3,5,4,2,9,7,8,6],[3,5,4,2]) should be true.
At the moment I'm trying to use flatten/2, but not having much luck. Here's my current tactic (assuming I enforce List ins 1..9, maplist(all_distinct, List), length(List, 9) elsewhere to keep it tidy here/separation of concerns):
between1and9(List,Between) :-
    flatten([_,[1],Between,[9],_], List);
    flatten([_,[9],Between,[1],_], List).
This version fails though when 1 or 9 are at the first or last position in List, or if they're adjacent within List. between1and9([_,1,9,_,_,_,_,_,_],[]) is true, but between1and9([_,1,9,_,_,_,_,_,_],_) is false (and fails when I try to use it as a constraint to solve a bigger problem.)
It seems to be the same problem causing both failures: flatten doesn't seem to like treating unknowns as empty lists unless they're made explicit somewhere.
I can see why that would potentially be: if flatten could "invent" empty lists in the first argument, it would mean an infinite set of solutions for anything in the first argument. Although my full program has other constraints to prevent this, I can understand why flatten might not want to accommodate it.
I can account for the edge cases (pun intended) by matching every permutation with disjunctions (ie: flatten([_,1,B,9,_],L);flatten([_,9,B,1,_],L);flatten([_,1,B,9]);flatten..., And account for the Between as an empty list with: \*above permutations on flatten*\; ( Between = [], (\*permutations for either edge and 1/9*\) )
But that seems to be making an already longwinded solution (10 permutations of flatten in total) even worse (18) so I have two (strongly related) questions:
If I could do the following:
between1and9(L,B) :-
( ( X = 1, Y = 9 ); ( X = 9, Y = 1 ) ),
( ( Z1 = _; Z1 = [] ), ( Z2 = _ ; Z2 = [] ) ),
( B = _; B = [] ),
I wouldn't have to manually type out each permutation of match for flatten. Unfortunately this and a few variations on it all fail. Am I missing something obvious here? (I suspect operator precedence but I've tried a few different versions.)
Or am I doing this completely wrong? The flatten/2 documentation suggests that in most cases it's an anti-pattern, is there a more prolog-ish* way to go about solving this problem? Given all the pitfalls I'm realising as I go through this I'm almost certain there is.
(Sorry, I'm painfully aware that a lot of the terminology I'm using to describe things in this is probably very wrong, I'm only kind of familiar with predicate/formal logic and much more used-to describing control flow type programming. Even though I understand logic programming in practice reasonably well I'm struggling to find the language to talk about it robustly yet, I will amend this question with any corrections I get.)
Some background: I'm new to prolog and testing out my understanding by trying to extend one of the many sudoku solvers to solve a strange variety of sudoku I found in some puzzles I printed out years ago where you're shown the sum of all the numbers that appear between the 1 and the 9 in any given row or column as an extra hint, it's kind of like a mix of sudoku and picross. The solver as it stands now is on swish: SumSudoku(swish). Although it may be a mess when you get to it.
*Corollary quesiton: is there a prolog version of the word "pythonic?"
You could use good old append/3 for this. Is it possible that you wanted append/3 all along but somehow thought it is called flatten?
For the "1 comes before 9" case, you'd write:
between_1_and_9(List, Sublist) :-
append(_, [1|Rest], List),
append(Sublist, [9|_], Rest).
You need to swap 1 and 9 for the "9 comes before 1" case.
This also leaves a "spurious choice point" (thank you, @PauloMoura, for the comment). Make sure to get rid of it somehow.
As for "Pythonic" (and this comes from a recovering Pythonista), I can only say, rest assured:
There is always more than one obvious way to do it in Prolog.
You don't even have to be Dutch.
I'd like to know what property testing is aiming for, what its sweet spot is, and where it should be used. Let's take an example function that I want to test:
f :: [Integer] -> [Integer]
This function, f, takes a list of numbers and will square the odd numbers and filter out the even numbers. I can state some properties about the function, like
Given a list of even numbers, it returns the empty list.
Given a list of odd numbers, the result list has the same length as the input.
Given a list of even numbers and a list of odd numbers, when I join them, shuffle the result and pass it to the function, the length of the result is the length of the list of odd numbers.
Given a list of positive odd numbers, each element of the result list is greater than or equal to the element at the same index in the original list (equal only for 1).
Given a list of odd numbers and a list of even numbers, joined and shuffled, every number in the result list is odd.
None of these properties tests that the function works for the simplest case. For example, the following incorrect implementation of f passes all of them:
f = fmap (+2) . filter odd
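To make that claim checkable, here is a minimal sketch that states the properties as plain Haskell predicates over the incorrect implementation. The names `wrongF`, `propEvensDropped`, `propLength`, `propGrows` and `propAllOdd` are invented for this illustration, and the shuffling is left out because the length and oddness checks hold for any ordering of the input.

```haskell
-- The deliberately wrong implementation: adds 2 instead of squaring.
wrongF :: [Integer] -> [Integer]
wrongF = fmap (+2) . filter odd

-- Property 1: a list of even numbers yields the empty list.
-- Doubling every element guarantees an all-even input.
propEvensDropped :: [Integer] -> Bool
propEvensDropped xs = null (wrongF (map (* 2) xs))

-- Properties 2 and 3: the result is as long as the number of odd inputs,
-- whatever the order of the input list.
propLength :: [Integer] -> Bool
propLength xs = length (wrongF xs) == length (filter odd xs)

-- Property 4: for positive odd inputs, each result is no smaller than the
-- input at the same index (>= rather than >, since 1 squared is 1).
propGrows :: [Integer] -> Bool
propGrows xs = and (zipWith (>=) (wrongF odds) odds)
  where
    odds = [2 * abs x + 1 | x <- xs] -- always positive and odd

-- Property 5: every number in the result is odd.
propAllOdd :: [Integer] -> Bool
propAllOdd xs = all odd (wrongF xs)
```

Each predicate can be tried in GHCi on a concrete list, for example `propLength [-8,-3,11,1]`. Assuming QuickCheck is installed, `quickCheck propLength` would run the same predicate on random lists.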
So if I want to cover some simple case, it looks like I either need to repeat a fundamental part of the algorithm in the property specification, or I need to use value-based testing. The first option, repeating the algorithm, may be useful if I plan to change its implementation later, for speed for example. That way I have a reference implementation to test against.
If I want to check that the algorithm doesn't fail for some trivial cases, and I don't want to repeat the algorithm in the specification, it looks like I need some unit testing. I would write, for example, these checks:
f [2,5] == [25]
f [-8,-3,11,1] == [9,121,1]
Now I have a lot more confidence in the algorithm.
My question is: is property-based testing meant to replace unit testing, or is it complementary? Is there some general idea of how to write properties so that they are useful, or does it depend entirely on understanding the logic of the function? I mean, can one tell that writing the properties in some particular way is especially beneficial?
Also, should one strive to make the properties test every part of the algorithm? I could pull the squaring out of the algorithm and test it elsewhere, letting the properties test just the filtering part, which they seem to cover well.
f :: (Integer -> Integer) -> [Integer] -> [Integer]
f g = fmap g . filter odd
And then I can pass just Prelude.id and test g elsewhere using unit testing.
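A rough sketch of how that split could look, using only base. The names `propFilterOnly`, `square` and `squareExamples` are hypothetical; the three conditions in `propFilterOnly` are one possible way to describe the filtering without restating `filter odd`.

```haskell
import Data.List (isSubsequenceOf, (\\))

-- The parameterised version: g is applied to every odd element.
f :: (Integer -> Integer) -> [Integer] -> [Integer]
f g = fmap g . filter odd

-- With g = id only the filtering remains. It is characterised here by:
--   * every kept element is odd,
--   * the kept elements appear in the input in the same order,
--   * everything that was dropped is even.
propFilterOnly :: [Integer] -> Bool
propFilterOnly xs =
  all odd kept
    && kept `isSubsequenceOf` xs
    && all even (xs \\ kept)
  where
    kept = f id xs

-- The function passed as g, tested separately with plain example values.
square :: Integer -> Integer
square x = x * x

squareExamples :: Bool
squareExamples = map square [-3, 0, 1, 11] == [9, 0, 1, 121]
```

The trade-off is the one raised above: the properties now say nothing about squaring, so the example-based check on `square` has to carry that part.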
How about the following properties:
For every odd number in the source list, its square is an element of the result list.
For all numbers in the result list, there is a number in the source list whose square it is.
By the way, odd is easier to read than \x -> x % 2 == 1
Reference algorithm
It's very common to have a (possibly inefficient) reference implementation and test against that. In fact, that's one of the most common QuickCheck strategies when implementing numeric algorithms. But not every part of the algorithm needs one. Sometimes there are properties that characterize the algorithm completely.
Ingo's comment is spot on in that regard: These properties determine the results of your algorithm (up to order and duplicates). To recover order and duplicates you can modify the properties to include "in the resulting list truncated after the position of the source element" and vice versa in the other property.
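Both strategies can be sketched as plain predicates. This is an illustration under assumptions, not a definitive test suite: `fReference` and the `prop...` names are invented here, and the two membership properties take the implementation as an argument so that they can also be pointed at a faulty one.

```haskell
-- Implementation under test, as described in the question.
f :: [Integer] -> [Integer]
f = fmap (\x -> x * x) . filter odd

-- Hypothetical reference implementation: explicit recursion, written to be
-- obviously correct rather than fast.
fReference :: [Integer] -> [Integer]
fReference [] = []
fReference (x : xs)
  | odd x     = x * x : fReference xs
  | otherwise = fReference xs

-- Strategy 1: the implementation agrees with the reference on every input.
propMatchesReference :: [Integer] -> Bool
propMatchesReference xs = f xs == fReference xs

-- Strategy 2a: the square of every odd source element is in the result.
propSquaresPresent :: ([Integer] -> [Integer]) -> [Integer] -> Bool
propSquaresPresent impl xs =
  all (\x -> (x * x) `elem` impl xs) (filter odd xs)

-- Strategy 2b: every result element is the square of some source element.
-- (As stated in the text; requiring that source element to be odd would
-- make the property stricter.)
propOnlySquares :: ([Integer] -> [Integer]) -> [Integer] -> Bool
propOnlySquares impl xs =
  all (\y -> any (\x -> x * x == y) xs) (impl xs)
```

The incorrect implementation from the question, `fmap (+2) . filter odd`, fails `propSquaresPresent` on an input such as `[1,3]`, which is exactly the gap the original five properties left open.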
Granularity of tests
Of course, given Haskell's composability, it's nice to test each reasonably small part of an algorithm by itself. I trust e.g. \x -> x*x and filter odd as reference without looking twice.
Whether there should be properties for each part is not as clear, since you might inline that part of the algorithm later and thus make the properties moot. Due to Haskell's laziness that's not a common thing to do, but it happens.
If so, is this part of the standard, or a GHC-specific optimisation we can depend on? Or is it just an optimisation which we can't necessarily depend on?
When I tried a test sample, it seemed to indicate that it was taking place:
Prelude> let isOdd x = x `mod` 2 == 1
Prelude> let isEven x = x `mod` 2 == 0
Prelude> ((filter isOdd).(filter isEven)) [1..]
This chews up CPU but doesn't consume much memory.
Depends on what you mean by generator. The list is lazily generated, and since nothing else references it, the consumed parts are garbage collected almost immediately. Since the result of the above computation doesn't grow, the entire computation runs in constant space. That is not mandated by the standard, but as it is harder to implement nonstrict semantics with different space behaviour for that example (and lots of vaguely similar), in practice you can rely on it.
But normally, the list is still generated as a list, so there's a lot of garbage produced. Under favourable circumstances, GHC eliminates the list [1 .. ] and produces a non-allocating loop:
result :: [Int]
result = filter odd . filter even $ [1 .. ]
(using the Prelude functions out of laziness), compiled with -O2 generates the core
List.result_go =
  \ (x_ayH :: GHC.Prim.Int#) ->
    case GHC.Prim.remInt# x_ayH 2 of _ {
      __DEFAULT ->
        case x_ayH of wild_Xa {
          __DEFAULT -> List.result_go (GHC.Prim.+# wild_Xa 1);
          9223372036854775807 -> GHC.Types.[] @ GHC.Types.Int
        };
      0 ->
        case x_ayH of wild_Xa {
          __DEFAULT -> List.result_go (GHC.Prim.+# wild_Xa 1);
          9223372036854775807 -> GHC.Types.[] @ GHC.Types.Int
        }
    }
A plain loop, running from 1 to maxBound :: Int, producing nothing on the way and [] at the end.
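Read back into source-level Haskell, the Core above behaves roughly like the following hand-written loop. This is only a sketch for illustration: `resultGo` is a made-up name, and GHC derives its version from the list pipeline rather than from code like this.

```haskell
-- Hypothetical source-level counterpart of List.result_go: count upwards and
-- return [] once maxBound is reached. Both branches of the parity test in the
-- Core do the same thing, so no parity check appears here.
resultGo :: Int -> [Int]
resultGo x
  | x == maxBound = []
  | otherwise     = resultGo (x + 1)
```

`resultGo 1` corresponds to `result`, but it would take impractically long to finish. Starting close to the end, as in `resultGo (maxBound - 3)`, shows the same behaviour in a few steps and evaluates to `[]`.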
It's almost smart enough to simply return []. Note that there's only one division by 2: GHC knows that if an Int is even, it can't be odd, so that check has been eliminated, and in no branch is a non-empty list created (i.e., the unreachable branches have been eliminated by the compiler).
Strictly speaking, Haskell does not specify any particular evaluation model, so implementations are free to implement the language's semantics how they want. However, in any sane implementation, including GHC, you can rely on this running in constant space.
In GHC, computations like these result in a singly-linked list ending in a thunk representing the remainder of the list which has not yet been evaluated. As you evaluate this list, more of the list will be generated on demand, but since the beginning of the list is not referred to anywhere else, the earlier parts are immediately eligible for garbage collection, so you get constant space behavior.
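A small sketch of that on-demand behaviour, using only base. The function names are invented for this example, and the start value is a parameter so that the infinite list is built afresh inside each call rather than being shared as a top-level value (a shared top-level list would stay referenced and so would be retained).

```haskell
import Data.List (foldl')

-- Only the first n odd numbers from `start` onwards are ever generated; the
-- rest of the infinite list remains an unevaluated thunk.
firstOddsFrom :: Integer -> Int -> [Integer]
firstOddsFrom start n = take n (filter odd [start ..])

-- Sums the first n odd numbers from `start` onwards. The strict left fold
-- consumes each cell as it is produced and keeps no reference to the
-- beginning of the list, so earlier cells can be collected.
sumOddsFrom :: Integer -> Int -> Integer
sumOddsFrom start n = foldl' (+) 0 (take n (filter odd [start ..]))
```

For instance, `firstOddsFrom 1 3` evaluates to `[1,3,5]` even though `[1 ..]` never ends, and `sumOddsFrom 1 4` adds up 1, 3, 5 and 7.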
With optimizations enabled, GHC is very likely to perform deforestation here, optimizing away the need for having a list at all, and the result will be a simple loop with no allocation performed.