What to use property testing for

I'd like to know what property testing is aiming for, what its sweet spot is, and where it should be used. Let's take an example function that I want to test:
f :: [Integer] -> [Integer]
This function, f, takes a list of numbers and will square the odd numbers and filter out the even numbers. I can state some properties about the function, like
Given a list of even numbers, return empty list.
Given a list of odd numbers, the result list will have the same size as input.
Given that I have a list of even numbers and a list of odd numbers, when I join them, shuffle and pass to the function, the length of the result will be the length of the list of odd numbers.
Given a list of positive odd numbers, each element in the result list will be greater than or equal to the element at the same index in the original list (equal only for 1, which squares to itself).
Given I provide a list of odd numbers and even numbers, join and shuffle them, then I will get a list, where each number is odd
etc.
None of these properties tests that the function works for even the simplest case. For example, this incorrect implementation of f passes all of them:
f = fmap (+2) . filter odd
So, if I want to cover some simple case, it looks like I either need to repeat a fundamental part of the algorithm in the property specification, or I need to use value-based testing. The first option, repeating the algorithm, may be useful if I plan to change its implementation later, for speed for example. That way I have a reference implementation that I can test against.
If I want to check that the algorithm doesn't fail for some trivial cases and I don't want to repeat the algorithm in the specification, it looks like I need some unit testing. I would write, for example, these checks:
f [2,5] == [25]
f [-8,-3,11,1] == [9,121,1]
Now I have a lot more confidence in the algorithm.
My question is: is property-based testing meant to replace unit testing, or is it complementary? Is there some general idea of how to write properties so that they are useful, or does it depend entirely on understanding the logic of the function? I mean, can one tell that writing the properties in some way is especially beneficial?
Also, should one strive to make the properties test every part of the algorithm? I could pull the squaring out of the algorithm and test it elsewhere, letting the properties test just the filtering part, which they seem to cover well.
f :: (Integer -> Integer) -> [Integer] -> [Integer]
f g = fmap g . filter odd
And then I can pass just Prelude.id and test g elsewhere using unit testing.

How about the following properties:
For every odd number in the source list, its square is an element of the result list.
For all numbers in the result list, there is a number in the source list whose square it is.
By the way, odd is easier to read than \x -> x % 2 == 1
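As a rough sketch, the two properties above can be written as ordinary Haskell predicates. The names propSquaresPresent, propOnlySquares, fWrong, samples and checkAll are made up for this illustration, and the fixed sample list is an assumption standing in for a random generator; only base is used.

```haskell
-- Correct implementation from the question: keep the odd numbers, squared.
f :: [Integer] -> [Integer]
f = fmap (^ 2) . filter odd

-- The incorrect variant from the question: adds 2 instead of squaring.
fWrong :: [Integer] -> [Integer]
fWrong = fmap (+ 2) . filter odd

-- Property 1: the square of every odd source element is in the result.
propSquaresPresent :: ([Integer] -> [Integer]) -> [Integer] -> Bool
propSquaresPresent g xs = all (\x -> x * x `elem` g xs) (filter odd xs)

-- Property 2: every result element is the square of some source element.
propOnlySquares :: ([Integer] -> [Integer]) -> [Integer] -> Bool
propOnlySquares g xs = all (\y -> any (\x -> x * x == y) xs) (g xs)

-- Hand-picked inputs (an assumption; a generator would normally supply these).
samples :: [[Integer]]
samples = [[], [2, 5], [-8, -3, 11, 1], [1, 2, 3, 4, 5]]

-- Check both properties for an implementation on every sample.
checkAll :: ([Integer] -> [Integer]) -> Bool
checkAll g = all (\xs -> propSquaresPresent g xs && propOnlySquares g xs) samples
```

Here checkAll f holds while checkAll fWrong does not, because 25 is missing from fWrong [2,5]. With QuickCheck installed, the same predicates could be handed to quickCheck instead of the fixed samples, e.g. quickCheck (propSquaresPresent f).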

Reference algorithm
It's very common to have a (possibly inefficient) reference implementation and test against that. In fact, that's one of the most common QuickCheck strategies when implementing numeric algorithms. But not every part of the algorithm needs one. Sometimes there are some properties that characterize the algorithm completely.
Ingo's comment is spot on in that regard: These properties determine the results of your algorithm (up to order and duplicates). To recover order and duplicates you can modify the properties to include "in the resulting list truncated after the position of the source element" and vice versa in the other property.
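A minimal sketch of the reference-implementation strategy, using only base. The candidate fFast, the helper listsUpTo and the choice of small inputs are assumptions made for illustration; in practice a QuickCheck generator would usually replace the exhaustive enumeration.

```haskell
-- Reference implementation: obviously correct, not tuned for speed.
fRef :: [Integer] -> [Integer]
fRef = map (\x -> x * x) . filter odd

-- Hypothetical candidate under test: a single pass with foldr.
fFast :: [Integer] -> [Integer]
fFast = foldr step []
  where
    step x acc
      | odd x     = x * x : acc
      | otherwise = acc

-- All lists of length 0..n over the given alphabet (small and exhaustive).
listsUpTo :: Int -> [a] -> [[a]]
listsUpTo n alphabet = concatMap (\k -> sequence (replicate k alphabet)) [0 .. n]

-- The property: both implementations agree, including order and duplicates.
agreesWithReference :: [Integer] -> Bool
agreesWithReference xs = fFast xs == fRef xs

-- Check every list of up to four elements drawn from -3..3.
checkSmallInputs :: Bool
checkSmallInputs = all agreesWithReference (listsUpTo 4 [-3 .. 3])
```

Comparing whole result lists checks order and duplicates for free, which the membership-style properties only recover after the modification described above.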
Granularity of tests
Of course, given Haskell's composability it's nice to test each reasonably small part of an algorithm by itself. I trust e.g. \x -> x*x and filter odd as reference without looking twice.
Whether there should be properties for each part is less clear, as you might inline that part of the algorithm later and thus make the properties moot. Due to Haskell's laziness that's not a common thing to do, but it happens.

Related

How can I calculate the length of a list containing lists in OCaml

I am a beginner in OCaml and I am stuck in my project.
I would like to count the number of elements of a list contained in a list.
Then test if the list contains odd or even lists.
let listoflists = [[1;2] ; [3;4;5;6] ; [7;8;9]]
output
l1 = even
l2 = even
l3 = odd
The problem is that :
List.tl listoflists
Gives the length of the rest of the list
so 2
-> How can I calculate the length of the lists one by one?
-> Or how could I get the lists and put them one by one in a variable?
For the odd/even function, I have already done it!
Tell me if I'm not clear,
and thank you for your help.
Unfortunately it's not really possible to help you very much because your question is unclear. Since this is obviously a homework problem I'll just make a few comments.
Since you talk about putting values in variables you seem to have some programming experience. But you should know that OCaml code tends to work with immutable variables and values, which means you have to look at things differently. You can have variables, but they will usually be represented as function parameters (which indeed take different values at different times).
If you have no experience at all with OCaml it is probably worth working through a tutorial. The OCaml.org website recommends the first 6 chapters of the OCaml manual here. In the long run this will probably get you up to speed faster than asking questions here.
You ask how to do a calculation on each list in a list of lists. But you don't say what the answer is supposed to look like. If you want separate answers, one for each sublist, the function to use is List.map. If instead you want one cumulative answer calculated from all the sublists, you want a fold function (like List.fold_left).
You say that List.tl calculates the length of a list, or at least that's what you seem to be saying. But of course that's not the case; List.tl returns all but the first element of a list. The length of a list is calculated by List.length.
If you give a clearer definition of your problem and particularly the desired output you will get better help here.
Use List.iter f xs to apply function f to each element of the list xs.
Use List.length to compute the length of each list.
Even numbers are evenly divisible by two, so if you divide an even number by two the remainder will be zero. Use the mod operator to get the remainder of the division. Alternatively, you can rely on the fact that in the binary representation the odd numbers always end with 1, so you can use land (bitwise and) to test the least significant bit.
If you need to refer to the position of the list element, use List.iteri f xs. The List.iteri function will apply f to two arguments, the first will be the position of the element (starting from 0) and the second will be the element itself.
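The shape of the computation described in these answers (one result per sublist, based on its length, with a position attached) can be sketched as follows. The sketch is in Haskell rather than OCaml, to match the other code on this page; map, length and even play the roles of List.map, List.length and the mod test, and the names parities and labelled are invented here.

```haskell
-- Classify each sublist by the parity of its length.
-- OCaml counterparts: List.map, List.length and the mod operator.
parities :: [[a]] -> [String]
parities = map (\xs -> if even (length xs) then "even" else "odd")

-- Attach a 1-based position to each label (List.iteri would supply a
-- 0-based position in OCaml).
labelled :: [[a]] -> [String]
labelled xss =
  zipWith (\i p -> "l" ++ show i ++ " = " ++ p) [1 :: Int ..] (parities xss)
```

For the list from the question, labelled [[1,2],[3,4,5,6],[7,8,9]] gives ["l1 = even","l2 = even","l3 = odd"].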

Haskell - Why is Alternative implemented for List

I have read some of this post Meaning of Alternative (it's long)
What led me to that post was learning about Alternative in general. The post gives a good answer to why it is implemented the way it is for List.
My question is:
Why is Alternative implemented for List at all?
Is there perhaps an algorithm that uses Alternative and a List might be passed to it so define it to hold generality?
I thought that, because Alternative defines some and many by default, that may be part of it, but "What are some and many useful for" contains the comment:
To clarify, the definitions of some and many for the most basic types such as [] and Maybe just loop. So although the definition of some and many for them is valid, it has no meaning.
In the "What are some and many useful for" link above, Will gives an answer to the OP that may contain the answer to my question, but at this point in my Haskelling, the forest is a bit thick to see the trees.
Thanks
There's something of a convention in the Haskell library ecology that if a thing can be an instance of a class, then it should be an instance of the class. I suspect the honest answer to "why is [] an Alternative?" is "because it can be".
...okay, but why does that convention exist? The short answer there is that instances are sort of the one part of Haskell that succumbs only to whole-program analysis. They are global, and if there are two parts of the program that both try to make a particular class/type pairing, that conflict prevents the program from working right. To deal with that, there's a rule of thumb that any instance you write should live in the same module either as the class it's associated with or as the type it's associated with.
Since instances are expected to live in specific modules, it's polite to define those instances whenever you can -- since it's not really reasonable for another library to try to fix up the fact that you haven't provided the instance.
Alternative is useful when viewing [] as the nondeterminism-monad. In that case, <|> represents a choice between two programs and empty represents "no valid choice". This is the same interpretation as for e.g. parsers.
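A small sketch of that reading, using only base. The names choice, noChoice and pairsSummingToFive are illustrative and not from the original answer.

```haskell
import Control.Applicative (Alternative (..))

-- Choice between two "programs": all results of the first, then the second.
choice :: [Int]
choice = [1, 2] <|> [3]            -- [1,2,3]

-- empty is the program with no valid result; it is the identity for <|>.
noChoice :: [Int]
noChoice = empty <|> [4] <|> empty -- [4]

-- A small search: pick x and y nondeterministically, keep pairs summing to 5.
-- Failing branches return empty, succeeding ones return pure.
pairsSummingToFive :: [(Int, Int)]
pairsSummingToFive = do
  x <- [1 .. 4]
  y <- [x .. 4]
  if x + y == 5 then pure (x, y) else empty
```

Here pairsSummingToFive evaluates to [(1,4),(2,3)]; a parser written against Alternative uses <|> and empty in the same way, with "no parse" in place of the empty list.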
some and many do indeed not make sense for lists, since they try iterating through all possible lists of elements from the given options greedily, starting from the infinite list of just the first option. The list monad isn't lazy enough to do even that, since it might always need to abort if it was given an empty list. There is however one case where both terminate: when given an empty list.
Prelude Control.Applicative> many []
[[]]
Prelude Control.Applicative> some []
[]
If some and many were defined as lazy (in the regex sense), meaning they prefer short lists, you would get results out, but not very useful ones, since they start by generating the infinitely many lists containing just the first option:
Prelude Control.Applicative> some' v = liftA2 (:) v (many' v); many' v = pure [] <|> some' v
Prelude Control.Applicative> take 100 . show $ (some' [1,2])
"[[1],[1,1],[1,1,1],[1,1,1,1],[1,1,1,1,1],[1,1,1,1,1,1],[1,1,1,1,1,1,1],[1,1,1,1,1,1,1,1],[1,1,1,1,1,"
Edit: I believe the some and many functions correspond to a star-semiring while <|> and empty correspond to plus and zero in a semiring. So mathematically (I think), it would make sense to split those operations out into a separate typeclass, but it would also be kind of silly, since they can be implemented in terms of the other operators in Alternative.
Consider a function like this:
fallback :: Alternative f => a -> (a -> f b) -> (a -> f e) -> f (Either e b)
fallback x f g = (Right <$> f x) <|> (Left <$> g x)
Not spectacularly meaningful, but you can imagine it being used in, say, a parser: try one thing, falling back to another if that doesn't work.
Does this function have a meaning when f ~ []? Sure, why not. If you think of a list's "effects" as being a search through some space, this function seems to represent some kind of biased choice, where you prefer the first option to the second, and while you're willing to try either, you also tag which way you went.
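To make the difference concrete, here is a hedged sketch that runs the same fallback at Maybe and at []. The helpers halve and describe, and the names viaMaybe and viaList, are hypothetical and exist only for this illustration.

```haskell
import Control.Applicative (Alternative (..))

-- fallback as defined above, repeated so the block is self-contained.
fallback :: Alternative f => a -> (a -> f b) -> (a -> f e) -> f (Either e b)
fallback x f g = (Right <$> f x) <|> (Left <$> g x)

-- Hypothetical first option: succeeds only on even numbers.
halve :: Alternative f => Int -> f Int
halve n = if even n then pure (n `div` 2) else empty

-- Hypothetical second option: always succeeds with a description.
describe :: Applicative f => Int -> f String
describe n = pure ("input was " ++ show n)

-- With Maybe, the second option is used only when the first one fails.
viaMaybe :: [Maybe (Either String Int)]
viaMaybe = [fallback 10 halve describe, fallback 7 halve describe]
-- [Just (Right 5), Just (Left "input was 7")]

-- With lists, both options are kept, first option first, each one tagged.
viaList :: [[Either String Int]]
viaList = [fallback 10 halve describe, fallback 7 halve describe]
-- [[Right 5, Left "input was 10"], [Left "input was 7"]]
```

The list results show the biased choice: the preferred branch comes first, the fallback branch is still explored, and the Either tag records which branch produced each result.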
Could a function like this be part of some algorithm which is polymorphic in the Alternative it computes in? Again I don't see why not. It doesn't seem unreasonable for [] to have an Alternative instance, since there is an implementation that satisfies the Alternative laws.
As to the answer linked to by Will Ness that you pointed out: it covers that some and many don't "just loop" for lists. They loop for non-empty lists. For empty lists, they immediately return a value. How useful is this? Probably not very, I must admit. But that functionality comes along with (<|>) and empty, which can be useful.

Proving a simple list function applied four times is the identity

The following video contains a mathematical card trick due to Colm Mulcahy:
https://www.youtube.com/watch?v=dHzUQnRjbuM
The key operation in the trick is defined as follows:
COAT (Count Out And Transfer)
Given a packet of n cards, COATing k cards refers to counting out that many from the top into a pile, thus reversing their order, and transferring those as a unit to the bottom.
(Definition taken from http://graphics8.nytimes.com/packages/pdf/crossword/Mulcahy_Mathematical_Card_Magic-Sample2.pdf)
In Haskell:
coat k cards = (drop k cards) ++ (reverse . take k $ cards)
Example:
Main> take 5 $ iterate (coat 3) [1..5]
[[1,2,3,4,5],[4,5,3,2,1],[2,1,3,5,4],[5,4,3,1,2],[1,2,3,4,5]]
A characteristic property of the COAT operation is that after 4 iterations, the list returns to its original order, iff k >= n/2.
Is it practical to prove this property for the Haskell code? Would a proof require the use of dependent types to express the constraint on k? (Maybe Idris would be a better language?)
(I'm not sure how to deal with iteration in a proof. I guess in this case the four iterations could just be unrolled.)
An SMT solver can handle such a proof for any concrete deck-size. Proving for an arbitrary deck-size n would require induction, which is beyond the capability of automated provers, at least for the time being. (Also, there aren't any out-of-the-box tools that can take an arbitrary Haskell program and do such proofs for you. Liquid Haskell goes far, but I'm not sure if it would be suitable for this purpose.)
Haskell's SBV library allows you to express such problems conveniently, shipping the proof obligations to z3 (or other SMT solvers) for a push-button experience. Here's how one can code your problem using SBV.
First some preliminaries:
{-# LANGUAGE ScopedTypeVariables #-}
import Data.SBV
import Data.SBV.List
import Prelude hiding (length, take, drop, reverse, (++))
Note the hiding of common functions length/take etc. from Prelude. We'll instead use their symbolic equivalents as provided by Data.SBV.List. (You're going to have to trust SBV that they are faithful implementations of the same, extending them to the domain of symbolic inputs that you can do proofs with.)
As you noted, here's how coat can be defined:
coat :: SymVal a => SInteger -> SList a -> SList a
coat k cards = drop k cards ++ reverse (take k cards)
Aside from the "funky" signature, the definition is the same as you gave, i.e., stereotypical Haskell. The signature "generalizes" from concrete integer/lists to their symbolic counterparts, so we can do a proof on them.
Coating four times can be expressed using regular composition:
fourCoat :: SymVal a => SInteger -> SList a -> SList a
fourCoat k = coat k . coat k . coat k . coat k
Let's represent the deck as a list of integers:
type Deck = SList Integer
Think of SList Integer as [Integer], just one whose contents can be symbolic values. Note that a regular deck will have distinct cards; we do not impose this in our proof. That is, the proof works regardless of whether we put duplicates into the deck, which generalizes the theorem you're trying to establish.
And here's how you can pose your theorem as an SBV query:
coatCheck :: Integer -> IO ThmResult
coatCheck n = prove $ do
    deck :: Deck <- free "deck"
    k <- free "k"
    constrain $ length deck .== literal n
    constrain $ 2*k .>= literal n
    pure $ deck .== fourCoat k deck
This part of the code is a bit involved. But we're essentially saying for any arbitrary deck (free "deck"), and any k, if you fourCoat the deck then you get it back intact (very last line). The first constraint limits the size of the deck to the given literal value, i.e., we'll be doing the proof for a deck that has exactly n cards in it. The second constraint is exactly what your theorem stipulated, written slightly rearranged to avoid division by 2. Note that k is not fixed: The proof will be valid for any k, subject to the given constraints.
How does this proof fare? Well, it depends on how large n is! Here's a demo with n = 6:
*Main> coatCheck 6
Q.E.D.
This takes about 3 seconds to run on my machine. As you increase n, the solver time will increase; but if you wait long enough, you'll get a proof. (Unless you run out of memory, depending on your hardware!)
Notice that it's also possible to make the parameter n symbolic as well. While the SMT solver is capable of expressing this, in my experiments I couldn't get any proof whatsoever. (It simply didn't return in a reasonable time.) And I wouldn't expect it to either: As alluded above, such a proof would require induction, and SMT solvers can't handle such problems quite yet.
So, the answer to your question is a qualified yes: If you're willing to squint the right way and do the proof for fixed-sized decks, then yes; there's an automated way to conduct such proofs in Haskell by using an SMT solver as the underlying engine. If you want a proof for an arbitrary n, then you'll have to use a proper theorem prover like Isabelle, Coq, Lean, etc., and of course you will have to program in their native language, and not Haskell. (Though they accept more or less the same family of functional programs, and this particular problem wouldn't be hard to code in any of these tools.)
You can find the entire code for this problem as an example in the SBV distribution.
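Before reaching for a solver, a brute-force check in plain Haskell can build some confidence for small decks. This is testing rather than proof, it covers only the "if" direction (2*k >= n implies that four COATs restore the deck), and the names holdsForDeckSize and holdsUpTo are invented for this sketch.

```haskell
-- COAT as defined in the question.
coat :: Int -> [a] -> [a]
coat k cards = drop k cards ++ reverse (take k cards)

-- Apply coat k four times.
fourCoat :: Int -> [a] -> [a]
fourCoat k = coat k . coat k . coat k . coat k

-- For a deck of n distinct cards, check every k with 2*k >= n and k <= n.
holdsForDeckSize :: Int -> Bool
holdsForDeckSize n = all (\k -> fourCoat k deck == deck) ks
  where
    deck = [1 .. n]
    ks   = [k | k <- [0 .. n], 2 * k >= n]

-- Exhaustive check for all deck sizes from 0 up to a bound.
holdsUpTo :: Int -> Bool
holdsUpTo bound = all holdsForDeckSize [0 .. bound]
```

For instance, holdsUpTo 10 evaluates to True, whereas fourCoat 2 [1..5] gives [3,2,5,4,1], so the condition on k matters for that deck.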

How to detect list changes without comparing the complete list

I have a function which will fail if there has been any change to the term/list it is using since that term/list was generated. I would like to avoid checking that each parameter is still the same. So I thought about computing a CRC or something similar each time I generate the term/list. Before making use of it I would compute the CRC again, so I can be 99.9999% sure the term/list is still the same.
To be more specific: I am programming in Erlang, and I am thinking of using a function of the following type:
-spec(list_crc32(List :: [term()]) -> CRC32 :: integer()).
I use term() because it is a list of terms (Erlang already has fast built-in CRC functions, but only for binary values). I have considered using "erlang:crc32(term_to_binary(Term))", but I'm not sure whether there is a better approach.
What do you think?
Regards, Borja.
Without more context it is a little bit difficult to understand why you would have this problem, particularly since Erlang terms are immutable -- once assigned no other operation can change the value of a variable, not even in the same function.
So if your question is "How do I quickly assert that true = A == A?" then consider this code:
A = generate_list(),
% other things in this function happen
A = A.
The above snippet will always assert that A is still A, because it is not possible to change A like you might do in, say, Python.
If your question is "How do I assert that the value of a new list generated exactly the same value as a different known list?" then using either matching or an actual assertion is the fastest way:
start() ->
    A = generate_list(),
    assert_loop(A).

assert_loop(A) ->
    ok = do_stuff(),
    A = generate_list(),
    assert_loop(A).
The assert_loop/1 function above is forcing an assertion that the output of generate_list/0 is still exactly A. There is no telling what other things in the system might be happening which may have affected the result of that function, but the line A = generate_list() will crash if the list returned is not exactly the same value as A.
In fact, there is no way to change the A in this example, no matter how many times we execute assert_loop/1 above.
Now consider a different style:
compare_loop(A) ->
    ok = do_stuff(),
    case A =:= generate_list() of
        true  -> compare_loop(A);
        false -> terminate_gracefully()
    end.
Here we have given ourselves the option to do something other than crash, but the effect is ultimately the same, as =:= is not merely a test of equality; it is a match test, meaning not just that the two evaluate to equal values, but that they actually match.
Consider:
1> 1 == 1.0.
true
2> 1 =:= 1.0.
false
The fastest way to compare two terms will depend partly on the sizes of the lists involved but especially on whether or not you expect the assertion to pass or fail more often.
If the check is expected to fail more often then the fastest check is to use an assertion with =, an equivalence test with == or a match test with =:= instead of using erlang:phash2/1. Why? Because these tests can return false as soon as a non-matching element is encountered -- and if this non-match occurs near the beginning of the list then a full traverse of both lists is avoided entirely.
If the check is expected to pass more often then something like erlang:phash2/1 will be faster, but only if the lists are long, because only one list will be fully traversed each iteration (the hash of the original list is already stored). It is possible, though, on a short list that a simple comparison will still be faster than computing a hash, storing it, computing another hash, and then comparing the hashes (obviously). So, as always, benchmark.
A phash2 version could look like:
start() ->
    A = generate_list(),
    Hash = erlang:phash2(A),
    assert_loop(Hash).

assert_loop(Hash) ->
    ok = do_stuff(),
    Hash = erlang:phash2(generate_list()),
    assert_loop(Hash).
Again, this is an assertive loop that will crash instead of exit cleanly, so it would need to be adapted to your needs.
The basic mystery still remains, though: in a language with immutable variables why is it that you don't know whether something will have changed? This is almost certainly a symptom of an underlying architectural problem elsewhere in the program -- either that or simply a misunderstanding of immutability in Erlang.

How do I iterate through a list in a TI-83 calculator program

I created a set of programs to calculate the area under a graph using various methods of approximation (midpoint, trapezoidal, Simpson) for my Calculus class.
Here is an example of one of my programs (midpoint):
Prompt A,B,N
(B-A)/N->D
Input "Y1=", Y1
0->X
0->E
For(X,A+D/2,B-D/2,D)
Y1(X)+E->E
End
Disp E*D
Instead of applying these approximation rules to a function (Y1), I would like to apply them to a list of data (L1). How do I iterate through a list? I would need to be able to get the last index in the list in order for a "For Loop" to be any good. I can't do anything like L1.length like I would do in Java.
You can obtain the length of the list using dim(). That can be found in 2nd->LIST->OPS->dim(. Just make sure that you use a list variable otherwise dim() will complain about the type. You could then index into the list with a subscript.
e.g.,
{1, 2, 3, 4} -> L1
For (X, 1, dim(L1), 1)
Disp L1(X)
End
The For loop is the simplest way to iterate over a list in TI-Basic, as it is in many languages. Jeff Mercado already covered that, so I'll mention a few techniques that are powerful tools in specialized situations.
Mapping over lists
TI-Basic supports simple mapping operations over lists that have the same effect as a map function in any other language. This support extends to most basic arithmetic functions and a selection of other functions.
The syntax could not be simpler. If you want to add some number X to every element in some list L1 you type X+L1→L1.
seq(
Most For loops over a list in TI-Basic can be replaced by a cleverly constructed seq( command that will outperform the For loop in time and memory. The exceptions to this rule are loops that contain I/O or store variables.
The syntax for this command can be quite confusing, so I recommend reading over this documentation before using it. In case that link dies, here's the most relevant information.
Command Summary
Creates a list by evaluating a formula with one variable taking on a
range of values, optionally skipping by a specified step.
Command Syntax
seq(formula, variable, start-value, end-value [, step])
Menu Location
While editing a program, press:
2nd LIST to enter the LIST menu
RIGHT to enter the OPS submenu
5 to choose seq(, or use arrows.
Calculator Compatibility
TI-83/84/+/SE
Token Size
1 byte
The documentation should do a good job explaining the syntax for seq(, so I'll just provide a sample use case.
If you want the square of every number between 1 and 100 you could do this
For Loop
DelVar L1
100→dim(L1
For(A,1,100
A²→L1(A
End
or, this
seq
seq(A²,A,1,100→L1
The drawback of seq( is that you can't do any I/O or store any variables inside the expression.
Predefined list iteration function
Go to the LIST menu and check out all the operations under OPS and MATH. These predefined functions are always going to be faster than a For loop or even a seq( expression designed to do the same thing.