I have the following code in F# which I think is sufficiently concurrent to utilize the 4 cores of my machine. Yet CPU usage is limited to one core.
member x.Solve problemDef =
    use flag = new ManualResetEventSlim(false)
    let foundSoFar = MSet<'T>()
    let workPile = MailboxProcessor<seq<'T>>.Start(fun inbox ->
        let remaining = ref 0
        let rec loop() = async {
            let! data = inbox.Receive()
            let data = data |> Seq.filter (not << foundSoFar.Contains) |> Array.ofSeq
            foundSoFar.UnionWith data
            let jobs = ref -1
            for chunk in data |> Seq.distinct |> Seq.chunked 5000 do
                Async.Start <| async {
                    Seq.collect problemDef.generators chunk
                    |> Array.ofSeq
                    |> inbox.Post
                }
                incr jobs
            remaining := !remaining + !jobs
            if (!remaining = 0 && !jobs = -1) then
                flag.Set() |> ignore
            else
                return! loop()
        }
        loop()
    )
    workPile.Post problemDef.initData
    flag.Wait() |> ignore
    foundSoFar :> seq<_>
I use the MailboxProcessor as a workpile from where I get chunks of elements, filter them through a HashSet and create tasks with the new elements whose results are inserted in the workpile. This is repeated until no new elements are produced. The aim of this code is to asynchronously insert the chunks in the workpile, thus the use of tasks. My problem is that there is no parallelism.
Edit: thanks to @Jon Harrop I solved the concurrency problem, which was due to the lazy nature of seq, and reworked the code following the suggestions. Is there any way to get rid of the ManualResetEvent without using a discriminated union as the message type of the agent (to support the asking message)?
Without a complete example, I found it quite difficult to understand what your code does (perhaps because it combines quite a few different concurrent programming primitives, which makes it a bit hard to follow).
Anyway, the body of MailboxProcessor is executed only once (if you want to get concurrency using plain agents, you need to start multiple agents). In the body of the agent, you start a task that runs problemDef.generators for each chunk.
This means that problemDef.generators should run in parallel. However, the code that calls foundSoFar.Contains and foundSoFar.UnionWith, as well as Seq.distinct, is always run sequentially.
So, if problemDef.generators is a simple and efficient function, the overhead of tracking foundSoFar (which is done sequentially) is probably larger than what you gain from parallelization.
I'm not familiar with MSet<'T>, but if it is (or if you replaced it with) a thread-safe mutable set, then you should be able to run some of the unioning right in the Task.StartNew (in parallel with the other unioning).
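For illustration, here is a minimal sketch (not the original MSet<'T>; the element type and names are placeholders) of how a thread-safe set could be approximated with a ConcurrentDictionary, so that filtering and inserting can happen inside the spawned work items:
// Sketch only: ConcurrentDictionary<'T, unit> used as a thread-safe set, so
// membership checks and insertions can run inside the spawned tasks.
open System.Collections.Concurrent

let foundSoFar = ConcurrentDictionary<int, unit>()

// TryAdd returns true only if the item was not seen before.
let markSeen item = foundSoFar.TryAdd(item, ())

// Inside a worker: keep only the items that have not been seen yet.
let newItems (chunk: int[]) = chunk |> Array.filter markSeen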
PS: As I said, it is difficult to tell without running the code, so my thinking may be completely wrong!
You're mixing high-level concurrency primitives (tasks and agents) with ManualResetEventSlim, which is very bad. Can you use PostAndReply instead?
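To illustrate what that could look like (a minimal sketch, not the poster's code; the message type does have to become a small discriminated union so that one case can carry an AsyncReplyChannel, and all names here are made up):
type Msg =
    | Work of int list
    | GetResults of AsyncReplyChannel<Set<int>>

let agent = MailboxProcessor<Msg>.Start(fun inbox ->
    let rec loop acc = async {
        let! msg = inbox.Receive()
        match msg with
        | Work items -> return! loop (Set.union acc (Set.ofList items))
        | GetResults reply ->
            reply.Reply acc
            return! loop acc
    }
    loop Set.empty)

agent.Post (Work [1; 2; 3])
let results = agent.PostAndReply GetResults   // the caller blocks here instead of on an event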
You're using Seq to do the "work" in the spawned task, which is lazy, so it won't actually do anything until after it is posted back. Can you force evaluation inside the task with something like Array.ofSeq?
The way you are using Task is anomalous. Might be more idiomatic to switch to Async.Start.
Without a complete solution I cannot validate any of my guesses...
"…think is sufficiently concurrent to utilize the 4 cores"
Your mental model of multicore parallelism might be quite off the mark.
My question probably digs a bit into how smart the F# compiler really is.
I have a type module that scans a configuration file and should then provide a range of IP addresses between a start and an end address.
type IpRange (config: string) =
    // Parse the config
    member __.StartIp = new MyIp(startIp)
    member __.EndIp = new MyIp(endIp)
Now I wanted to add the actual range giving me all IPs so I added
member __.Range =
    let result = new List<MyIp>()
    let mutable ipRunner = __.StartIp
    while ipRunner <= __.EndIp do
        result.Add(new MyIp(ipRunner))
        ipRunner <- (ipRunner + 1)
    result
which works but is not really idiomatic F#.
I then dug into the issue and came up with the following two alternatives
let rec GetIpRangeRec (startIp: MyIp) (endIp: MyIp) (ipList: MyIp list) =
    if startIp <= endIp then
        GetIpRangeRec (startIp + 1) endIp (ipList @ [startIp])
    else
        ipList
and
let GetIpRangeUnfold (startIp: MyIp) (endIp: MyIp) =
    startIp |> Seq.unfold (fun currentIp ->
        if (currentIp <= endIp) then
            Some(currentIp, currentIp + 1)
        else
            None)
As far as I have understood from reading up on lists and sequences, none of them is cached. So all three solutions would re-evaluate the code to create a list whenever I try to access an item or enumerate the list.
I could solve this problem by using Seq.cache (and a previous cast to sequence where required) resulting in something like
member __.Range =
    GetIpRangeRec startIp endIp []
    |> List.toSeq
    |> Seq.cache
but is this really necessary?
I have the feeling that I missed something and the F# compiler actually does cache the result without being explicitly told to.
Lists are (normally at least; I suppose there might be some weird edge case I don't know about) stored directly as their values. Thus, your recursive function would specifically produce a list of MyIps - these would only be re-evaluated if you had done something weird so that a MyIp is re-evaluated each time it is accessed. In other words, when the function returns you'll have a fully evaluated list of MyIps.
There is one slight issue, however, in that your function as implemented isn't particularly efficient. Instead, I would recommend doing it in this slightly alternative way:
let rec GetIpRangeRec (startIp: MyIp) (endIp: MyIp) (ipList: MyIp list) =
    if startIp <= endIp then
        GetIpRangeRec (startIp + 1) endIp (startIp :: ipList)
    else
        List.rev ipList
Basically, the issue is that every time you use the @ operator to append to the end of a list, the runtime has to walk to the end of the list to do the append. This means that you'll end up iterating over the list a whole bunch of times. Instead, better simply to prepend (i.e. append, but to the front), and then reverse the list just before you return it. This means that you only have to walk the list once, as prepending is always a constant-time operation (you just create a new list entry with a pointer to the previous front of the list).
Actually, since you probably don't want to use a pre-supplied list outside of your function, I would recommend doing it this way instead:
let GetIpRange startIp endIp =
    // 'end' is a reserved keyword in F#, so the inner parameter is named 'stop' here
    let rec GetIpRangeRec (start: MyIp) (stop: MyIp) (ipList: MyIp list) =
        if start <= stop then
            GetIpRangeRec (start + 1) stop (start :: ipList)
        else
            List.rev ipList
    GetIpRangeRec startIp endIp List.empty
(note that I haven't tested this, so it may not work totally perfectly). If you do want to be able to pre-supply a starting list, then you can just stick with the first one.
Also, bear in mind that while lists are usually fine for sequential access, they're terrible for random accesses. If you need to be doing random lookups into the list, then I would recommend using a call to List.toArray once you get the complete list back. Probably no need to bother if you'll just be iterating over it sequentially though.
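For example (illustrative only, reusing GetIpRange, startIp and endIp from above):
let ips = GetIpRange startIp endIp |> List.toArray
let tenth = ips.[9]   // O(1) indexing on an array; indexing a list walks the cells one by one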
I'll make one more point though: from a total functional programming 'purist's' perspective, your first implementation may not be totally 'functional', but the only mutability involved is hidden away inside the function. That is, you're not mutating anything that is passed in to the function. This is perfectly fine from a functional purity perspective and might be good for performance. Remember that F# is functional-first, not zealously functional-only ;)
EDIT: Just thought of one more thing I would like to add: I don't know exactly how your MyIp types are constructed, but if you can build them out of numbers, it might be worth looking at using a sequence comprehension like seq {1 .. 100} and then piping that to a map to create the MyIps, e.g. seq {1 .. 100} |> Seq.map makeIp |> Seq.toList. This would be the easiest way, but would only work if you can simply specify a simple number range.
Seq is lazy in F#, i.e. there can be a benefit to caching the results occasionally. F# List is not lazy; it's an immutable singly linked list that won't get any benefit from caching.
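To make the difference concrete, here is a small sketch (not from the original code) that uses a side effect to show when elements are recomputed:
// An uncached seq redoes its work on every enumeration; Seq.cache evaluates
// each element at most once.
let uncached = Seq.init 3 (fun i -> printfn "generating %d" i; i)
let cached = uncached |> Seq.cache

uncached |> Seq.iter ignore   // prints "generating 0/1/2"
uncached |> Seq.iter ignore   // prints again: the work is repeated
cached |> Seq.iter ignore     // prints once, on the first enumeration
cached |> Seq.iter ignore     // prints nothing: the results are cached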
I'm writing tests for an object that takes in an input, composes some functions together, runs the input through the composed function, and returns the result.
Here's a greatly-simplified set of objects and functions that mirrors my design:
type Result =
    | Success of string

let internal add5 x = x + 5

let internal mapResult number =
    Success (number.ToString())

type public InteropGuy internal (add, map) =
    member this.Add5AndMap number =
        number |> (add >> map)

type InteropGuyFactory() =
    member this.CreateInteropGuy () =
        new InteropGuy(add5, mapResult)
The class is designed to be used for C# interop which explains the structure, but this problem still can apply to any function under test that composes function parameters.
I'm having trouble finding an elegant way to keep the implementation details of the internal functions from creeping into the test conditions when testing the composing function - in other words, isolating one link in the chain instead of inspecting the output once the input is piped entirely through. If I simply inspect the output, then tests for each function are going to be dependent on downstream functions working properly, and if the one at the end of the chain stops working, all of the tests will fail. The best I've been able to do is stub out a function to return a certain value, then stub out its downstream function, storing the input of the downstream function and then asserting the stored value is equal to the output of the stubbed function:
[<TestClass>]
type InteropGuyTests() =

    [<TestMethod>]
    member this.``Add5AndMap passes add5 result into map function``() =
        let add5 _ = 13
        let tempResult = ref 0
        let mapResult result =
            tempResult := result
            Success "unused result"
        let guy = new InteropGuy(add5, mapResult)
        guy.Add5AndMap 8 |> ignore
        Assert.AreEqual(13, !tempResult)
Is there a better way to do this or is this generally how to test composition in isolation? Design comments also appreciated.
The first question we should ask when encountering something like this is: why do we want to test this piece of code?
When the potential System Under Test (SUT) is literally a single statement, what value does the test add?
AFAICT, there are only two ways to test a one-liner:
Triangulation
Duplication of implementation
Both are possible, but both come with drawbacks, so I think it's worth asking whether such a method/function should be tested at all.
Still, assuming that you want to test the function, e.g. to prevent regressions, you can use either of these options.
Triangulation
With triangulation, you simply throw enough example values at the SUT to demonstrate that it works as the black box it's supposed to be:
open Xunit
open Swensen.Unquote
[<Theory>]
[<InlineData(0, "5")>]
[<InlineData(1, "6")>]
[<InlineData(42, "47")>]
[<InlineData(1337, "1342")>]
let ``Add5AndMap returns expected result`` (number : int, expected : string) =
    let actual = InteropGuyFactory().CreateInteropGuy().Add5AndMap number
    Success expected =! actual
The advantage of this example is that it treats the SUT as a black box, but the disadvantage is that it doesn't demonstrate that the SUT is a result of any particular composition.
Duplication of implementation
You can use Property-Based Testing to demonstrate (or, at least make very likely) that the SUT is composed of the desired functions, but it requires duplicating the implementation.
Since the functions are assumed to be referentially transparent, you can simply throw enough example values at both the composition and the SUT, and verify that they return the same value:
open FsCheck.Xunit
open Swensen.Unquote
[<Property>]
let ``Add5AndMap returns composed result`` (number : int) =
    let actual = InteropGuyFactory().CreateInteropGuy().Add5AndMap number
    let expected = number |> add5 |> mapResult
    expected =! actual
Is it ever interesting to duplicate the implementation in the test?
Often, it's not, but if the purpose of the test is to prevent regressions, it may be worthwhile as a sort of double-entry bookkeeping.
I remember some time (years, probably) ago I read on Stack Overflow about the charms of programming with as few if-tests as possible. This question is somewhat relevant, but I think the stress was on using many small functions that return values determined by tests depending on the parameter they receive. A very simple example would be using this:
int i = 5;
bool iIsSmall = isSmall(i);
with isSmall() looking like this:
private bool isSmall(int number)
{
    return (number < 10);
}
instead of just doing this:
int i = 5;
bool isSmall;
if (i < 10) {
    isSmall = true;
} else {
    isSmall = false;
}
(Logically this code is just sample code. It is not part of a program I am making.)
The reason for doing this, I believe, was because it looks nicer and makes a programmer less prone to logical errors. If this coding convention is applied correctly, you would see virtually no if-tests anywhere, except in functions whose only purpose is to do that test.
Now, my question is: is there any documentation about this convention? Is there anyplace where you can see wild arguments between supporters and opposers of this style? I tried searching for the Stackoverflow post that introduced me to this, but I can't find it anymore.
Lastly, I hope this question doesn't get shot down because I am not asking for a solution to a problem. I am simply hoping to hear more about this coding style and maybe increase the quality of all coding I will do in the future.
This whole "if" vs "no if" thing makes me think of the Expression Problem [1]. Basically, it's an observation that programming with if statements or without if statements is a matter of encapsulation and extensibility, and that sometimes it's better to use if statements [2] and sometimes it's better to use dynamic dispatching with methods / function pointers.
When we want to model something, there are two axes to worry about:
The different cases (or types) of the inputs we need to deal with.
The different operations we want to perform over these inputs.
One way to implement this sort of thing is with if statements / pattern matching / the visitor pattern:
data List = Nil | Cons Int List

length xs = case xs of
    Nil       -> 0
    Cons a as -> 1 + length as

concat xs ys = case xs of
    Nil       -> ys
    Cons a as -> Cons a (concat as ys)
The other way is to use object orientation:
data List = List {
    length :: Int,
    concat :: (List -> List)
}

nil = List {
    length = 0,
    concat = (\ys -> ys)
}

cons x xs = List {
    length = 1 + length xs,
    concat = (\ys -> cons x (concat xs ys))
}
It's not hard to see that the first version using if statements makes it easy to add new operations on our data type: just create a new function and do a case analysis inside it. On the other hand, this makes it hard to add new cases to our data type since that would mean going back through the program and modifying all the branching statements.
The second version is kind of the opposite. It's very easy to add new cases to the datatype: just create a new "class" and tell what to do for each of the methods we need to implement. However, it's now hard to add new operations to the interface since this means adding a new method for all the old classes that implemented the interface.
There are many different approaches that languages use to try to solve the Expression Problem and make it easy to add both new cases and new operations to a model. However, there are pros and cons to these solutions [3] so in general I think it's a good rule of thumb to choose between OO and if statements depending on what axis you want to make it easier to extend stuff.
Anyway, going back to your question there are couple of things I would like to point out:
The first one is that I think the OO "mantra" of getting rid of all if statements and replacing them with method dispatching has more to do with how most OO languages don't have typesafe Algebraic Data Types than it has to do with "if statements" being bad for encapsulation. Since the only way to be type safe is to use method calls, you are encouraged to convert programs using if statements into programs using the Visitor Pattern [4] or worse: convert programs that should be using the visitor pattern into programs using simple method dispatch, therefore making extensibility easy in the wrong direction.
The second thing is that I'm not a big fan of breaking things into functions just because you can. In particular, I find that style where all the functions have just 5 lines and call tons of other functions is pretty hard to read.
Finally, I think your example doesn't really get rid of if statements. Essentially, what you are doing is having a function from Integers to a new datatype (with two cases, one for Big and one for Small) and then you still need to use if statements when working with the datatype:
data Size = Big | Small
toSize :: Int -> Size
toSize n = if n < 10 then Small else Big
someOp :: Size -> String
someOp Small = "Wow, its small"
someOp Big = "Wow, its big"
Going back to the expression problem point of view, the advantage of defining our toSize / isSmall function is that we put the logic of choosing which case our number fits into in a single place, and that our functions can only operate on the cases after that. However, this does not mean that we have removed if statements from our code! If we make toSize a factory function and have Big and Small be classes sharing an interface, then yes, we will have removed if statements from our code. However, if our isSmall just returns a boolean or enum, then there will be just as many if statements as there were before. (And you should choose which implementation to use depending on whether you want to make it easier to add new methods or new cases - say Medium - in the future.)
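To sketch the difference in the same style as the List example above (illustrative only): with the record-of-functions encoding, the only remaining if lives inside the factory, whereas the boolean/enum version still needs a branch at every use site.
-- The branch happens once, inside toSize; afterwards the behaviour is looked
-- up from the value itself instead of being re-tested.
data Size = Size { describe :: String }

small, big :: Size
small = Size { describe = "Wow, its small" }
big   = Size { describe = "Wow, its big" }

toSize :: Int -> Size
toSize n = if n < 10 then small else big   -- the only remaining if

someOp :: Size -> String
someOp = describe                          -- no branching here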
[1] The name of the problem comes from the case where you have an "expression" datatype (numbers, variables, addition/multiplication of subexpressions, etc.) and want to implement things like evaluation functions and other operations over it.
[2] Or pattern matching over Algebraic Data Types, if you want to be more type safe...
[3] For example, you might have to define all multimethods on the "top level" where the "dispatcher" can see them. This is a limitation compared to the general case since you can use if statements (and lambdas) nested deeply inside other code.
[4] Essentially a "church encoding" of an algebraic data type
I've never heard of such a convention. I don't see how it works, anyway. Surely the only point of having an iIsSmall is to later branch on it (possibly in combination with other values)?
What I have heard of is an argument to avoid having variables like iIsSmall at all. iIsSmall is just storing the result of a test you made, so that you can later use that result to make some decision. So why not just test the value of i at the point where you need to make the decision? i.e., instead of:
int i = 5;
bool iIsSmall = isSmall(i);
...
<code>
...
if (iIsSmall) {
    <do something because i is small>
} else {
    <do something different because i is not small>
}
just write:
int i = 5
...
<code>
...
if (isSmall(i)) {
    <do something because i is small>
} else {
    <do something different because i is not small>
}
That way you can tell at the branch point what you're actually branching on because it's right there. That's not hard in this example anyway, but if the test was complicated you're probably not going to be able to encode the whole thing in the variable name.
It's also safer. There's no danger that the name iIsSmall is misleading because you changed the code so that it was testing something else, or because i was actually altered after you called isSmall so that it is not necessarily small anymore, or because someone just picked a dumb variable name, etc, etc.
Obviously this doesn't always work. If the isSmall test is expensive and you need to branch on its result many times, you don't want to execute it many times. You also might not want to duplicate the code of that call many times, unless it's trivial. Or you might want to return the flag to be used by a caller who doesn't know about i (though then you could just return isSmall(i), rather than store it in a variable and then return the variable).
Btw, the separate function saves nothing in your example. You can include (i < 10) in an assignment to a bool variable just as easily as in a return statement in a bool function. i.e. you could just as easily write bool isSmall = i < 10; - it's this that avoids the if statement, not the separate function. Code of the form if (test) { x = true; } else { x = false; } or if (test) { return true; } else { return false; } is always silly; just use x = test or return test.
Is it really a convention? Should one just kill minimal if-constructs just because there could be frustration over it?
OK, if statements tend to grow out of control, especially if many special cases are added over time. Branch after branch is added, and in the end no one is able to comprehend what everything does without investing hours of time and some cups of coffee into this grown instance of spaghetti code.
But is it really a good idea to put everything in separate functions? Code should be reusable. Code should be readable. But a function call just creates the need to look it up further up in the source file. If all ifs are put away in this way, you just skip around in the source file all the time. Does this support readability?
Or consider an if-statement which is not reused anywhere. Should it really go into a separate function, just for the sake of convention? There is some overhead involved here, too. Performance issues could be relevant in this context, too.
What I am trying to say: following coding conventions is good. Style is important. But there are exceptions. Just try to write good code that fits into your project and keep the future in mind. In the end, coding conventions are just guidelines which try to help us to produce good code without enforcing anything on us.
I was thinking about lists in Haskell, and I thought in other languages, one doesn't use lists for everything. Sure, you might want to store a list if you need the values later on, but if it's just a one off, say iterating from [1..n], why use a list where all that's really needed is a variable that's incremented?
I also read about "list fusion" and noted that whilst Haskell compilers try to implement this optimization to eliminate intermediate lists, they are often unsuccessful, resulting in the garbage collector having to clean up lists which are only used once.
Also, if you're not careful you can easily share a list, which means the garbage collector doesn't clean it up, which can result in running out of memory with an algorithm which was previously designed to run in constant space.
So I thought it would be best to avoid lists completely, at least when one doesn't actually want to "store" the list.
I then came across conduit, which says it is:
"a solution to the streaming data problem, allowing for production, transformation, and consumption of streams of data in constant memory."
This sounded perfect. I know conduit is designed for IO problems with resource acquisition and release issues, but can one just use it as a drop-in replacement for lists?
For example, could I do the following:
fold f3 $ take 10 $ map f2 $ unfold f1 init_value
And with a few appropriately placed type annotations, use conduits for the whole process instead of lists?
I was hoping that perhaps classy-prelude would allow such code, but I'm not sure. If it's possible, could someone give an example, say like the above?
List computations stream in constant memory in the same circumstances as they would for conduit. The presence or absence of intermediate data structures does not affect whether or not it runs in constant memory. All it changes is the efficiency and the size of the constant memory that it inhabits.
Do not expect conduit to run in less memory than the equivalent list computation. It should actually take more memory because conduit steps have a greater overhead than list cells. Also, conduit currently does not have stream fusion. Somebody did experiment with that some time ago, although that did not get incorporated into the library. Lists, on the other hand, can and do fuse in many circumstances to remove intermediate data structures.
The important thing to remember is that streaming does not necessarily imply deforestation (i.e. removal of intermediate data structures).
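For comparison, here is the question's pipeline written with ordinary lists (illustrative only; String is used for the accumulated output). Thanks to laziness, the intermediate lists are consumed as they are produced, so this also runs in constant memory:
import Data.List (unfoldr, foldl')

listVersion :: String
listVersion = foldl' f3 "" $ take 10 $ map f2 $ unfoldr f1 (1, 1)
  where
    f1 :: (Int, Int) -> Maybe (Int, (Int, Int))
    f1 (x, y) = Just (x, (y, x + y))

    f2 :: Int -> String
    f2 = show

    f3 :: String -> String -> String
    f3 acc s = acc ++ s ++ "\n"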
conduit was definitely not designed for this kind of a use case, but it can in theory be used that way. I did so personally for the markdown package, where it was more convenient to have the extra conduit plumbing than to deal directly with lists.
If you put this together with classy-prelude-conduit, you can get some relatively simple code. And we could certainly add more exports to classy-prelude-conduit to better optimize for this use case. For now, here's an example following the basic gist of what you laid out above:
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit
import Data.Conduit.List (unfold, isolate)
import Data.Functor.Identity (runIdentity)
main = putStrLn
    $ runIdentity
    $ unfold f1 init_value
    $$ map f2
    =$ isolate 10
    =$ fold f3 ""
f1 :: (Int, Int) -> Maybe (Int, (Int, Int))
f1 (x, y) = Just (x, (y, x + y))
init_value = (1, 1)
f2 :: Int -> Text
f2 = show
f3 :: Text -> Text -> Text
f3 x y = x ++ y ++ "\n"
Haskell is a pure functional language, which means Haskell functions have no side effects. I/O is implemented using monads that represent chunks of I/O computation.
Is it possible to test the return value of Haskell I/O functions?
Let's say we have a simple 'hello world' program:
main :: IO ()
main = putStr "Hello world!"
Is it possible for me to create a test harness that can run main and check that the I/O monad it returns is the correct 'value'? Or does the fact that monads are supposed to be opaque blocks of computation prevent me from doing this?
Note, I'm not trying to compare the return values of I/O actions. I want to compare the return value of I/O functions - the I/O monad itself.
Since in Haskell I/O is returned rather than executed, I was hoping to examine the chunk of I/O computation returned by an I/O function and see whether or not it was correct. I thought this could allow I/O functions to be unit tested in a way they cannot in imperative languages where I/O is a side-effect.
The way I would do this would be to create my own IO monad which contained the actions that I wanted to model. Then I would run the monadic computations I want to compare within my monad and compare the effects they had.
Let's take an example. Suppose I want to model printing stuff. Then I can model my IO monad like this:
{-# LANGUAGE GADTs #-}
import Prelude hiding (IO, putChar)
import Control.Monad ((>=>))

data IO a where
    Return  :: a -> IO a
    Bind    :: IO a -> (a -> IO b) -> IO b
    PutChar :: Char -> IO ()

-- (On a modern GHC you would also need Functor and Applicative instances.)
instance Monad IO where
    return a = Return a
    Return a  >>= f = f a
    Bind m k  >>= f = Bind m (k >=> f)
    PutChar c >>= f = Bind (PutChar c) f

putChar c = PutChar c

runIO :: IO a -> (a, String)
runIO (Return a)  = (a, "")
runIO (Bind m f)  = (b, s1 ++ s2)
  where (a, s1) = runIO m
        (b, s2) = runIO (f a)
runIO (PutChar c) = ((), [c])
Here's how I would compare the effects:
compareIO :: IO a -> IO b -> Bool
compareIO ioA ioB = outA == outB
  where (_, outA) = runIO ioA
        (_, outB) = runIO ioB
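As a quick sanity check of the model (using the definitions above):
-- A tiny program in the modelled IO monad...
hello :: IO ()
hello = putChar 'h' >> putChar 'i'

-- ...and what running it yields: the result together with the text it "printed".
-- runIO hello == ((), "hi")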
There are things that this kind of model doesn't handle. Input, for instance, is tricky. But I hope that it will fit your use case. I should also mention that there are more clever and efficient ways of modelling effects in this way. I've chosen this particular way because I think it's the easiest one to understand.
For more information I can recommend the paper "Beauty in the Beast: A Functional Semantics for the Awkward Squad" which can be found on this page along with some other relevant papers.
Within the IO monad you can test the return values of IO functions. To test return values outside of the IO monad is unsafe: this means it can be done, but only at risk of breaking your program. For experts only.
It is worth noting that in the example you show, the value of main has type IO (), which means "I am an IO action which, when performed, does some I/O and then returns a value of type ()." Type () is pronounced "unit", and there are only two values of this type: the empty tuple (also written () and pronounced "unit") and "bottom", which is Haskell's name for a computation that does not terminate or otherwise goes wrong.
It is worth pointing out that testing return values of IO functions from within the IO monad is perfectly easy and normal, and that the idiomatic way to do it is by using do notation.
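For instance (a sketch with a made-up file name and expectation, not taken from the question):
import Control.Monad (unless)

checkConfig :: IO ()
checkConfig = do
    contents <- readFile "test.cfg"       -- the IO function under test
    let expected = "key=value\n"
    unless (contents == expected) $
        error ("unexpected contents: " ++ show contents)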
You can test some monadic code with QuickCheck 2. It's been a long time since I read the paper, so I don't remember if it applies to IO actions or to what kinds of monadic computations it can be applied. Also, it may be that you find it hard to express your unit tests as QuickCheck properties. Still, as a very satisfied user of QuickCheck, I'll say it's a lot better than doing nothing or than hacking around with unsafePerformIO.
I'm sorry to tell you that you can not do this.
unsafePerformIO basically lets you accomplish this. But I would strongly prefer that you do not use it.
Foreign.unsafePerformIO :: IO a -> a
:/
I like this answer to a similar question on SO and the comments to it. Basically, IO will normally produce some change which may be noticed from the outside world; your testing will need to have to do with whether that change seems correct. (E.g. the correct directory structure was produced etc.)
Basically, this means 'behavioural testing', which in complex cases may be quite a pain. This is part of the reason why you should keep the IO-specific part of your code to a minimum and move as much of the logic as possible to pure (therefore super easily testable) functions.
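For example (a sketch with made-up names): compute the result purely and keep only a thin IO wrapper around it, so the interesting part can be tested without any IO at all.
-- Pure, easily testable core...
formatGreeting :: String -> String
formatGreeting name = "Hello, " ++ name ++ "!"

-- ...and a thin IO wrapper around it.
greet :: String -> IO ()
greet = putStrLn . formatGreeting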
Then again, you could use an assert function:
actual_assert :: String -> Bool -> IO ()
actual_assert _ True = return ()
actual_assert msg False = error $ "failed assertion: " ++ msg
faux_assert :: String -> Bool -> IO ()
faux_assert _ _ = return ()
assert = if debug_on then actual_assert else faux_assert
(You might want to define debug_on in a separate module constructed just before the build by a build script. Also, this is very likely to be provided in a more polished form by a package on Hackage, if not a standard library... If someone knows of such a tool, please edit this post / comment so I can edit.)
I think GHC will be smart enough to skip any faux assertions it finds entirely, whereas actual assertions will definitely crash your programme upon failure.
This is, IMO, very unlikely to suffice -- you'll still need to do behavioural testing in complex scenarios -- but I guess it could help check that the basic assumptions the code is making are correct.