Conduit, replacement for lists?

I was thinking about lists in Haskell, and it occurred to me that in other languages one doesn't use lists for everything. Sure, you might want to store a list if you need the values later on, but if it's just a one-off, say iterating over [1..n], why use a list when all that's really needed is a variable that gets incremented?
I also read about "list fusion" and noted that while Haskell compilers try to implement this optimization to eliminate intermediate lists, they are often unsuccessful, leaving the garbage collector to clean up lists that are only used once.
Also, if you're not careful you can easily end up sharing a list, which means the garbage collector can't reclaim it; this can make an algorithm that was designed to run in constant space run out of memory.
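The sharing problem described above can be reproduced with a small sketch (meanShared, meanOnePass and Acc are illustrative names, not library functions): computing a mean with sum and length over the same list keeps the whole list alive between the two traversals, whereas a single strict fold lets each cell be collected as soon as it has been consumed.

```haskell
import Data.List (foldl')

-- Two traversals of the same list: because xs is still needed by length
-- after sum has walked it, every cell stays reachable, so memory grows
-- with the length of the list.
meanShared :: [Double] -> Double
meanShared xs = sum xs / fromIntegral (length xs)

-- Strict accumulator: running sum and count, both forced at every step.
data Acc = Acc !Double !Int

-- One strict pass: each cell is garbage as soon as it has been added,
-- so this runs in constant space when xs is produced lazily.
meanOnePass :: [Double] -> Double
meanOnePass xs = total / fromIntegral count
  where
    Acc total count = foldl' step (Acc 0 0) xs
    step (Acc s n) x = Acc (s + x) (n + 1)

main :: IO ()
main = do
  print (meanOnePass [1 .. 100000])
  print (meanShared [1 .. 100000])
```

Both functions return the same value; the difference only shows up as peak memory on long inputs.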
So I thought it would be best to avoid lists completely, at least when one doesn't actually want to "store" the list.
I then came across conduit, which says it is:
a solution to the streaming data problem, allowing for production, transformation, and consumption of streams of data in constant memory.
This sounded perfect. I know conduit is designed for IO problems with resource acquisition and release issues, but can one just use it as a drop-in replacement for lists?
For example, could I do the following:
fold f3 $ take 10 $ map f2 $ unfold f1 init_value
And with a few appropriately placed type annotations, use conduits for the whole process instead of lists?
I was hoping that perhaps classy-prelude would allow such code, but I'm not sure. If it's possible, could someone give an example, say like the above?

List computations stream in constant memory in the same circumstances as they would for conduit. The presence or absence of intermediate data structures does not affect whether or not it runs in constant memory. All it changes is the efficiency and the size of the constant memory that it inhabits.
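For comparison, here is a sketch of the question's pipeline using plain lists. It assumes Data.List's unfoldr and foldl' as stand-ins for the question's unfold and fold, and reuses the f1, f2, f3 step functions from the conduit answer below, with String instead of Text (initValue is renamed from init_value). The list cells are produced on demand and consumed one at a time, so apart from the growing accumulator nothing is retained.

```haskell
import Data.List (foldl', unfoldr)

-- Same step functions as the conduit answer, but with String instead of Text.
f1 :: (Int, Int) -> Maybe (Int, (Int, Int))
f1 (x, y) = Just (x, (y, x + y))

initValue :: (Int, Int)
initValue = (1, 1)

f2 :: Int -> String
f2 = show

f3 :: String -> String -> String
f3 x y = x ++ y ++ "\n"

-- fold f3 $ take 10 $ map f2 $ unfold f1 init_value, with plain lists:
-- unfoldr produces cells on demand, and foldl' consumes them one at a time.
pipeline :: String
pipeline = foldl' f3 "" (take 10 (map f2 (unfoldr f1 initValue)))

main :: IO ()
main = putStr pipeline
```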
Do not expect conduit to run in less memory than the equivalent list computation. It should actually take more memory, because conduit steps have a greater overhead than list cells. Also, conduit currently does not have stream fusion. Someone experimented with that some time ago, but it was never incorporated into the library. Lists, on the other hand, can and do fuse in many circumstances, which removes the intermediate data structures.
The important thing to remember is that streaming does not necessarily imply deforestation (i.e. removal of intermediate data structures).

conduit was definitely not designed for this kind of use case, but in theory it can be used that way. I did so myself for the markdown package, where the extra conduit plumbing was more convenient than dealing directly with lists.
If you put this together with classy-prelude-conduit, you can get some relatively simple code. And we could certainly add more exports to classy-prelude-conduit to better optimize for this use case. For now, here's an example following the basic gist of what you laid out above:
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit
import Data.Conduit.List (unfold, isolate)
import Data.Functor.Identity (runIdentity)
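-- Unfold an infinite Fibonacci-style stream, render each number as Text,
-- keep the first 10 values, and concatenate them one per line.
main :: IO ()
main = putStrLn
     $ runIdentity
     $ unfold f1 init_value
    $$ map f2
    =$ isolate 10
    =$ fold f3 ""
-- Each step emits x and continues from the state (y, x + y).
f1 :: (Int, Int) -> Maybe (Int, (Int, Int))
f1 (x, y) = Just (x, (y, x + y))
init_value :: (Int, Int)
init_value = (1, 1)
f2 :: Int -> Text
f2 = show
-- Append one rendered value and a newline to the accumulator.
f3 :: Text -> Text -> Text
f3 x y = x ++ y ++ "\n"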


Is there a function that can make a string representation of any type?

For the last hour I have been desperately looking for a function in the OCaml library that converts an 'a to a string:
'a -> string
Is there something in the library that I just haven't found? Or do I have to do it differently (writing everything on my own)?
It is not possible to write a printing function show of type 'a -> string in OCaml.
Indeed, types are erased after compilation in OCaml. (In fact, they are erased right after type checking, which is one of the early phases of the compilation pipeline.)
Consequently, a function of type 'a -> _ can either:
ignore its argument:
let f _ = "<something>"
peek at the memory representation of a value
let f x = if Obj.is_block x then "<block>" else "<immediate>"
Even peeking at the memory representation of a value has limited utility since many different types will share the same memory representation.
If you want to print values of a type, you need to create a printer for that type. You can either do this by hand using the Fmt library (or the Format module in the standard library)
type tree = Leaf of int | Node of { left: tree; right: tree }
let rec pp ppf tree = match tree with
  | Leaf d -> Fmt.pf ppf "Leaf %d" d
  | Node n -> Fmt.pf ppf "Node { left: %a; right: %a }" pp n.left pp n.right
or by using a ppx (a small preprocessing extension for OCaml) like https://github.com/ocaml-ppx/ppx_deriving.
type tree = Leaf of int | Node of { left: tree; right: tree } [@@deriving show]
If you just want a quick, hacky solution, you can use dump from the Batteries library. It doesn't work for all cases, but it does work for primitives, lists, etc. It accesses the underlying raw memory representation, and is therefore able to overcome (to some extent) the difficulties mentioned in the other answers.
You can use it like this (after installing it via opam install batteries):
# #require "batteries";;
# Batteries.dump 1;;
- : string = "1"
# Batteries.dump 1.2;;
- : string = "1.2"
# Batteries.dump [1;2;3];;
- : string = "[1; 2; 3]"
If you want a more "proper" solution, use ppx_deriving as recommended by @octachron. It is much more reliable, maintainable and customizable.
What you are looking for is a meaningful function of type 'a. 'a -> string with parametric polymorphism (i.e. a single function that operates the same way for all possible types 'a, even those that didn’t exist when the function was created). This is not possible in OCaml. Here are explanations depending on your programming background.
Coming from Haskell
If you were expecting such a function because you are familiar with the Haskell function show, then note that its type is actually show :: Show a => a -> String. It uses an instance of the typeclass Show a, which the compiler implicitly inserts at call sites. This is not parametric polymorphism but ad-hoc polymorphism (show is overloaded, if you like). There is no such feature in OCaml (yet? There are projects for the future of the language; look for “modular implicits” or “modular explicits”).
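To make the "implicitly inserted instance" concrete, here is a Haskell sketch that passes the printer explicitly, which is roughly what the hand-written OCaml pp functions amount to. Printer, intPrinter, listPrinter, ignoreArg and viaClass are illustrative names, not library APIs. Note that without a constraint, an a -> String can essentially only ignore its argument, just as in OCaml.

```haskell
import Data.List (intercalate)

-- An explicit "dictionary": what a Show a constraint supplies implicitly.
newtype Printer a = Printer { runPrinter :: a -> String }

intPrinter :: Printer Int
intPrinter = Printer show

-- Printers for compound types are built from printers for their parts,
-- much as instance Show a => Show [a] builds on Show a.
listPrinter :: Printer a -> Printer [a]
listPrinter p = Printer (\xs -> "[" ++ intercalate "; " (map (runPrinter p) xs) ++ "]")

-- Without any constraint, parametricity leaves essentially constant results.
ignoreArg :: a -> String
ignoreArg _ = "<something>"

-- With the class, GHC passes the Show dictionary for us at each call site.
viaClass :: Show a => a -> String
viaClass = show

main :: IO ()
main = do
  putStrLn (runPrinter (listPrinter intPrinter) [1, 2, 3]) -- [1; 2; 3]
  putStrLn (viaClass [1, 2, 3 :: Int])                     -- [1,2,3]
  putStrLn (ignoreArg True)                                -- <something>
```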
Coming from OOP
If you were expecting such a function because you are familiar with OO languages in which every value is an object with a toString method, then this is not the case in OCaml. OCaml does not use the object model pervasively, and the run-time representation of OCaml values retains no (or very little) notion of type. I refer you to @octachron’s answer.
Again, toString in OOP is not parametric polymorphism but overloading: there is not a single method toString which is defined for all possible types. Instead there are multiple — possibly very different — implementations of a method of the same name. In some OO languages, programmers try to follow the discipline of implementing a method by that name for every class they define, but it is only a coding practice. One could very well create objects that do not have such a method.
[ Actually, the notions involved in both worlds are pretty similar: Haskell requires an instance of a typeclass Show a providing a function show; OOP requires an object of a class Stringifiable (for instance) providing a method toString. Or, of course, an instance/object of a descendant typeclass/class. ]
Another possibility is to use https://github.com/ocaml-ppx/ppx_deriving, which will generate a function of type Path.To.My.Super.Type.t -> string that you can then apply to your value. You still need to track the path of the type by hand, but it is better than nothing.
Another project provides a feature similar to Batteries: https://github.com/reasonml/reason-native/blob/master/src/console/README.md (I haven't tested Batteries, so I can't compare them). It has the same limitation: it introspects the runtime encoding, so it can't produce anything really usable. I think it was built with Windows and the browser in mind, so if cross-platform support is required I would test this one first (unless Batteries is already a dependency). And even though the source code is in Reason, you can use it from OCaml with the same API.

Haskell - Why is Alternative implemented for List

I have read part of this post: Meaning of Alternative (it's long).
What led me to that post was learning about Alternative in general. The post gives a good answer to why it is implemented the way it is for List.
My question is:
Why is Alternative implemented for List at all?
Is there perhaps an algorithm that uses Alternative, to which a List might be passed, so that the instance is defined for the sake of generality?
I thought it might be because Alternative defines some and many by default, but What are some and many useful for contains this comment:
To clarify, the definitions of some and many for the most basic types such as [] and Maybe just loop. So although the definition of some and many for them is valid, it has no meaning.
In the "What are some and many useful for" link above, Will gives an answer to the OP that may contain the answer to my question, but at this point in my Haskelling, the forest is a bit thick to see the trees.
There's something of a convention in the Haskell library ecology that if a thing can be an instance of a class, then it should be an instance of the class. I suspect the honest answer to "why is [] an Alternative?" is "because it can be".
...okay, but why does that convention exist? The short answer there is that instances are sort of the one part of Haskell that succumbs only to whole-program analysis. They are global, and if there are two parts of the program that both try to make a particular class/type pairing, that conflict prevents the program from working right. To deal with that, there's a rule of thumb that any instance you write should live in the same module either as the class it's associated with or as the type it's associated with.
Since instances are expected to live in specific modules, it's polite to define those instances whenever you can -- since it's not really reasonable for another library to try to fix up the fact that you haven't provided the instance.
Alternative is useful when viewing [] as the nondeterminism monad. In that case, <|> represents a choice between two programs and empty represents "no valid choice". This is the same interpretation as for, e.g., parsers.
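A small sketch of that reading: for lists, <|> concatenates the branches and empty is the empty list, and guard (which is defined in terms of Alternative's empty) prunes branches of a search. choices and pythagorean are illustrative names.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (guard)

-- For lists, (<|>) is (++) and empty is []: a choice between branches,
-- or no valid choice at all.
choices :: [Int]
choices = [1, 2] <|> [3] <|> empty

-- guard (defined via Alternative's empty) prunes branches of the search.
pythagorean :: Int -> [(Int, Int, Int)]
pythagorean n = do
  a <- [1 .. n]
  b <- [a .. n]
  c <- [b .. n]
  guard (a * a + b * b == c * c)
  pure (a, b, c)

main :: IO ()
main = do
  print choices          -- [1,2,3]
  print (pythagorean 15) -- [(3,4,5),(5,12,13),(6,8,10),(9,12,15)]
```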
some and many indeed do not make sense for lists, since they greedily try to iterate through all possible lists of elements from the given options, starting from the infinite list of just the first option. The list monad isn't lazy enough to do even that, since it might always need to abort if it was given an empty list. There is, however, one case in which both terminate: when given an empty list.
Prelude Control.Applicative> many []
[[]]
Prelude Control.Applicative> some []
[]
If some and many were defined to be lazy (in the regex sense), meaning that they prefer short lists, you would get results out, but not very useful ones, since it starts by generating the infinitely many lists containing just the first option:
Prelude Control.Applicative> some' v = liftA2 (:) v (many' v); many' v = pure [] <|> some' v
Prelude Control.Applicative> take 100 . show $ (some' [1,2])
"[[1],[1,1],[1,1,1],[1,1,1,1],[1,1,1,1,1],[1,1,1,1,1,1],[1,1,1,1,1,1,1],[1,1,1,1,1,1,1,1],[1,1,1,1,1,"
Edit: I believe the some and many functions correspond to the star operation of a star semiring, while <|> and empty correspond to plus and zero in a semiring. So mathematically (I think), it would make sense to split those operations out into a separate typeclass, but that would also be kind of silly, since they can be implemented in terms of the other operators in Alternative.
Consider a function like this:
fallback :: Alternative f => a -> (a -> f b) -> (a -> f e) -> f (Either e b)
fallback x f g = (Right <$> f x) <|> (Left <$> g x)
Not spectacularly meaningful, but you can imagine it being used in, say, a parser: try one thing, falling back to another if that doesn't work.
Does this function have a meaning when f ~ []? Sure, why not. If you think of a list's "effects" as being a search through some space, this function seems to represent some kind of biased choice, where you prefer the first option to the second, and while you're willing to try either, you also tag which way you went.
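To see both readings side by side, here is a runnable sketch of fallback at two Alternative instances; halve and neighbours are hypothetical example functions, not part of the answer.

```haskell
import Control.Applicative (Alternative, (<|>))

-- The fallback function as given above, unchanged.
fallback :: Alternative f => a -> (a -> f b) -> (a -> f e) -> f (Either e b)
fallback x f g = (Right <$> f x) <|> (Left <$> g x)

-- Parser-like reading (Maybe): keep only the first success.
halve :: Int -> Maybe Int
halve n = if even n then Just (n `div` 2) else Nothing

-- Search-like reading (list): keep every option, Right ones first.
neighbours :: Int -> [Int]
neighbours n = [n - 1, n + 1]

main :: IO ()
main = do
  print (fallback 4 halve (Just . show))         -- Just (Right 2)
  print (fallback 3 halve (Just . show))         -- Just (Left "3")
  print (fallback 3 neighbours (\x -> [show x])) -- [Right 2,Right 4,Left "3"]
```

With Maybe the second branch is only consulted when the first fails; with lists both branches contribute, in order, which is the "biased but exhaustive" choice described above.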
Could a function like this be part of some algorithm which is polymorphic in the Alternative it computes in? Again I don't see why not. It doesn't seem unreasonable for [] to have an Alternative instance, since there is an implementation that satisfies the Alternative laws.
As for the answer by Will Ness that you linked to: it points out that some and many don't "just loop" for lists. They loop for non-empty lists; for empty lists, they immediately return a value. How useful is this? Probably not very, I must admit. But that functionality comes along with (<|>) and empty, which can be useful.

Should function composition and piping be tested?

In F# (and most functional languages) some code is extremely short, for example:
let f = getNames
        >> Observable.flatmap ObservableJson.jsonArrayToObservableObjects<string>
or:
let jsonArrayToObservableObjects<'t> =
    JsonConvert.DeserializeObject<'t[]>
    >> Observable.ToObservable
And the simplest property-based test I ended up with for the latter function is:
testList "ObservableJson" [
    testProperty "Should convert an Observable of `json` array to Observable of single F# objects" <| fun (stringArray: string[]) ->
        //--Arrange--
        let (array, json) = createAJsonArrayOfString stringArray
        //--Act--
        let actual =
            json
            |> ObservableJson.jsonArrayToObservableObjects<string>
            |> Observable.ToArray
            |> Observable.Wait
        //--Assert--
        Expect.sequenceEqual actual array "every element of the JSON array should be emitted in order"
]
Even ignoring the Arrange part, the test is longer than the function under test, so it's harder to read than the function itself!
What would be the value of testing when it's harder to read than the production code?
On the other hand:
I wonder whether it is safe to leave functions that are compositions of multiple functions untested?
Should they be tested at integration and acceptance level?
And what if they are short but do complex operations?
It depends on your definition of 'functional programming', or more precisely on how close you want to stay to its origin: mathematics, in both the broad and the narrow sense.
Let's take something relevant to programming, say, the theory of mappings. Your question could be translated as follows: given a bijection from A to B and a bijection from B to C, should I prove that their composition is a bijection as well? The answer is twofold: you definitely should, and you do it only once, because your proof is generic enough to cover all possible cases.
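For this particular example the "prove it once" step is short; a sketch using only associativity of composition and identity maps:

```latex
Let $f : A \to B$ and $g : B \to C$ be bijections. Then
\[
  (g \circ f) \circ (f^{-1} \circ g^{-1})
  = g \circ (f \circ f^{-1}) \circ g^{-1}
  = g \circ \mathrm{id}_B \circ g^{-1}
  = \mathrm{id}_C,
\]
\[
  (f^{-1} \circ g^{-1}) \circ (g \circ f)
  = f^{-1} \circ (g^{-1} \circ g) \circ f
  = f^{-1} \circ \mathrm{id}_B \circ f
  = \mathrm{id}_A,
\]
so $g \circ f$ is a bijection with inverse $f^{-1} \circ g^{-1}$, for every choice of $A, B, C, f, g$.
```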
Coming back to programming, this means that pipelining has to be tested (proved) only once, and I would guess that was done before it was deployed to production. After that, it is your job as a programmer to create functions (mappings) of such quality that, when composed with a pipeline operator or anything else, the desired properties are preserved. Once again, it's better to rely on generic arguments than to write tons of similar tests.
So, finally, we come down to a much more valuable question: how can one guarantee that some operation preserves some property? It turns out that the easiest way to establish such a fact is to work with types like Monoid from the great Haskell: for example, Monoid represents any associative binary operation A -> A -> A together with an identity element of type A. Having such generic abstractions is extremely profitable, and it is the best known way of being explicit about what exactly your code is designed to do, and how.
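As a small illustration of arguing once for all instances instead of testing case by case, here is a Haskell sketch (chunked and agrees are hypothetical names): folding chunk by chunk gives the same result as one big mconcat for every lawful Monoid, by associativity and the identity law alone. The spot checks only illustrate that general argument; they do not prove it.

```haskell
import Data.Monoid (Product (..), Sum (..))

-- Fold each chunk, then fold the partial results. For any lawful Monoid
-- this equals mconcat (concat xss): only associativity of (<>) and the
-- identity law for mempty (used for empty chunks) are needed.
chunked :: Monoid m => [[m]] -> m
chunked = mconcat . map mconcat

-- Spot check of that equation for one particular input.
agrees :: (Monoid m, Eq m) => [[m]] -> Bool
agrees xss = chunked xss == mconcat (concat xss)

sums :: [[Sum Int]]
sums = map (map Sum) [[1, 2], [], [3, 4, 5]]

products :: [[Product Int]]
products = map (map Product) [[2, 3], [4], []]

strings :: [[String]]
strings = [["ab"], [], ["c", "d"]]

main :: IO ()
main = print (agrees sums, agrees products, agrees strings) -- (True,True,True)
```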
Personally I would NOT test it.
In fact having less need for testing and instead relying more on stricter compiler rules, side effect free functions, immutability etc. is one major reason why I prefer F# over C#.
Of course, I still (unit) test "custom logic", e.g. algorithmic code.

Nice way to keep track of several references between functions in ST monad?

I'm writing some code (a Metropolis-Hastings MCMC sampler) that will use a random number generator, and modify an array and potentially other structures based on this.
My initial idea was to use the ST monad, so that I could use ST arrays and the mersenne-random-pure64 package, keeping the PureMT generator as part of the state.
However, I want to be able to split off some of the work into separate helper functions (e.g. to sample a random integer in a given range, to update the array structure, and potentially more complicated things). To do this, I think I would need to pass the references to the PureMT gen and the array to all the functions, which could quickly become very ugly if I need to store more state.
My instinct is to group all of the state into a single data type that I can access anywhere, as I would using the State monad by defining a new datatype, but I don't know if that is possible with the ST monad, or the right way to go about it.
Are there any nice patterns for doing this sort of thing? I want to keep things as general as possible because I will probably need to add extra state and build more monadic code around the existing parts.
I have tried looking for examples of ST monad code, but it does not seem to be covered in Real World Haskell, and the Haskell wiki examples are very short and simple.
The key point to realize here is that it's completely irrelevant that you're using ST. The ST references themselves are just regular values, which you need access to in a variety of places, but you don't actually want to change them! The mutability occurs in ST, but the STRef values and whatnot are basically read-only. They're names pointing to the mutable data.
Of course, read-only access to an ambient environment is what the Reader monad is for. The ugly passing of references to all the functions is exactly what it's doing for you, but because you're already in ST, you can just bolt it on as a monad transformer. As a simple example, you can do something like this:
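{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.ST (ST)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Data.STRef (STRef, readSTRef, writeSTRef)
-- A read-only environment e (e.g. a record of STRefs) layered over ST s.
newtype STEnv s e a = STEnv (ReaderT e (ST s) a)
  deriving (Functor, Applicative, Monad)
runEnv :: STEnv s e a -> ST s e -> ST s a
runEnv (STEnv r) e = runReaderT r =<< e
-- Read or write the reference that f selects from the environment.
readSTEnv :: (e -> STRef s a) -> STEnv s e a
readSTEnv f = STEnv $ lift . readSTRef . f =<< ask
writeSTEnv :: (e -> STRef s a) -> a -> STEnv s e ()
writeSTEnv f x = STEnv $ lift . flip writeSTRef x . f =<< ask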
For more generality, you could abstract over the details of the reference types, and make it into a general "environment with mutable references" monad.
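A minimal usage sketch of the Reader-over-ST pattern, under assumptions: Env, counter, total, observe and sumAndCount are hypothetical names, and for brevity it uses ReaderT directly rather than the STEnv newtype. Adding more state means adding a field to Env; helpers that don't use the new field are unaffected.

```haskell
import Control.Monad.ST (ST, runST)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, asks, runReaderT)
import Data.STRef (STRef, modifySTRef', newSTRef, readSTRef)

-- Hypothetical environment: the STRefs never change, only their contents.
data Env s = Env
  { counter :: STRef s Int
  , total   :: STRef s Int
  }

-- Helpers run in this stack and fetch whichever refs they need.
type M s = ReaderT (Env s) (ST s)

-- Hypothetical helper: record one observation.
observe :: Int -> M s ()
observe x = do
  c <- asks counter
  t <- asks total
  lift (modifySTRef' c (+ 1))
  lift (modifySTRef' t (+ x))

-- Allocate the refs once, run the helpers, then read the results back.
sumAndCount :: [Int] -> (Int, Int)
sumAndCount xs = runST $ do
  env <- Env <$> newSTRef 0 <*> newSTRef 0
  runReaderT (mapM_ observe xs) env
  (,) <$> readSTRef (total env) <*> readSTRef (counter env)

main :: IO ()
main = print (sumAndCount [1 .. 10]) -- (55,10)
```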
You can use the ST monad just like the IO monad, bearing in mind that you only get arrays and refs and no other IO goodies. Just like IO, you can layer a StateT over it if you want to thread some state transparently through your computation.
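A sketch of that layering for the question's use case, with loudly stated assumptions: the generator state is a plain Word64 advanced by a toy 64-bit linear congruential step (Knuth's MMIX constants), standing in for PureMT, and nextWord, sampleInt and histogram are hypothetical helpers. StateT threads the generator through the helpers, while the array lives in ST.

```haskell
import Control.Monad (replicateM_)
import Control.Monad.ST (ST, runST)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, get, put)
import Data.Array.ST (STUArray, getElems, newArray, readArray, writeArray)
import Data.Word (Word64)

-- Toy RNG step (64-bit LCG); the generator is the StateT state.
nextWord :: StateT Word64 (ST s) Word64
nextWord = do
  g <- get
  let g' = g * 6364136223846793005 + 1442695040888963407
  put g'
  pure g'

-- Sample an Int in [lo, hi]; the low 16 bits are dropped because they are
-- the weakest in an LCG, and the small modulo bias is ignored.
sampleInt :: (Int, Int) -> StateT Word64 (ST s) Int
sampleInt (lo, hi) = do
  w <- nextWord
  let range = fromIntegral (hi - lo + 1)
  pure (lo + fromIntegral ((w `div` 65536) `mod` range))

-- Increment a randomly chosen cell n times and return the final counts.
histogram :: Int -> Int -> Word64 -> [Int]
histogram size n seed = runST $ do
  arr <- newArray (0, size - 1) 0 :: ST s (STUArray s Int Int)
  let step = do
        i <- sampleInt (0, size - 1)
        lift (readArray arr i >>= writeArray arr i . (+ 1))
  evalStateT (replicateM_ n step) seed
  getElems arr

main :: IO ()
main = print (histogram 5 1000 42)
```

Swapping in a real generator should only change nextWord and the state type; the helpers that use it keep the same shape.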

Is FC++ used by any open source projects?

The FC++ library provides an interesting approach to supporting functional programming concepts in C++.
A short example from the FAQ:
take (5, map (odd, enumFrom(1)))
FC++ seems to take a lot of inspiration from Haskell, to the extent of reusing many function names from the Haskell prelude.
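For comparison, here is the Haskell expression the FAQ snippet appears to mirror (a minimal sketch; firstFive is an illustrative name):

```haskell
-- Haskell counterpart of FC++'s  take (5, map (odd, enumFrom(1)))
firstFive :: [Bool]
firstFive = take 5 (map odd (enumFrom (1 :: Int)))

main :: IO ()
main = print firstFive -- [True,False,True,False,True]
```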
I've seen a recent article about it, and it's been briefly mentioned in some answers on stackoverflow, but I can't find any usage of it out in the wild.
Are there any open source projects actively using FC++? Or any history of projects which used it in the past? Or does anyone have personal experience with it?
There's a Customers section on the web site, but the only active link is to another library by the same authors (LC++).
As background: I'm looking to write low-latency audio plugins using existing C++ APIs, and I'm looking for tooling that allows me to write concise code in a functional style. For this project I want to use a C++ library rather than a separate language, to avoid introducing FFI bindings (because of the complexity) or garbage collection (to keep the upper bound on latency in the sub-millisecond range).
I'm aware that the STL and Boost libraries already provide support for many FP concepts, and this may well be a more practical approach. I'm also aware of other promising approaches for code generation of audio DSP code from functional languages, such as the FAUST project or the Haskell synthesizer package.
This isn't an answer to your question proper, but my experience with embedding of functional style into imperative languages has been horrid. While the code can be almost as concise, it retains the complexity of reasoning found in imperative languages.
The complexity of the embedding usually requires the most intimate knowledge of the details and corner cases of the language. This greatly increases the cost of abstraction, as these things must always be taken into careful consideration. And with a cost of abstraction so high, it is easier just to put a side-effectful function in a lazy stream generator and then die of subtle bugs.
An example from FC++:
struct Insert : public CFunType<int,List<int>,List<int> > {
    List<int> operator()( int x, const List<int>& l ) const {
        if( null(l) || (x > head(l)) )
            return cons( x, l );
        else
            return cons( head(l), curry2(Insert(),x,tail(l)) );
    }
};
struct Isort : public CFunType<List<int>,List<int> > {
    List<int> operator()( const List<int>& l ) const {
        return foldr( Insert(), List<int>(), l );
    }
};
I believe this is trying to express the following Haskell code:
-- transliterated, and generalized
insert :: (Ord a) => a -> [a] -> [a]
insert x [] = [x]
insert x (a:as) | x > a     = x:a:as
                | otherwise = a:insert x as
isort :: (Ord a) => [a] -> [a]
isort = foldr insert []
I will leave you to judge the complexity of the approach as your program grows.
I consider code generation a much more attractive approach. You can restrict yourself to a minuscule subset of your target language, making it easy to port to a different target language. The cost of abstraction in an honest functional language is nearly zero, since, after all, such languages were designed for it (just as abstracting over imperative code in an imperative language is fairly cheap).
I'm the primary original developer of FC++, but I haven't worked on it in more than six years. I have not kept up with C++/Boost much in that time, so I don't know how FC++ compares now. The new C++ standard (and implementations like VC++) adds features such as lambdas and type inference that make some of what is in there moot. Nevertheless, there might still be useful bits, like the lazy list types and the Haskell-like (and similarly named) combinators. So I guess try it and see.
(Since you mentioned real-time, I should mention that the lists use reference counting, so if you 'discard' a long list there may be a non-trivial wait in the destructor as all the cells' ref-counts go to zero. I think typically in streaming scenarios with infinite streams/lists this is a non-issue, since you're typically just tailing into the stream and only deallocating things one node at a time as you stream.)