What is list fusion? [duplicate] - list

What is Haskell's Stream Fusion and how do I use it?

The paper that Logan points to is great, but it's a little difficult. (Just ask my students.) It's also a great deal about 'how stream fusion works' and only a fraction 'what stream fusion is and how you can use it'.
The problem that stream fusion solves is that functional codes as written often allocate intermediate lists, e.g., to create an infinite list of node numbers, you might write
nodenames = map ("n"++) $ map show [1..]
Naive code would allocate an infinite list of integers [1, 2, 3, ...], an infinite list of strings ["1", "2", "3", ...], and eventually an infinite list of names ["n1", "n2", "n3", ...]. That's too much allocation.
What stream fusion does is translate a definition like nodenames into something which uses a recursive function that allocates only what is needed for the result. In general, eliminating allocation of intermediate lists is called deforestation.
To use stream fusion, you need to write non-recursive list functions that use the functions from the stream-fusion library described in GHC ticket 915 (map, foldr, and so on) instead of explicit recursion. This library contains new versions of all the Prelude functions which have been rewritten to exploit stream fusion. Apparently this stuff is slated to make it into the next GHC release (6.12) but is not in the current stable version (6.10). If you want to use the library Porges has a nice simple explanation in his answer.
If you actually want an explanation of how stream fusion works, post another question---but that's much harder.

As far as I am aware, and contrary to what Norman said, stream fusion is not currently implemented in GHC's base (ie. you cannot just use Prelude functions). For more information see GHC ticket 915.
To use stream fusion you need to install the stream-fusion library, import Data.List.Stream (you can also import Control.Monad.Stream) and only use functions from that module rather than the Prelude functions. This means importing the Prelude hiding all the default list functions, and not using [x..y] constructs or list comprehension.

Isn’t it correct, that when GHC in 6.12 uses those new functions by default, that they will also implement [x..y] and list comprehensions in that non-recursive manner? Because the only reason they aren’t right row, is that they are internal and not really written in Haskell, but more like keywords, for speed’s sake and/or because you wouldn’t be able to redefine that syntax.

Related

(Scala) Lists that can contain lists as elements

I have been coding for two years now. I can't say I'm an expert.
I have taken a course in functional programming in which we used Common Lisp. I heard a lot of great things about Scala, as a "new" language and wanted to learn it. I read a book for the basics and wanted to rewrite all the code we did in Lisp into Scala. Almost all the code was going through lists and this is where I found a problem. Most of the problems I could solve with recursively going through the list where I set it as List[Any] - for example:
def reverse(thelist: List[Any]):List[Any].....
but as I've found out there isn't a specific way for checking whether the head of the list is a list itself except for .isInstanceOf[List[Any]]
This was OK at first, but now I have a problem. Any isn't very specific, especially with comparing elements. If I wanted to have an equivalent list with, let's say, only Int, I can create a List[Int] which can only take an Int value as an element, none of which can be List[Int] itself. The other way, writing List[List[Int]] has the same problem, but in reverse, because every element has to be a List.
As a solution I've tried setting the original list as List[Either[Int,List[Int]]], but that only created more problems, as now I have to constantly write .isInstanceOf and .asInstanceOf in all of my ifs and recursive calls, which is time-consuming and makes the code harder to understand. But even List[Either[Int,List[Int]]] is a temporary solution, because it only goes one level deep. A list can contain a list that can contain a list... and so on.
Does Scala offer a more elegant solution I am not yet aware of, such as using classes or objects in some way, or a simple elegant solution, or am I stuck with writing this kind of code? To make my question more specific, is there a way in Scala to define a list that can, but doesn't have to contain a list of the same kind as an element?
Scala isn't just Common Lisp with different syntax. Using lists for everything is something specific to Lisp, not something you do in other languages.
In Scala it's not normal to ever use a heterogeneous list — List[Any] — for anything. You certainly can if you want, but it isn't the way Scala code is normally written. It certainly isn't the kind of code you should be writing when you are only just beginning to learn the language.
A list that contains a mixture of numbers and lists isn't really a list — it's a tree. In Scala, we don't represent trees using List at all — we define a proper tree data type. Any introductory Scala text contains examples of this. (See, for example, the expression trees in chapter 15 of Programming in Scala.)
As for your reverse example, in Scala we would normally never write:
def reverse(thelist: List[Any]): List[Any]
rather, we write:
def reverse[T](theList: List[T]): List[T]
which works on List[Any] but also works on more specific types such as List[Int].
If you insist on doing it the other way, you aren't really learning Scala — you're fighting with it. Anytime you think you need Any or List[Any], there is better, more idiomatic, more Scala-like solution.
It's also never normal to use asInstanceOf or isInstanceOf in Scala code. They have long obscure names on purpose — they're not intended to be used except in rare situations.
Instead, use pattern matching. It does the equivalent of isInstanceOf and asInstanceOf for you, but in much more concise and less error-prone way. Again, any introductory Scala text should have good coverage of what pattern matching is and how to use it (e.g. chapter 15 again).

Finding a path between two lists

I've been tasked with the portion of the code that reads through two lists and finds a path between the two lists.
My problem is however with reading and returning a TRUE case if I find the required node in the first list. I've been writing my code as below.
%declare the list
circle_line([cc1,cc2,cc3,cc4,cc5,cc6,cc7,cc8,cc9,cc10,cc11,cc12,cc13,cc14,cc15,cc16]).
%the predicate that finds the station i need
check_for_station(A) :- circle_line(X),
member(A,X),
write(A),nl.
Then in the cosole I type: check_for_station(cc9).
But the answer I get is "no."
I have a feeling that I'm declaring the list wrong, because in the debugger the value of X comes out to be "H135" or something and it clearly does not loop through every element to find the one that I need.
As whythehack found, in Amzi! Prolog the classic (and nondeterministic) member/2 predicate is not a "built-in" but is provided (along with other useful list manipulation predicates like length/2) in the list library. See here for some context about the built-in versus library predicates for lists.
If check_for_station/1 is only to be called with its argument A bound (as the example suggests), then Amzi!'s built-in (and deterministic) predicate is_member/2 will do the job (and a bit more quickly).
The thinking seems to be that nondeterministic member/2 is a two-line user-definition, and providing a faster deterministic version (not permitting backtracking over members of a list) is something the user could not easily provide for themselves.

Dealing with the surprising lack of ParList in scala.collections.parallel

So scala 2.9 recently turned up in Debian testing, bringing the newfangled parallel collections with it.
Suppose I have some code equivalent to
def expensiveFunction(x:Int):Int = {...}
def process(s:List[Int]):List[Int} = s.map(expensiveFunction)
now from the teeny bit I'd gleaned about parallel collections before the docs actually turned up on my machine, I was expecting to parallelize this just by switching the List to a ParList... but to my surprise, there isn't one! (Just ParVector, ParMap, ParSet...).
As a workround, this (or a one-line equivalent) seems to work well enough:
def process(s:List[Int]):List[Int} = {
val ps=scala.collection.parallel.immutable.ParVector()++s
val pr=ps.map(expensiveFunction)
List()++pr
}
yielding an approximately x3 performance improvement in my test code and achieving massively higher CPU usage (quad core plus hyperthreading i7). But it seems kind of clunky.
My question is a sort of an aggregated:
Why isn't there a ParList ?
Given there isn't a ParList, is there a
better pattern/idiom I should adopt so that
I don't feel like they're missing ?
Am I just "behind the times" using Lists a
lot in my scala programs (like all the Scala books I
bought back in the 2.7 days taught me) and
I should actually be making more use of
Vectors ? (I mean in C++ land
I'd generally need a pretty good reason to use
std::list over std::vector).
Lists are great when you want pattern matching (i.e. case x :: xs) and for efficient prepending/iteration. However, they are not so great when you want fast access-by-index, or splitting into chunks, or joining (i.e. xs ::: ys).
Hence it does not make much sense (to have a parallel List) when you think that this kind of thing (splitting and joining) is exactly what is needed for efficient parallelism. Use:
xs.toIndexedSeq.par
First, let me show you how to make a parallel version of that code:
def expensiveFunction(x:Int):Int = {...}
def process(s:List[Int]):Seq[Int] = s.par.map(expensiveFunction).seq
That will have Scala figure things out for you -- and, by the way, it uses ParVector. If you really want List, call .toList instead of .seq.
As for the questions:
There isn't a ParList because a List is an intrinsically non-parallel data structure, because any operation on it requires traversal.
You should code to traits instead of classes -- Seq, ParSeq and GenSeq, for example. Even performance characteristics of a List are guaranteed by LinearSeq.
All the books before Scala 2.8 did not have the new collections library in mind. In particular, the collections really didn't share a consistent and complete API. Now they do, and you'll gain much by taking advantage of it.
Furthermore, there wasn't a collection like Vector in Scala 2.7 -- an immutable collection with (near) constant indexed access.
A List cannot be easily split into various sub-lists which makes it hard to parallelise. For one, it has O(n) access; also a List cannot strip its tail, so one need to include a length parameter.
I guess, taking a Vector will be the better solution.
Note that Scala’s Vector is different from std::vector. The latter is basically a wrapper around standard array, a contiguous block in memory which needs to be copied every now and then when adding or removing data. Scala’s Vector is a specialised data structure which allows for efficient copying and splitting while keeping the data itself immutable.

D naming conventions: How is Phobos organized?

I'm making my own little library of handy functions and I'm trying to follow Phobos's naming convention but I'm getting really confused. How do I know where things would fit?
Example:
If there was a function like foldRight in Phobos (basically reduce in the reverse direction), which module would I find it in?
I can think of several:
std.algorithm: Because it's expressing an algorithm
std.array: Because I'm likely going to use it on arrays
std.container: Because it's used on containers, rather than single objects
std.functional: Because it's used mainly in functional programming
std.range: Because it operates on ranges as well
but I have no idea which one would be a good choice -- I could give a convincing argument for at least 3 of them.
What's the convention?
std.algorithm: yep and you can implement it like reduce!fun(retro(r))
this module specifies algorithms that run on sequences
std.array: no because it can also run on other ranges
these are helper functions that run only on build-in arrays
std.container: no because it doesn't define a data structure (like a treeset)
this defines data structures that are not build in into the language (for now a linked list, a binary tree and a deterministic array in terms of memory management)
std.functional: no because it doesn't operate on a function but on a range
this one takes a function and returns a different one
std.range: no because it doesn't define a range or provide a different way to iterate over one
the lack of a clear structure is one of my gripes with the phobos library TBH but really reading the first paragraph of the docs should tell you quite a bit of where to put the function

What is Haskell's Stream Fusion

What is Haskell's Stream Fusion and how do I use it?
The paper that Logan points to is great, but it's a little difficult. (Just ask my students.) It's also a great deal about 'how stream fusion works' and only a fraction 'what stream fusion is and how you can use it'.
The problem that stream fusion solves is that functional codes as written often allocate intermediate lists, e.g., to create an infinite list of node numbers, you might write
nodenames = map ("n"++) $ map show [1..]
Naive code would allocate an infinite list of integers [1, 2, 3, ...], an infinite list of strings ["1", "2", "3", ...], and eventually an infinite list of names ["n1", "n2", "n3", ...]. That's too much allocation.
What stream fusion does is translate a definition like nodenames into something which uses a recursive function that allocates only what is needed for the result. In general, eliminating allocation of intermediate lists is called deforestation.
To use stream fusion, you need to write non-recursive list functions that use the functions from the stream-fusion library described in GHC ticket 915 (map, foldr, and so on) instead of explicit recursion. This library contains new versions of all the Prelude functions which have been rewritten to exploit stream fusion. Apparently this stuff is slated to make it into the next GHC release (6.12) but is not in the current stable version (6.10). If you want to use the library Porges has a nice simple explanation in his answer.
If you actually want an explanation of how stream fusion works, post another question---but that's much harder.
As far as I am aware, and contrary to what Norman said, stream fusion is not currently implemented in GHC's base (ie. you cannot just use Prelude functions). For more information see GHC ticket 915.
To use stream fusion you need to install the stream-fusion library, import Data.List.Stream (you can also import Control.Monad.Stream) and only use functions from that module rather than the Prelude functions. This means importing the Prelude hiding all the default list functions, and not using [x..y] constructs or list comprehension.
Isn’t it correct, that when GHC in 6.12 uses those new functions by default, that they will also implement [x..y] and list comprehensions in that non-recursive manner? Because the only reason they aren’t right row, is that they are internal and not really written in Haskell, but more like keywords, for speed’s sake and/or because you wouldn’t be able to redefine that syntax.