(Scala) Lists that can contain lists as elements - list

I have been coding for two years now. I can't say I'm an expert.
I have taken a course in functional programming in which we used Common Lisp. I heard a lot of great things about Scala, as a "new" language and wanted to learn it. I read a book for the basics and wanted to rewrite all the code we did in Lisp into Scala. Almost all the code was going through lists and this is where I found a problem. Most of the problems I could solve with recursively going through the list where I set it as List[Any] - for example:
def reverse(thelist: List[Any]):List[Any].....
but as I've found out there isn't a specific way for checking whether the head of the list is a list itself except for .isInstanceOf[List[Any]]
This was OK at first, but now I have a problem. Any isn't very specific, especially with comparing elements. If I wanted to have an equivalent list with, let's say, only Int, I can create a List[Int] which can only take an Int value as an element, none of which can be List[Int] itself. The other way, writing List[List[Int]] has the same problem, but in reverse, because every element has to be a List.
As a solution I've tried setting the original list as List[Either[Int,List[Int]]], but that only created more problems, as now I have to constantly write .isInstanceOf and .asInstanceOf in all of my ifs and recursive calls, which is time-consuming and makes the code harder to understand. But even List[Either[Int,List[Int]]] is a temporary solution, because it only goes one level deep. A list can contain a list that can contain a list... and so on.
Does Scala offer a more elegant solution I am not yet aware of, such as using classes or objects in some way, or a simple elegant solution, or am I stuck with writing this kind of code? To make my question more specific, is there a way in Scala to define a list that can, but doesn't have to contain a list of the same kind as an element?

Scala isn't just Common Lisp with different syntax. Using lists for everything is something specific to Lisp, not something you do in other languages.
In Scala it's not normal to ever use a heterogeneous list — List[Any] — for anything. You certainly can if you want, but it isn't the way Scala code is normally written. It certainly isn't the kind of code you should be writing when you are only just beginning to learn the language.
A list that contains a mixture of numbers and lists isn't really a list — it's a tree. In Scala, we don't represent trees using List at all — we define a proper tree data type. Any introductory Scala text contains examples of this. (See, for example, the expression trees in chapter 15 of Programming in Scala.)
As for your reverse example, in Scala we would normally never write:
def reverse(thelist: List[Any]): List[Any]
rather, we write:
def reverse[T](theList: List[T]): List[T]
which works on List[Any] but also works on more specific types such as List[Int].
If you insist on doing it the other way, you aren't really learning Scala — you're fighting with it. Anytime you think you need Any or List[Any], there is better, more idiomatic, more Scala-like solution.
It's also never normal to use asInstanceOf or isInstanceOf in Scala code. They have long obscure names on purpose — they're not intended to be used except in rare situations.
Instead, use pattern matching. It does the equivalent of isInstanceOf and asInstanceOf for you, but in much more concise and less error-prone way. Again, any introductory Scala text should have good coverage of what pattern matching is and how to use it (e.g. chapter 15 again).

Related

What is list fusion? [duplicate]

What is Haskell's Stream Fusion and how do I use it?
The paper that Logan points to is great, but it's a little difficult. (Just ask my students.) It's also a great deal about 'how stream fusion works' and only a fraction 'what stream fusion is and how you can use it'.
The problem that stream fusion solves is that functional codes as written often allocate intermediate lists, e.g., to create an infinite list of node numbers, you might write
nodenames = map ("n"++) $ map show [1..]
Naive code would allocate an infinite list of integers [1, 2, 3, ...], an infinite list of strings ["1", "2", "3", ...], and eventually an infinite list of names ["n1", "n2", "n3", ...]. That's too much allocation.
What stream fusion does is translate a definition like nodenames into something which uses a recursive function that allocates only what is needed for the result. In general, eliminating allocation of intermediate lists is called deforestation.
To use stream fusion, you need to write non-recursive list functions that use the functions from the stream-fusion library described in GHC ticket 915 (map, foldr, and so on) instead of explicit recursion. This library contains new versions of all the Prelude functions which have been rewritten to exploit stream fusion. Apparently this stuff is slated to make it into the next GHC release (6.12) but is not in the current stable version (6.10). If you want to use the library Porges has a nice simple explanation in his answer.
If you actually want an explanation of how stream fusion works, post another question---but that's much harder.
As far as I am aware, and contrary to what Norman said, stream fusion is not currently implemented in GHC's base (ie. you cannot just use Prelude functions). For more information see GHC ticket 915.
To use stream fusion you need to install the stream-fusion library, import Data.List.Stream (you can also import Control.Monad.Stream) and only use functions from that module rather than the Prelude functions. This means importing the Prelude hiding all the default list functions, and not using [x..y] constructs or list comprehension.
Isn’t it correct, that when GHC in 6.12 uses those new functions by default, that they will also implement [x..y] and list comprehensions in that non-recursive manner? Because the only reason they aren’t right row, is that they are internal and not really written in Haskell, but more like keywords, for speed’s sake and/or because you wouldn’t be able to redefine that syntax.

OCaml: Pad List with Zeros using Folding

I've been stuck on this for hours now - wondering if anyone could help me out.
I have two Lists of different lengths, and I want to pad the shorter list with 0's so that the two lists have the same length.
I want to do this using the Folding functions, and NOT using recursion.
Any hints are very appreciated!
This sounds like a homework problem (partly due to the somewhat arbitrary restriction on the allowed solutions). So then it doesn't really help to just write the code for you--it eliminates the whole point of doing homework. However you don't give enough info to help any other way. It would be much easier to help if you showed the best code you've written during the hours you've spent on the problem, and explained why you think it doesn't work. Then people can tell you what might be wrong with the code, and give specific hints.
It's not completely clear what you mean by "the folding functions." You can't use List.fold_left2 or List.fold_right2 to fold over both lists at once, as these assume that the input lists are already the same length. This leaves List.fold_left and List.fold_right (it seems to me).
If you're allowed to make an initial pass to get the lengths of the two lists, you can fold over the shorter list to make a copy with padding added at the end. (Right fold is easiest, though it doesn't work so well for very long lists.)
One problem with this approach is that you'd have to make the padding separately, and this might require recursion (due mostly to limits of OCaml library IMHO). Another approach would be to fold over the longer list while traversing and copying the shorter one. The longer list would function as a measure telling you how much padding to add once the shorter list is exhausted. This would be quite a bit more complex.
If either approach seems worth looking at, you might start by writing a function that uses List.fold_right just to copy a list. This is pretty close to what you want to do (it seems to me).
Finding the length of a list itself is recursive. In any case, here's my crack at the problem. I used tabulate function from SML's List structure. Also, I'm assuming you're familiar with anonymous functions.
fun put_zeroes lst1 lst2 =
let
val zero_nums = List.length(lst2)-List.length(lst1)
val pad_zeroes = List.tabulate(zero_nums,fn x => 0)
val new_lst1 = lst1#pad_zeroes
in
new_lst1
end

String concatenation from within a single list

Scala is new to me so I'm not sure the best way to go about this.
I need to simply take the strings within a single list and join them.
So, concat(List("a","b","c")) returns abc.
Should I first see how many strings there are in the list, that way I can just loop through and join them all? I feel like that needs to be done first, that way you can use the lists just like an array and do list[1] append list[2] append list[3], etc..
Edit:
Here's my idea, of course with compile errors..
def concat(l: List[String]): String = {
var len = l.length
var i = 0
while (i < len) {
val result = result :: l(i) + " "
}
result
}
How about this, on REPL
List("a","b","c") mkString("")
or in script file
List("a","b","c").mkString("")
Some options to explore for you:
imperative: for-loop; use methods from the List object to determine
loop length or use for-each List item
classical functional: recursive function, one element at the time using
higher-order functions: look at fold.
Given the basic level of the problem, I think you're looking at learning some fundamentals in programming. If the language of choice is Scala, probably the focus is on functional programming, so I'd put effort on solving #2, then solve #1. #3 for extra credits.
This exercise is designed to encourage you to think about the problem from a functional perspective. You have a set of data over which you wish to move, performing a set of identical operations. You've already identified the imperative, looping construct (for). Simple enough. Now, how would you build that into a functional construct, not relying on "stateful" looping?
In functional programming, fold ... is a family of higher-order
functions that iterate an arbitrary function over a data structure in
some order and build up a return value.
http://en.wikipedia.org/wiki/Fold_%28higher-order_function%29
That sounds like something you could use.
As string concatenation is associative (to be exact, it forms a monoid having the empty String as neutral element), the "direction" of the fold doesn't matter (at least if you're not bothered by performance).
Speaking of performance: In real life, it would be a good idea to use a StringBuilder for the intermediate steps, but it's up to you if you want to use it.
A bit longer that mkString but more efficient:
s.foldLeft(new StringBuilder())(_ append _).toString()
I'm just assuming here that you are not only new to Scala, but also new to programming in general. I'm not saying SO is not made for newbies, but I'm sure there are many other places, which are better suited for your needs. For example books...
I'm also assuming that your problem doesn't have to be solved in a functional, imperative or some other way. It just has to be solved as a homework assignment.
So here are the list of things you should consider / ask yourself:
If you want to concat all elements of the list do you really need to know how many there are?
If you think you do, fine, but after having solved this problem using this approach try to fiddle around with your solution a little bit to find out if there is another way.
Appending the elements to a resulting list is a thought in right direction, but think about this: in addition to being object-oriented Scala is also a full-blown functional language. You might not know what this means, but all you need to know for now is this: it is pretty darn good with things like lists (LISP is the most known functional language and it stands for LISt Processing, which has to be an indication of some kind, don't you think? ;)). So maybe there is some magical (maybe even Scala idiomatic) way to accomplish such a concatination without defining the resulting list yourself.

Dealing with the surprising lack of ParList in scala.collections.parallel

So scala 2.9 recently turned up in Debian testing, bringing the newfangled parallel collections with it.
Suppose I have some code equivalent to
def expensiveFunction(x:Int):Int = {...}
def process(s:List[Int]):List[Int} = s.map(expensiveFunction)
now from the teeny bit I'd gleaned about parallel collections before the docs actually turned up on my machine, I was expecting to parallelize this just by switching the List to a ParList... but to my surprise, there isn't one! (Just ParVector, ParMap, ParSet...).
As a workround, this (or a one-line equivalent) seems to work well enough:
def process(s:List[Int]):List[Int} = {
val ps=scala.collection.parallel.immutable.ParVector()++s
val pr=ps.map(expensiveFunction)
List()++pr
}
yielding an approximately x3 performance improvement in my test code and achieving massively higher CPU usage (quad core plus hyperthreading i7). But it seems kind of clunky.
My question is a sort of an aggregated:
Why isn't there a ParList ?
Given there isn't a ParList, is there a
better pattern/idiom I should adopt so that
I don't feel like they're missing ?
Am I just "behind the times" using Lists a
lot in my scala programs (like all the Scala books I
bought back in the 2.7 days taught me) and
I should actually be making more use of
Vectors ? (I mean in C++ land
I'd generally need a pretty good reason to use
std::list over std::vector).
Lists are great when you want pattern matching (i.e. case x :: xs) and for efficient prepending/iteration. However, they are not so great when you want fast access-by-index, or splitting into chunks, or joining (i.e. xs ::: ys).
Hence it does not make much sense (to have a parallel List) when you think that this kind of thing (splitting and joining) is exactly what is needed for efficient parallelism. Use:
xs.toIndexedSeq.par
First, let me show you how to make a parallel version of that code:
def expensiveFunction(x:Int):Int = {...}
def process(s:List[Int]):Seq[Int] = s.par.map(expensiveFunction).seq
That will have Scala figure things out for you -- and, by the way, it uses ParVector. If you really want List, call .toList instead of .seq.
As for the questions:
There isn't a ParList because a List is an intrinsically non-parallel data structure, because any operation on it requires traversal.
You should code to traits instead of classes -- Seq, ParSeq and GenSeq, for example. Even performance characteristics of a List are guaranteed by LinearSeq.
All the books before Scala 2.8 did not have the new collections library in mind. In particular, the collections really didn't share a consistent and complete API. Now they do, and you'll gain much by taking advantage of it.
Furthermore, there wasn't a collection like Vector in Scala 2.7 -- an immutable collection with (near) constant indexed access.
A List cannot be easily split into various sub-lists which makes it hard to parallelise. For one, it has O(n) access; also a List cannot strip its tail, so one need to include a length parameter.
I guess, taking a Vector will be the better solution.
Note that Scala’s Vector is different from std::vector. The latter is basically a wrapper around standard array, a contiguous block in memory which needs to be copied every now and then when adding or removing data. Scala’s Vector is a specialised data structure which allows for efficient copying and splitting while keeping the data itself immutable.

Why is the use of tuples in C++ not more common?

Why does nobody seem to use tuples in C++, either the Boost Tuple Library or the standard library for TR1? I have read a lot of C++ code, and very rarely do I see the use of tuples, but I often see lots of places where tuples would solve many problems (usually returning multiple values from functions).
Tuples allow you to do all kinds of cool things like this:
tie(a,b) = make_tuple(b,a); //swap a and b
That is certainly better than this:
temp=a;
a=b;
b=temp;
Of course you could always do this:
swap(a,b);
But what if you want to rotate three values? You can do this with tuples:
tie(a,b,c) = make_tuple(b,c,a);
Tuples also make it much easier to return multiple variable from a function, which is probably a much more common case than swapping values. Using references to return values is certainly not very elegant.
Are there any big drawbacks to tuples that I'm not thinking of? If not, why are they rarely used? Are they slower? Or is it just that people are not used to them? Is it a good idea to use tuples?
A cynical answer is that many people program in C++, but do not understand and/or use the higher level functionality. Sometimes it is because they are not allowed, but many simply do not try (or even understand).
As a non-boost example: how many folks use functionality found in <algorithm>?
In other words, many C++ programmers are simply C programmers using C++ compilers, and perhaps std::vector and std::list. That is one reason why the use of boost::tuple is not more common.
Because it's not yet standard. Anything non-standard has a much higher hurdle. Pieces of Boost have become popular because programmers were clamoring for them. (hash_map leaps to mind). But while tuple is handy, it's not such an overwhelming and clear win that people bother with it.
The C++ tuple syntax can be quite a bit more verbose than most people would like.
Consider:
typedef boost::tuple<MyClass1,MyClass2,MyClass3> MyTuple;
So if you want to make extensive use of tuples you either get tuple typedefs everywhere or you get annoyingly long type names everywhere. I like tuples. I use them when necessary. But it's usually limited to a couple of situations, like an N-element index or when using multimaps to tie the range iterator pairs. And it's usually in a very limited scope.
It's all very ugly and hacky looking when compared to something like Haskell or Python. When C++0x gets here and we get the 'auto' keyword tuples will begin to look a lot more attractive.
The usefulness of tuples is inversely proportional to the number of keystrokes required to declare, pack, and unpack them.
For me, it's habit, hands down: Tuples don't solve any new problems for me, just a few I can already handle just fine. Swapping values still feels easier the old fashioned way -- and, more importantly, I don't really think about how to swap "better." It's good enough as-is.
Personally, I don't think tuples are a great solution to returning multiple values -- sounds like a job for structs.
But what if you want to rotate three values?
swap(a,b);
swap(b,c); // I knew those permutation theory lectures would come in handy.
OK, so with 4 etc values, eventually the n-tuple becomes less code than n-1 swaps. And with default swap this does 6 assignments instead of the 4 you'd have if you implemented a three-cycle template yourself, although I'd hope the compiler would solve that for simple types.
You can come up with scenarios where swaps are unwieldy or inappropriate, for example:
tie(a,b,c) = make_tuple(b*c,a*c,a*b);
is a bit awkward to unpack.
Point is, though, there are known ways of dealing with the most common situations that tuples are good for, and hence no great urgency to take up tuples. If nothing else, I'm not confident that:
tie(a,b,c) = make_tuple(b,c,a);
doesn't do 6 copies, making it utterly unsuitable for some types (collections being the most obvious). Feel free to persuade me that tuples are a good idea for "large" types, by saying this ain't so :-)
For returning multiple values, tuples are perfect if the values are of incompatible types, but some folks don't like them if it's possible for the caller to get them in the wrong order. Some folks don't like multiple return values at all, and don't want to encourage their use by making them easier. Some folks just prefer named structures for in and out parameters, and probably couldn't be persuaded with a baseball bat to use tuples. No accounting for taste.
As many people pointed out, tuples are just not that useful as other features.
The swapping and rotating gimmicks are just gimmicks. They are utterly confusing to those who have not seen them before, and since it is pretty much everyone, these gimmicks are just poor software engineering practice.
Returning multiple values using tuples is much less self-documenting then the alternatives -- returning named types or using named references. Without this self-documenting, it is easy to confuse the order of the returned values, if they are mutually convertible, and not be any wiser.
Not everyone can use boost, and TR1 isn't widely available yet.
When using C++ on embedded systems, pulling in Boost libraries gets complex. They couple to each other, so library size grows. You return data structures or use parameter passing instead of tuples. When returning tuples in Python the data structure is in the order and type of the returned values its just not explicit.
You rarely see them because well-designed code usually doesn't need them- there are not to many cases in the wild where using an anonymous struct is superior to using a named one.
Since all a tuple really represents is an anonymous struct, most coders in most situations just go with the real thing.
Say we have a function "f" where a tuple return might make sense. As a general rule, such functions are usually complicated enough that they can fail.
If "f" CAN fail, you need a status return- after all, you don't want callers to have to inspect every parameter to detect failure. "f" probably fits into the pattern:
struct ReturnInts ( int y,z; }
bool f(int x, ReturnInts& vals);
int x = 0;
ReturnInts vals;
if(!f(x, vals)) {
..report error..
..error handling/return...
}
That isn't pretty, but look at how ugly the alternative is. Note that I still need a status value, but the code is no more readable and not shorter. It is probably slower too, since I incur the cost of 1 copy with the tuple.
std::tuple<int, int, bool> f(int x);
int x = 0;
std::tuple<int, int, bool> result = f(x); // or "auto result = f(x)"
if(!result.get<2>()) {
... report error, error handling ...
}
Another, significant downside is hidden in here- with "ReturnInts" I can add alter "f"'s return by modifying "ReturnInts" WITHOUT ALTERING "f"'s INTERFACE. The tuple solution does not offer that critical feature, which makes it the inferior answer for any library code.
Certainly tuples can be useful, but as mentioned there's a bit of overhead and a hurdle or two you have to jump through before you can even really use them.
If your program consistently finds places where you need to return multiple values or swap several values, it might be worth it to go the tuple route, but otherwise sometimes it's just easier to do things the classic way.
Generally speaking, not everyone already has Boost installed, and I certainly wouldn't go through the hassle of downloading it and configuring my include directories to work with it just for its tuple facilities. I think you'll find that people already using Boost are more likely to find tuple uses in their programs than non-Boost users, and migrants from other languages (Python comes to mind) are more likely to simply be upset about the lack of tuples in C++ than to explore methods of adding tuple support.
As a data-store std::tuple has the worst characteristics of both a struct and an array; all access is nth position based but one cannot iterate through a tuple using a for loop.
So if the elements in the tuple are conceptually an array, I will use an array and if the elements are not conceptually an array, a struct (which has named elements) is more maintainable. ( a.lastname is more explanatory than std::get<1>(a)).
This leaves the transformation mentioned by the OP as the only viable usecase for tuples.
I have a feeling that many use Boost.Any and Boost.Variant (with some engineering) instead of Boost.Tuple.