String concatenation from within a single list - list

Scala is new to me so I'm not sure the best way to go about this.
I need to simply take the strings within a single list and join them.
So, concat(List("a","b","c")) returns abc.
Should I first see how many strings there are in the list, that way I can just loop through and join them all? I feel like that needs to be done first, that way you can use the lists just like an array and do list[1] append list[2] append list[3], etc..
Edit:
Here's my idea, of course with compile errors..
def concat(l: List[String]): String = {
var len = l.length
var i = 0
while (i < len) {
val result = result :: l(i) + " "
}
result
}

How about this, on REPL
List("a","b","c") mkString("")
or in script file
List("a","b","c").mkString("")

Some options to explore for you:
imperative: for-loop; use methods from the List object to determine
loop length or use for-each List item
classical functional: recursive function, one element at the time using
higher-order functions: look at fold.
Given the basic level of the problem, I think you're looking at learning some fundamentals in programming. If the language of choice is Scala, probably the focus is on functional programming, so I'd put effort on solving #2, then solve #1. #3 for extra credits.

This exercise is designed to encourage you to think about the problem from a functional perspective. You have a set of data over which you wish to move, performing a set of identical operations. You've already identified the imperative, looping construct (for). Simple enough. Now, how would you build that into a functional construct, not relying on "stateful" looping?

In functional programming, fold ... is a family of higher-order
functions that iterate an arbitrary function over a data structure in
some order and build up a return value.
http://en.wikipedia.org/wiki/Fold_%28higher-order_function%29
That sounds like something you could use.
As string concatenation is associative (to be exact, it forms a monoid having the empty String as neutral element), the "direction" of the fold doesn't matter (at least if you're not bothered by performance).
Speaking of performance: In real life, it would be a good idea to use a StringBuilder for the intermediate steps, but it's up to you if you want to use it.

A bit longer that mkString but more efficient:
s.foldLeft(new StringBuilder())(_ append _).toString()

I'm just assuming here that you are not only new to Scala, but also new to programming in general. I'm not saying SO is not made for newbies, but I'm sure there are many other places, which are better suited for your needs. For example books...
I'm also assuming that your problem doesn't have to be solved in a functional, imperative or some other way. It just has to be solved as a homework assignment.
So here are the list of things you should consider / ask yourself:
If you want to concat all elements of the list do you really need to know how many there are?
If you think you do, fine, but after having solved this problem using this approach try to fiddle around with your solution a little bit to find out if there is another way.
Appending the elements to a resulting list is a thought in right direction, but think about this: in addition to being object-oriented Scala is also a full-blown functional language. You might not know what this means, but all you need to know for now is this: it is pretty darn good with things like lists (LISP is the most known functional language and it stands for LISt Processing, which has to be an indication of some kind, don't you think? ;)). So maybe there is some magical (maybe even Scala idiomatic) way to accomplish such a concatination without defining the resulting list yourself.

Related

(Scala) Lists that can contain lists as elements

I have been coding for two years now. I can't say I'm an expert.
I have taken a course in functional programming in which we used Common Lisp. I heard a lot of great things about Scala, as a "new" language and wanted to learn it. I read a book for the basics and wanted to rewrite all the code we did in Lisp into Scala. Almost all the code was going through lists and this is where I found a problem. Most of the problems I could solve with recursively going through the list where I set it as List[Any] - for example:
def reverse(thelist: List[Any]):List[Any].....
but as I've found out there isn't a specific way for checking whether the head of the list is a list itself except for .isInstanceOf[List[Any]]
This was OK at first, but now I have a problem. Any isn't very specific, especially with comparing elements. If I wanted to have an equivalent list with, let's say, only Int, I can create a List[Int] which can only take an Int value as an element, none of which can be List[Int] itself. The other way, writing List[List[Int]] has the same problem, but in reverse, because every element has to be a List.
As a solution I've tried setting the original list as List[Either[Int,List[Int]]], but that only created more problems, as now I have to constantly write .isInstanceOf and .asInstanceOf in all of my ifs and recursive calls, which is time-consuming and makes the code harder to understand. But even List[Either[Int,List[Int]]] is a temporary solution, because it only goes one level deep. A list can contain a list that can contain a list... and so on.
Does Scala offer a more elegant solution I am not yet aware of, such as using classes or objects in some way, or a simple elegant solution, or am I stuck with writing this kind of code? To make my question more specific, is there a way in Scala to define a list that can, but doesn't have to contain a list of the same kind as an element?
Scala isn't just Common Lisp with different syntax. Using lists for everything is something specific to Lisp, not something you do in other languages.
In Scala it's not normal to ever use a heterogeneous list — List[Any] — for anything. You certainly can if you want, but it isn't the way Scala code is normally written. It certainly isn't the kind of code you should be writing when you are only just beginning to learn the language.
A list that contains a mixture of numbers and lists isn't really a list — it's a tree. In Scala, we don't represent trees using List at all — we define a proper tree data type. Any introductory Scala text contains examples of this. (See, for example, the expression trees in chapter 15 of Programming in Scala.)
As for your reverse example, in Scala we would normally never write:
def reverse(thelist: List[Any]): List[Any]
rather, we write:
def reverse[T](theList: List[T]): List[T]
which works on List[Any] but also works on more specific types such as List[Int].
If you insist on doing it the other way, you aren't really learning Scala — you're fighting with it. Anytime you think you need Any or List[Any], there is better, more idiomatic, more Scala-like solution.
It's also never normal to use asInstanceOf or isInstanceOf in Scala code. They have long obscure names on purpose — they're not intended to be used except in rare situations.
Instead, use pattern matching. It does the equivalent of isInstanceOf and asInstanceOf for you, but in much more concise and less error-prone way. Again, any introductory Scala text should have good coverage of what pattern matching is and how to use it (e.g. chapter 15 again).

OCaml: Pad List with Zeros using Folding

I've been stuck on this for hours now - wondering if anyone could help me out.
I have two Lists of different lengths, and I want to pad the shorter list with 0's so that the two lists have the same length.
I want to do this using the Folding functions, and NOT using recursion.
Any hints are very appreciated!
This sounds like a homework problem (partly due to the somewhat arbitrary restriction on the allowed solutions). So then it doesn't really help to just write the code for you--it eliminates the whole point of doing homework. However you don't give enough info to help any other way. It would be much easier to help if you showed the best code you've written during the hours you've spent on the problem, and explained why you think it doesn't work. Then people can tell you what might be wrong with the code, and give specific hints.
It's not completely clear what you mean by "the folding functions." You can't use List.fold_left2 or List.fold_right2 to fold over both lists at once, as these assume that the input lists are already the same length. This leaves List.fold_left and List.fold_right (it seems to me).
If you're allowed to make an initial pass to get the lengths of the two lists, you can fold over the shorter list to make a copy with padding added at the end. (Right fold is easiest, though it doesn't work so well for very long lists.)
One problem with this approach is that you'd have to make the padding separately, and this might require recursion (due mostly to limits of OCaml library IMHO). Another approach would be to fold over the longer list while traversing and copying the shorter one. The longer list would function as a measure telling you how much padding to add once the shorter list is exhausted. This would be quite a bit more complex.
If either approach seems worth looking at, you might start by writing a function that uses List.fold_right just to copy a list. This is pretty close to what you want to do (it seems to me).
Finding the length of a list itself is recursive. In any case, here's my crack at the problem. I used tabulate function from SML's List structure. Also, I'm assuming you're familiar with anonymous functions.
fun put_zeroes lst1 lst2 =
let
val zero_nums = List.length(lst2)-List.length(lst1)
val pad_zeroes = List.tabulate(zero_nums,fn x => 0)
val new_lst1 = lst1#pad_zeroes
in
new_lst1
end

Dealing with the surprising lack of ParList in scala.collections.parallel

So scala 2.9 recently turned up in Debian testing, bringing the newfangled parallel collections with it.
Suppose I have some code equivalent to
def expensiveFunction(x:Int):Int = {...}
def process(s:List[Int]):List[Int} = s.map(expensiveFunction)
now from the teeny bit I'd gleaned about parallel collections before the docs actually turned up on my machine, I was expecting to parallelize this just by switching the List to a ParList... but to my surprise, there isn't one! (Just ParVector, ParMap, ParSet...).
As a workround, this (or a one-line equivalent) seems to work well enough:
def process(s:List[Int]):List[Int} = {
val ps=scala.collection.parallel.immutable.ParVector()++s
val pr=ps.map(expensiveFunction)
List()++pr
}
yielding an approximately x3 performance improvement in my test code and achieving massively higher CPU usage (quad core plus hyperthreading i7). But it seems kind of clunky.
My question is a sort of an aggregated:
Why isn't there a ParList ?
Given there isn't a ParList, is there a
better pattern/idiom I should adopt so that
I don't feel like they're missing ?
Am I just "behind the times" using Lists a
lot in my scala programs (like all the Scala books I
bought back in the 2.7 days taught me) and
I should actually be making more use of
Vectors ? (I mean in C++ land
I'd generally need a pretty good reason to use
std::list over std::vector).
Lists are great when you want pattern matching (i.e. case x :: xs) and for efficient prepending/iteration. However, they are not so great when you want fast access-by-index, or splitting into chunks, or joining (i.e. xs ::: ys).
Hence it does not make much sense (to have a parallel List) when you think that this kind of thing (splitting and joining) is exactly what is needed for efficient parallelism. Use:
xs.toIndexedSeq.par
First, let me show you how to make a parallel version of that code:
def expensiveFunction(x:Int):Int = {...}
def process(s:List[Int]):Seq[Int] = s.par.map(expensiveFunction).seq
That will have Scala figure things out for you -- and, by the way, it uses ParVector. If you really want List, call .toList instead of .seq.
As for the questions:
There isn't a ParList because a List is an intrinsically non-parallel data structure, because any operation on it requires traversal.
You should code to traits instead of classes -- Seq, ParSeq and GenSeq, for example. Even performance characteristics of a List are guaranteed by LinearSeq.
All the books before Scala 2.8 did not have the new collections library in mind. In particular, the collections really didn't share a consistent and complete API. Now they do, and you'll gain much by taking advantage of it.
Furthermore, there wasn't a collection like Vector in Scala 2.7 -- an immutable collection with (near) constant indexed access.
A List cannot be easily split into various sub-lists which makes it hard to parallelise. For one, it has O(n) access; also a List cannot strip its tail, so one need to include a length parameter.
I guess, taking a Vector will be the better solution.
Note that Scala’s Vector is different from std::vector. The latter is basically a wrapper around standard array, a contiguous block in memory which needs to be copied every now and then when adding or removing data. Scala’s Vector is a specialised data structure which allows for efficient copying and splitting while keeping the data itself immutable.

Are infinite lists useful for any real world applications?

I've been using haskell for quite a while now, and I've read most of Real World Haskell and Learn You a Haskell. What I want to know is whether there is a point to a language using lazy evaluation, in particular the "advantage" of having infinite lists, is there a task which infinite lists make very easy, or even a task that is only possible with infinite lists?
Here's an utterly trivial but actually day-to-day useful example of where infinite lists specifically come in handy: When you have a list of items that you want to use to initialize some key-value-style data structure, starting with consecutive keys. So, say you have a list of strings and you want to put them into an IntMap counting from 0. Without lazy infinite lists, you'd do something like walk down the input list, keeping a running "next index" counter and building up the IntMap as you go.
With infinite lazy lists, the list itself takes the role of the running counter; just use zip [0..] with your list of items to assign the indices, then IntMap.fromList to construct the final result.
Sure, it's essentially the same thing in both cases. But having lazy infinite lists lets you express the concept much more directly without having to worry about details like the length of the input list or keeping track of an extra counter.
An obvious example is chaining your data processing from input to whatever you want to do with it. E.g., reading a stream of characters into a lazy list, which is processed by a lexer, also producing a lazy list of tokens which are parsed into a lazy AST structure, then compiled and executed. It's like using Unix pipes.
I found it's often easier and cleaner to just define all of a sequence in one place, even if it's infinite, and have the code that uses it just grab what it wants.
take 10 mySequence
takeWhile (<100) mySequence
instead of having numerous similar but not quite the same functions that generate a subset
first10ofMySequence
elementsUnder100ofMySequence
The benefits are greater when different subsections of the same sequence are used in different areas.
Infinite data structures (including lists) give a huge boost to modularity and hence reusability, as explained & illustrated in John Hughes's classic paper Why Functional Programming Matters.
For instance, you can decompose complex code chunks into producer/filter/consumer pieces, each of which is potentially useful elsewhere.
So wherever you see real-world value in code reuse, you'll have an answer to your question.
Basically, lazy lists allow you to delay computation until you need it. This can prove useful when you don't know in advance when to stop, and what to precompute.
A standard example is u_n a sequence of numerical computations converging to some limit. You can ask for the first term such that |u_n - u_{n-1}| < epsilon, the right number of terms is computed for you.
Now, you have two such sequences u_n and v_n, and you want to know the sum of the limits to epsilon accuracy. The algorithm is:
compute u_n until epsilon/2 accuracy
compute v_n until epsilon/2 accuracy
return u_n + v_n
All is done lazily, only the necessary u_n and v_n are computed. You may want less simple examples, eg. computing f(u_n) where you know (ie. know how to compute) f's modulus of continuity.
Sound synthesis - see this paper by Jerzy Karczmarczuk:
http://users.info.unicaen.fr/~karczma/arpap/cleasyn.pdf
Jerzy Karczmarcuk has a number of other papers using infinite lists to model mathematical objects like power series and derivatives.
I've translated the basic sound synthesis code to Haskell - enough for a sine wave unit generator and WAV file IO. The performance was just about adequate to run with GHCi on a 1.5GHz Athalon - as I just wanted to test the concept I never got round to optimizing it.
Infinite/lazy structures permit the idiom of "tying the knot": http://www.haskell.org/haskellwiki/Tying_the_Knot
The canonically simple example of this is the Fibonacci sequence, defined directly as a recurrence relation. (Yes, yes, hold the efficiency complaints/algorithms discussion -- the point is the idiom.): fibs = 1:1:zipwith (+) fibs (tail fibs)
Here's another story. I had some code that only worked with finite streams -- it did some things to create them out to a point, then did a whole bunch of nonsense that involved acting on various bits of the stream dependent on the entire stream prior to that point, merging it with information from another stream, etc. It was pretty nice, but I realized it had a whole bunch of cruft necessary for dealing with boundary conditions, and basically what to do when one stream ran out of stuff. I then realized that conceptually, there was no reason it couldn't work on infinite streams. So I switched to a data type without a nil -- i.e. a genuine stream as opposed to a list, and all the cruft went away. Even though I know I'll never need the data past a certain point, being able to rely on it being there allowed me to safely remove lots of silly logic, and let the mathematical/algorithmic part of my code stand out more clearly.
One of my pragmatic favorites is cycle. cycle [False, True] generates the infinite list [False, True, False, True, False ...]. In particular, xs ! 0 = False, xs ! 1 = True, so this is just says whether or not the index of the element is odd or not. Where does this show up? Lot's of places, but here's one that any web developer ought to be familiar with: making tables that alternate shading from row to row.
The general pattern seen here is that if we want to do some operation on a finite list, rather than having to construct a specific finite list that will “do the thing we want,” we can use an infinite list that will work for all sizes of lists. camcann’s answer is in this vein.

Why is the use of tuples in C++ not more common?

Why does nobody seem to use tuples in C++, either the Boost Tuple Library or the standard library for TR1? I have read a lot of C++ code, and very rarely do I see the use of tuples, but I often see lots of places where tuples would solve many problems (usually returning multiple values from functions).
Tuples allow you to do all kinds of cool things like this:
tie(a,b) = make_tuple(b,a); //swap a and b
That is certainly better than this:
temp=a;
a=b;
b=temp;
Of course you could always do this:
swap(a,b);
But what if you want to rotate three values? You can do this with tuples:
tie(a,b,c) = make_tuple(b,c,a);
Tuples also make it much easier to return multiple variable from a function, which is probably a much more common case than swapping values. Using references to return values is certainly not very elegant.
Are there any big drawbacks to tuples that I'm not thinking of? If not, why are they rarely used? Are they slower? Or is it just that people are not used to them? Is it a good idea to use tuples?
A cynical answer is that many people program in C++, but do not understand and/or use the higher level functionality. Sometimes it is because they are not allowed, but many simply do not try (or even understand).
As a non-boost example: how many folks use functionality found in <algorithm>?
In other words, many C++ programmers are simply C programmers using C++ compilers, and perhaps std::vector and std::list. That is one reason why the use of boost::tuple is not more common.
Because it's not yet standard. Anything non-standard has a much higher hurdle. Pieces of Boost have become popular because programmers were clamoring for them. (hash_map leaps to mind). But while tuple is handy, it's not such an overwhelming and clear win that people bother with it.
The C++ tuple syntax can be quite a bit more verbose than most people would like.
Consider:
typedef boost::tuple<MyClass1,MyClass2,MyClass3> MyTuple;
So if you want to make extensive use of tuples you either get tuple typedefs everywhere or you get annoyingly long type names everywhere. I like tuples. I use them when necessary. But it's usually limited to a couple of situations, like an N-element index or when using multimaps to tie the range iterator pairs. And it's usually in a very limited scope.
It's all very ugly and hacky looking when compared to something like Haskell or Python. When C++0x gets here and we get the 'auto' keyword tuples will begin to look a lot more attractive.
The usefulness of tuples is inversely proportional to the number of keystrokes required to declare, pack, and unpack them.
For me, it's habit, hands down: Tuples don't solve any new problems for me, just a few I can already handle just fine. Swapping values still feels easier the old fashioned way -- and, more importantly, I don't really think about how to swap "better." It's good enough as-is.
Personally, I don't think tuples are a great solution to returning multiple values -- sounds like a job for structs.
But what if you want to rotate three values?
swap(a,b);
swap(b,c); // I knew those permutation theory lectures would come in handy.
OK, so with 4 etc values, eventually the n-tuple becomes less code than n-1 swaps. And with default swap this does 6 assignments instead of the 4 you'd have if you implemented a three-cycle template yourself, although I'd hope the compiler would solve that for simple types.
You can come up with scenarios where swaps are unwieldy or inappropriate, for example:
tie(a,b,c) = make_tuple(b*c,a*c,a*b);
is a bit awkward to unpack.
Point is, though, there are known ways of dealing with the most common situations that tuples are good for, and hence no great urgency to take up tuples. If nothing else, I'm not confident that:
tie(a,b,c) = make_tuple(b,c,a);
doesn't do 6 copies, making it utterly unsuitable for some types (collections being the most obvious). Feel free to persuade me that tuples are a good idea for "large" types, by saying this ain't so :-)
For returning multiple values, tuples are perfect if the values are of incompatible types, but some folks don't like them if it's possible for the caller to get them in the wrong order. Some folks don't like multiple return values at all, and don't want to encourage their use by making them easier. Some folks just prefer named structures for in and out parameters, and probably couldn't be persuaded with a baseball bat to use tuples. No accounting for taste.
As many people pointed out, tuples are just not that useful as other features.
The swapping and rotating gimmicks are just gimmicks. They are utterly confusing to those who have not seen them before, and since it is pretty much everyone, these gimmicks are just poor software engineering practice.
Returning multiple values using tuples is much less self-documenting then the alternatives -- returning named types or using named references. Without this self-documenting, it is easy to confuse the order of the returned values, if they are mutually convertible, and not be any wiser.
Not everyone can use boost, and TR1 isn't widely available yet.
When using C++ on embedded systems, pulling in Boost libraries gets complex. They couple to each other, so library size grows. You return data structures or use parameter passing instead of tuples. When returning tuples in Python the data structure is in the order and type of the returned values its just not explicit.
You rarely see them because well-designed code usually doesn't need them- there are not to many cases in the wild where using an anonymous struct is superior to using a named one.
Since all a tuple really represents is an anonymous struct, most coders in most situations just go with the real thing.
Say we have a function "f" where a tuple return might make sense. As a general rule, such functions are usually complicated enough that they can fail.
If "f" CAN fail, you need a status return- after all, you don't want callers to have to inspect every parameter to detect failure. "f" probably fits into the pattern:
struct ReturnInts ( int y,z; }
bool f(int x, ReturnInts& vals);
int x = 0;
ReturnInts vals;
if(!f(x, vals)) {
..report error..
..error handling/return...
}
That isn't pretty, but look at how ugly the alternative is. Note that I still need a status value, but the code is no more readable and not shorter. It is probably slower too, since I incur the cost of 1 copy with the tuple.
std::tuple<int, int, bool> f(int x);
int x = 0;
std::tuple<int, int, bool> result = f(x); // or "auto result = f(x)"
if(!result.get<2>()) {
... report error, error handling ...
}
Another, significant downside is hidden in here- with "ReturnInts" I can add alter "f"'s return by modifying "ReturnInts" WITHOUT ALTERING "f"'s INTERFACE. The tuple solution does not offer that critical feature, which makes it the inferior answer for any library code.
Certainly tuples can be useful, but as mentioned there's a bit of overhead and a hurdle or two you have to jump through before you can even really use them.
If your program consistently finds places where you need to return multiple values or swap several values, it might be worth it to go the tuple route, but otherwise sometimes it's just easier to do things the classic way.
Generally speaking, not everyone already has Boost installed, and I certainly wouldn't go through the hassle of downloading it and configuring my include directories to work with it just for its tuple facilities. I think you'll find that people already using Boost are more likely to find tuple uses in their programs than non-Boost users, and migrants from other languages (Python comes to mind) are more likely to simply be upset about the lack of tuples in C++ than to explore methods of adding tuple support.
As a data-store std::tuple has the worst characteristics of both a struct and an array; all access is nth position based but one cannot iterate through a tuple using a for loop.
So if the elements in the tuple are conceptually an array, I will use an array and if the elements are not conceptually an array, a struct (which has named elements) is more maintainable. ( a.lastname is more explanatory than std::get<1>(a)).
This leaves the transformation mentioned by the OP as the only viable usecase for tuples.
I have a feeling that many use Boost.Any and Boost.Variant (with some engineering) instead of Boost.Tuple.