List design in functional languages

List design in functional languages - list

I've noticed that in functional languages such as Haskell and OCaml you can do 2 actions with lists. First you can do x:xs where x is an element ans xs is a list and the resulting action is we get a new list where x is appended to the beginning of xs in constant time. Second is x++y where both x and y are lists and the resulting action is we get a new list where y gets appended to the end of x in linear time with respect to the number of elements in x.
Now I'm no expert in how languages are designed and compilers are built, but this seems to me a lot like a simple implementation of a linked list with one pointer to the first item. If I were to implement this data structure in a language like C++ I would find it to be generally trivial to add a pointer to the last element. In this case if these languages were implemented this way (assuming they do use linked lists as described) adding a "pointer" to the last item would make it much more efficient to add items to the end of a list and would allow pattern matching with the last element.
My question is are these data structures really implemented as linked lists, and if so why do they not add a reference to the last element?

Yes, they really are linked lists. But they are immutable. The advantage of immutability is that you don't have to worry about who else has a pointer to the same list. You might choose to write x++y, but somewhere else in the program might be relying on x remaining unchanged.
People who work on compilers for such languages (of whom I am one) don't worry about this cost because there are plenty of other data structures that provide efficient access:
A functional queue represented as two lists provides constant-time access to both ends and amortized constant time for put and get operations.
A more sophisticated data structure like a finger tree can provide several kinds of list access at very low cost.
If you just want constant-time append, John Hughes developed an excellent, simple representation of lists as functions, which provides exactly that. (In the Haskell library they are called DList.)
If you're interested in these sorts of questions you can get good info from Chris Okasaki's book Purely Functional Data Structures and from some of Ralf Hinze's less intimidating papers.

You said:
Second is x++y where both x and y are
lists and the resulting action is y
gets appended to the end of x in
linear time with respect to the number
of elements in x.
This is not really true in a functional language like Haskell; y gets appended to a copy of x, since anything holding onto x is depending on it not changing.
If you're going to copy all of x anyway, holding onto its last node doesn't really gain you anything.

Yes, they are linked lists. In languages like Haskell and OCaml, you don't add items to the end of a list, period. Lists are immutable. There is one operation to create new lists — cons, the : operator you refer to earlier. It takes an element and a list, and creates a new list with the element as the head and the list as the tail. The reason x++y takes linear time is because it must cons the last element of x with y, and then cons the second-to-last element of x with that list, and so on with each element of x. None of the cons cells in x can be reused, because that would cause the original list to change as well. A pointer to the last element of x would not be very helpful here — we still have to walk the whole list.

++ is just one of dozens of "things you can do with lists". The reality is that lists are so versatile that one rarely uses other collections. Also, we functional programmers almost never feel the need to look at the last element of a list - if we need to, there is a function last.
However, just because lists are convenient this does not mean that we do not have other data structures. If you're really interested, have a look at this book http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf (Purely Functional Data Structures). You'll find trees, queues, lists with O(1) append of an element at the tail, and so forth.

Here's a bit of an explanation on how things are done in Clojure:
The easiest way to avoid mutating state is to use immutable data structures. Clojure provides a set of immutable lists, vectors, sets and maps. Since they can't be changed, 'adding' or 'removing' something from an immutable collection means creating a new collection just like the old one but with the needed change. Persistence is a term used to describe the property wherein the old version of the collection is still available after the 'change', and that the collection maintains its performance guarantees for most operations. Specifically, this means that the new version can't be created using a full copy, since that would require linear time. Inevitably, persistent collections are implemented using linked data structures, so that the new versions can share structure with the prior version. Singly-linked lists and trees are the basic functional data structures, to which Clojure adds a hash map, set and vector both based upon array mapped hash tries.
(emphasis mine)
So basically it looks you're mostly correct, at least as far as Clojure is concerned.

Related

OCaml: list # [x] vs x :: list [duplicate]

I've recently started learning scala, and I've come across the :: (cons) function, which prepends to a list.
In the book "Programming in Scala" it states that there is no append function because appending to a list has performance o(n) whereas prepending has a performance of o(1)
Something just strikes me as wrong about that statement.
Isn't performance dependent on implementation? Isn't it possible to simply implement the list with both forward and backward links and store the first and last element in the container?
The second question I suppose is what I'm supposed to do when I have a list, say 1,2,3 and I want to add 4 to the end of it?

The key is that x :: somelist does not mutate somelist, but instead creates a new list, which contains x followed by all elements of somelist. This can be done in O(1) time because you only need to set somelist as the successor of x in the newly created, singly linked list.
If doubly linked lists were used instead, x would also have to be set as the predecessor of somelist's head, which would modify somelist. So if we want to be able to do :: in O(1) without modifying the original list, we can only use singly linked lists.
Regarding the second question: You can use ::: to concatenate a single-element list to the end of your list. This is an O(n) operation.
List(1,2,3) ::: List(4)

Other answers have given good explanations for this phenomenon. If you are appending many items to a list in a subroutine, or if you are creating a list by appending elements, a functional idiom is to build up the list in reverse order, cons'ing the items on the front of the list, then reverse it at the end. This gives you O(n) performance instead of O(n²).

Since the question was just updated, it's worth noting that things have changed here.
In today's Scala, you can simply use xs :+ x to append an item at the end of any sequential collection. (There is also x +: xs to prepend. The mnemonic for many of Scala's 2.8+ collection operations is that the colon goes next to the collection.)
This will be O(n) with the default linked implementation of List or Seq, but if you use Vector or IndexedSeq, this will be effectively constant time. Scala's Vector is probably Scala's most useful list-like collection—unlike Java's Vector which is mostly useless these days.
If you are working in Scala 2.8 or higher, the collections introduction is an absolute must read.

Prepending is faster because it only requires two operations:
Create the new list node
Have that new node point to the existing list
Appending requires more operations because you have to traverse to the end of the list since you only have a pointer to the head.
I've never programmed in Scala before, but you could try a List Buffer

Most functional languages prominently figure a singly-linked-list data structure, as it's a handy immutable collection type. When you say "list" in a functional language, that's typically what you mean (a singly-linked list, usually immutable). For such a type, append is O(n) whereas cons is O(1).

SML/NJ - Effective way or data structure to access from end towards the start

I am making a program, and an algorithm I have thought to use requires a cheap way of accessing a list backwards to be effective. Is there an effective way to access a list from the last element forward? Or, because I think that might be impossible due to the structure of SML lists, is there an effective data structure to achieve it?
The length of data is unknown before executing, and there is no need for other than serial traversing of the data.

I think you want a functional deque. See e.g. Okasaki's paper on the subject. Specifically, Figure 5 shows an implementation of deques.

If using a functional deque seems like overkill and you need to traverse the list in reverse order just once, then solutions that e.g. use List.last and List.take to emulate hd and tl but in reverse order are, as you seem to know, bad because they would make the list traversal quadratic. On the other hand, the built in function rev is very efficient since it is both tail-recursive and linear. If you feed a list to a function that needs to traverse that list in reverse order, an easy solution is to use a let binding using rev to create a local copy of the list in reverse order and then traverse the reversed list in the usual way.

Inconsistencies in Dart ListBase

Is there any particular reason for the inconsistent return types of the functions in Dart's ListBase class?
Some of the functions do what (as a functional programmer) I would expect, that is: List -> (apply function) -> List. These include: take, skip, reversed.
Others do not: thus l.removeLast() returns just the final element of the list; to get the List without the final element, you have to use a cascade: l..removeLast().
Others return a lazy Iterable, which requires further work to retrieve the list: newl = l.map(f).toList().
Some functions operate more like properties l.last, as opposed to functions l.removeLast()
Is there some subtle reason for these choices?

mbmcavoy is right. Dart is an imperative language and many List members modify the list in-place. The most prominent is the operator []=, but sort, shuffle, add, removeLast, etc. fall into the same category.
In addition to these imperative members, List inherits some functional-style members from Iterable: skip, take, where, map, etc. These are lazy and do not modify the List in place. They are backed by the original list. Modifying the backing list, will change the result of iterating over the iterable. List furthermore adds a few lazy members, like reversed.
To avoid confusion, lazy members always return an Iterable and not an object implementing the List interface. Some of the iterables guarantee fast length and index-operators (like take, skip and reversed) and could easily implement the List interface. However, this would inevitably lead to bugs, since they are lazy and backed by the original list.

(Disclaimer: I have not yet used Dart specifically, but hope to soon.)
Dart is not a functional programming language, which may the the source of your confusion.
Methods, such as .removeLast() are intended to change the state of the object they are called upon. The operation performed by l.removeLast() is to modify l so that it no longer contains the last item. You can access the resulting list by simply using l in your next statement.
(Note they are called "methods" rather than "functions", as they are not truly functions in the mathematical sense.)
The choice to return the removed item rather than the remaining list is a convenience. most frequently, the program will need to do something with the removed item (like move it to a different list).
For other methods, the returned data will relate to a common usage scenario, but it isn't always necessary to capture it.

Does appending a list to another list in F# incur copying of underlying objects or just the pointers?

I've always thought that appending a list to another one meant copying the objects from the first list and then pointing to the appended list as described for example here.
However, in this blog post and in its comment, it says that it is only the pointers that are copied and not the underlying objects.
So what is correct?

Drawing from Snowbear's answer, a more accurate image of combining two lists (than the one presented in the first referred article in the question) would be as shown below.
let FIRST = [1;2;3]
let SECOND = [4;5;6]
let COMBINED = FIRST # SECOND

In the functional world, lists are immutable. This means that node sharing is possible because the original lists will never change. Because the first list ends with the empty list, its nodes must be copied in order to point its last node to the second list.
If you mean this statement then the answer is seems to be pretty simple. Author of the first article is talking about list node elements when he says nodes. Node element is not the same as the list item itself. Take a look at the pictures in the first article. There are arrows going from every element to the next node. These arrows are pointers. But integer type (which is put into the list) has no such pointers. There is probably some list node type which wraps those integers and stores the pointers. When author says that nodes must be copies he is talking about these wrappers being copied. The underlying objects (if they were not value types as in this case) would not be cloned, new wrappers will point to the same object as before.

F# lists hold references (not to be confused with F#'s ref) to their elements; list operations copy those references (pointers), but not the elements themselves.
There are two ways you might append items to an existing list, which is why there seems to be a discrepancy between the articles (though they both look to be correct):
Cons operator (::): The cons operator prepends a single item to an F# list, producing a new list. It's very fast (O(1)), since it only needs to call a very simple constructor to produce the new list.
Append operator (#): The append operator appends two F# lists together, producing a new list. It's not as fast (O(n)) because in order for the elements of the combined list to be ordered correctly, it needs to traverse the entire list on the left-hand-side of the operator (so copying can start at the first element of that list). You'll still see this used in production if the list on the left-hand-side is known to be very small, but in general you'll get much better performance from using ::.

Scheme: Constant Access to the End of a List?

In C, you can have a pointer to the first and last element of a singly-linked list, providing constant time access to the end of a list. Thus, appending one list to another can be done in constant time.
As far as I am aware, scheme does not provide this functionality (namely constant access to the end of a list) by default. To be clear, I am not looking for "pointer" functionality. I understand that is non-idiomatic in scheme and (as I suppose) unnecessary.
Could someone either 1) demonstrate the ability to provide a way to append two lists in constant time or 2) assure me that this is already available by default in scheme or racket (e.g., tell me that append is in fact a constant operation if I am wrong to think otherwise)?
EDIT:
I should make myself clearer. I am trying to create an inspectable queue. I want to have a list that I can 1) push onto the front in constant time, 2) pop off the back in constant time, and 3) iterate over using Racket's foldr or something similar (a Lisp right fold).

Standard Lisp lists cannot be appended to in constant time.
However, if you make your own list type, you can do it. Basically, you can use a record type (or just a cons cell)---let's call this the "header"---that holds pointers to the head and tail of the list, and update it each time someone adds to the list.
However, be aware that if you do that, lists are no longer structurally inductive. i.e., a longer list isn't simply an extension of a shorter list, because of the extra "header" involved. Thus, you lose a great part of the simplicity of Lisp algorithms which involve recursing into the cdr of a list at each iteration.
In other words, the lack of easy appending is a tradeoff to enable recursive algorithms to be written much more easily. Most functional programmers will agree that this is the right tradeoff, since appending in a pure-functional sense means that you have to copy every cell in all but the last list---so it's no longer O(1), anyway.
ETA to reflect OP's edit
You can create a queue, but with the opposite behaviour: you add elements to the back, and retrieve elements in the front. If you are willing to work with that, such a data structure is easy to implement in Scheme. (And yes, it's easy to append two such queues in constant time.)
Racket also has a similar queue data structure, but it uses a record type instead of cons cells, because Racket cons cells are immutable. You can convert your queue to a list using queue->list (at O(n) complexity) for times when you need to fold.

You want a FIFO queue. user448810 mentions the standard implementation for a purely-functional FIFO queue.
Your concern about losing the "key advantage of Lisp lists" needs to be unpacked a bit:
You can write combinators for custom data structures in Lisp. If you implement a queue type, you can easily write fold, map, filter and so on for it.
Scheme, however, does lack in the area of providing polymorphic sequence functions that can work on multiple sequence types. You do often end up either (a) converting your data structures back to lists in order to use the rich library of list functions, or (b) implementing your own versions of various of these functions for your custom types.
This is very much a shame, because singly-linked lists, while they are hugely useful for tons of computations, are not a do-all data structure.
But what is worse is that there's a lot of Lisp folk who like to pretend that lists are a "universal datatype" that can and should be used to represent any kind of data. I've programmed Lisp for a living, and oh my god I hate the code that these people produce; I call it "Lisp programmer's disease," and have much too often had to go in and fix a lot of n^2 that uses lists to represent sets or dictionaries to use hash tables or search trees instead. Don't fall into that trap. Use proper data structures for the task at hand. You can always build your own opaque data types using record types and modules in Racket; you make them opaque by exporting the type but not the field accessors for the record type (you export your type's user-facing operations instead).

It sounds like you are looking for a deque, not a list. The standard idiom for a deque is to keep two lists, the front half of the list in normal order and the back half of the list in reverse order, thus giving access to both ends of the deque. If the half of the list that you want to access is empty, reverse the other half and swap the meaning of the two halves. Look here for a fuller explanation and sample code.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js