Is there any particular reason for the inconsistent return types of the functions in Dart's ListBase class?
Some of the functions do what (as a functional programmer) I would expect, that is: List -> (apply function) -> List. These include: take, skip, reversed.
Others do not: thus l.removeLast() returns just the final element of the list; to get the List without the final element, you have to use a cascade: l..removeLast().
Others return a lazy Iterable, which requires further work to retrieve the list: newl = l.map(f).toList().
Some members are exposed as properties (l.last) rather than as method calls (l.removeLast()).
Is there some subtle reason for these choices?
mbmcavoy is right. Dart is an imperative language and many List members modify the list in-place. The most prominent is the operator []=, but sort, shuffle, add, removeLast, etc. fall into the same category.
In addition to these imperative members, List inherits some functional-style members from Iterable: skip, take, where, map, etc. These are lazy and do not modify the List in place. They are backed by the original list, so modifying the backing list changes the result of iterating over the iterable. List furthermore adds a few lazy members of its own, like reversed.
To avoid confusion, lazy members always return an Iterable and not an object implementing the List interface. Some of the iterables guarantee fast length and index-operators (like take, skip and reversed) and could easily implement the List interface. However, this would inevitably lead to bugs, since they are lazy and backed by the original list.
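The "lazy and backed by the original list" behaviour is not specific to Dart. As a rough analogue only (this is Python, not Dart, and the variable names are made up for illustration), Python's built-in map also returns a lazy view over a list:

```python
# Python analogue of a lazy, list-backed view (not Dart code).
numbers = [1, 2, 3]

# map() returns a lazy iterator; nothing is computed yet.
doubled = map(lambda n: n * 2, numbers)

# Mutating the backing list before iterating changes what the view yields.
numbers.append(4)

# Materialize the view, much like calling .toList() in Dart.
snapshot = list(doubled)
print(snapshot)  # [2, 4, 6, 8]
```

This is the kind of surprise that would be easy to miss if the lazy result claimed to be a full List.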
(Disclaimer: I have not yet used Dart specifically, but hope to soon.)
Dart is not a functional programming language, which may be the source of your confusion.
Methods, such as .removeLast() are intended to change the state of the object they are called upon. The operation performed by l.removeLast() is to modify l so that it no longer contains the last item. You can access the resulting list by simply using l in your next statement.
(Note they are called "methods" rather than "functions", as they are not truly functions in the mathematical sense.)
The choice to return the removed item rather than the remaining list is a convenience. Most frequently, the program will need to do something with the removed item (like move it to a different list).
For other methods, the returned data will relate to a common usage scenario, but it isn't always necessary to capture it.
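The "move it to a different list" case can be sketched in Python, whose list.pop() plays roughly the same role as removeLast(). This is an analogy rather than Dart code, and the list names are hypothetical:

```python
pending = ["a", "b", "c"]
done = []

# list.pop() mutates `pending` in place and returns the removed element,
# which is usually the value the caller wants to use next.
done.append(pending.pop())

print(pending)  # ['a', 'b']
print(done)     # ['c']
```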
Related
What (if any) are the rules for deciding the order of function parameters in Clojure core?
Functions like map and filter expect a data structure as the last argument.
Functions like assoc and select-keys expect a data structure as the first argument.
Functions like map and filter expect a function as the first argument.
Functions like update-in expect a function as the last argument.
This can cause pain when using the threading macros (I know I can use as->), so what is the reasoning behind these decisions? It would also be nice to know so that my functions can conform as closely as possible to those written by the great man.
Functions that operate on collections (and so take and return data structures, e.g. conj, merge, assoc, get) take the collection first.
Functions that operate on sequences (and therefore take and return an abstraction over data structures, e.g. map, filter) take the sequence last.
Becoming more aware of the distinction [between collection functions and sequence functions] and when those transitions occur is one of the more subtle aspects of learning Clojure.
(Alex Miller, in this mailing list thread)
This is an important part of working intelligently with Clojure's sequence API. Notice, for instance, that the two kinds of functions occupy separate sections in the Clojure Cheatsheet. This is not a minor detail. It is central to how the functions are organized and how they should be used.
It may be useful to review this description of the mental model when distinguishing these two kinds of functions:
I am usually very aware in Clojure of when I am working with concrete
collections or with sequences. In many cases I find the flow of data
starts with collections, then moves into sequences (as a result of
applying sequence functions), and then sometimes back to collections
when it comes to rest (via into, vec, or set). Transducers have
changed this a bit as they allow you to separate the target collection
from the transformation and thus it's much easier to stay in
collections all the time (if you want to) by apply into with a
transducer.
When I am building up or working on collections, typically the code
constructing it is "close" and the collection types are known and
obvious. Generally sequential data is far more likely to be vectors
and conj will suffice.
When I am thinking in "sequences", it's very rare for me to do an
operation like "add last" - instead I am thinking in whole collection
terms.
If I do need to do something like that, then I would probably convert
back to collections (via into or vec) and use conj again.
Clojure's FAQ has a few good rules of thumb and visualization techniques for getting an intuition of collection/first-arg versus sequence/last-arg.
Rather than have this be a link-only question, I'll paste a quote of Rich Hickey's response to the Usenet question "Argument order rules of thumb":
One way to think about sequences is that they are read from the left,
and fed from the right:
<- [1 2 3 4]
Most of the sequence functions consume and produce sequences. So one
way to visualize that is as a chain:
map<- filter<-[1 2 3 4]
and one way to think about many of the seq functions is that they are
parameterized in some way:
(map f)<-(filter pred)<-[1 2 3 4]
So, sequence functions take their source(s) last, and any other
parameters before them, and partial allows for direct parameterization
as above. There is a tradition of this in functional languages and
Lisps.
Note that this is not the same as taking the primary operand last.
Some sequence functions have more than one source (concat,
interleave). When sequence functions are variadic, it is usually in
their sources.
I don't think variable arg lists should be a criteria for where the
primary operand goes. Yes, they must come last, but as the evolution
of assoc/dissoc shows, sometimes variable args are added later.
Ditto partial. Every library eventually ends up with a more order-
independent partial binding method. For Clojure, it's #().
What then is the general rule?
Primary collection operands come first. That way one can write -> and
its ilk, and their position is independent of whether or not they have
variable arity parameters. There is a tradition of this in OO
languages and CL (CL's slot-value, aref, elt - in fact the one that
trips me up most often in CL is gethash, which is inconsistent with
those).
So, in the end there are 2 rules, but it's not a free-for-all.
Sequence functions take their sources last and collection functions
take their primary operand (collection) first. Not that there aren't
are a few kinks here and there that I need to iron out (e.g. set/
select).
I hope that helps make it seem less spurious,
Rich
Now, how one distinguishes between a "sequence function" and a "collection function" is not obvious to me. Perhaps others can explain this.
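The "source last, other parameters first" rule from the quote can be illustrated outside Clojure. Python's built-in map and filter happen to follow the same argument order, so functools.partial can play the role of Clojure's partial. This is only a sketch in a different language, and inc and is_even are helper names invented for the example:

```python
from functools import partial

def inc(n):
    return n + 1

def is_even(n):
    return n % 2 == 0

# Because the source comes last, the leading arguments can be fixed up front.
keep_even = partial(filter, is_even)  # corresponds to (filter pred)
add_one = partial(map, inc)           # corresponds to (map f)

# Read right to left: the list feeds filter, whose output feeds map.
result = list(add_one(keep_even([1, 2, 3, 4])))
print(result)  # [3, 5]
```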
At SO, I have seen questions that compare Array with Seq, List with Seq and Vector with well, everything. I do not understand one thing though. When should I actually use a Seq over any of these? I understand when to use a List, when to use an Array and when to use a Vector. But when is it a good idea to use Seq rather than any of the above listed collections? Why should I use a trait that extends Iterable rather than all the concrete classes listed above?
You should usually use Seq as the input parameter type of a method or class that is defined for sequences in general (general in the sense of accepting any sequence, not necessarily in the sense of being generic):
def mySort[T](seq: Seq[T]) = ...
case class Wrapper[T](seq: Seq[T])
implicit class RichSeq[T](seq: Seq[T]) { def mySort = ...}
So now you can pass any sequence (like Vector or List) to mySort.
If you care about algorithmic complexity, you can narrow the parameter to IndexedSeq (fast random access) or LinearSeq (efficient head and tail operations). Otherwise, prefer the most general type that supports what you need, so that your function is as polymorphic as possible in its input; Seq is the common interface for all sequences. If you need something even more generic, you may use Traversable or Iterable.
The principle here is the same as in a number of languages (e.g. in Java you should often use List instead of ArrayList, or Map instead of HashMap). If you can deal with the more abstract concept of a Seq, you should, especially when it is the type of a method parameter.
2 main reasons that come to mind:
1) Reuse of your code. E.g. if you have a method foo(s: Seq), it can be reused for lists and arrays.
2) the ability to change your mind easily. E.g. If you decide that List is working well, but suddenly you realise you need random access, and want to change it to an Array, if you have been defining List everywhere, you'll be forced to change it everywhere.
Note #1: there are times where you could say Iterable rather than Seq, if your method supports it, in which case I'd be inclined to be as abstract as possible.
Note #2: Sometimes, I might be inclined to not say Seq (or be totally abstract) in my work libraries, even if I could. E.g. if I were to do something which would be highly non-performant with the wrong collection. Such as doing Random Access - even if I could write my code to work with a List, it would result in major inefficiency.
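The same "program to the abstract sequence type" idea can be sketched in Python with collections.abc.Sequence, used here only as a stand-in for Scala's Seq. The function name middle is hypothetical:

```python
from collections.abc import Sequence

def middle(items: Sequence[int]) -> int:
    """Return the middle element; relies only on len() and indexing."""
    return items[len(items) // 2]

# Any sequence type works, not just one concrete class.
print(middle([1, 2, 3]))     # 2
print(middle((10, 20, 30)))  # 20
print(middle(range(5)))      # 2
```

As in Note #2, this signature quietly assumes that indexing is cheap, which would be a poor assumption for a linked list.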
I've always thought that appending a list to another one meant copying the objects from the first list and then pointing to the appended list as described for example here.
However, in this blog post and in its comment, it says that it is only the pointers that are copied and not the underlying objects.
So what is correct?
Drawing from Snowbear's answer, a more accurate image of combining two lists (than the one presented in the first referred article in the question) would be as shown below.
let FIRST = [1;2;3]
let SECOND = [4;5;6]
let COMBINED = FIRST @ SECOND
In the functional world, lists are immutable. This means that node sharing is possible because the original lists will never change. Because the first list ends with the empty list, its nodes must be copied in order to point its last node to the second list.
If you mean this statement, then the answer seems to be pretty simple. The author of the first article is talking about list nodes, not list items, when he says "nodes". A node is not the same thing as the item it holds. Take a look at the pictures in the first article. There are arrows going from every element to the next node. These arrows are pointers. But the integer type (which is what is put into the list) has no such pointers. There is presumably some list node type which wraps those integers and stores the pointers. When the author says that nodes must be copied, he is talking about these wrappers being copied. The underlying objects (if they were not value types, as they are in this case) would not be cloned; the new wrappers would point to the same objects as before.
F# lists hold references (not to be confused with F#'s ref) to their elements; list operations copy those references (pointers), but not the elements themselves.
There are two ways you might append items to an existing list, which is why there seems to be a discrepancy between the articles (though they both look to be correct):
Cons operator (::): The cons operator prepends a single item to an F# list, producing a new list. It's very fast (O(1)), since it only needs to call a very simple constructor to produce the new list.
Append operator (@): The append operator appends two F# lists together, producing a new list. It's not as fast (O(n)) because in order for the elements of the combined list to be ordered correctly, it needs to traverse the entire list on the left-hand side of the operator (so copying can start at the first element of that list). You'll still see this used in production if the list on the left-hand side is known to be very small, but in general you'll get much better performance from using ::.
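To make the distinction between copying nodes and copying elements concrete, here is a minimal Python sketch of an immutable singly-linked list. It is not how F# implements its lists; Node, cons, from_items and append are names chosen for illustration, and None stands in for the empty list:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class Node:
    head: Any
    tail: Optional["Node"]  # None plays the role of the empty list

def cons(item, lst):
    """O(1): allocate one node that points at the existing list."""
    return Node(item, lst)

def from_items(items):
    """Build a linked list from a Python list, back to front."""
    lst = None
    for item in reversed(items):
        lst = cons(item, lst)
    return lst

def append(left, right):
    """O(len(left)): copy left's nodes, then share `right` unchanged."""
    if left is None:
        return right
    return cons(left.head, append(left.tail, right))

payload = object()
first = from_items([payload, 2, 3])
second = from_items([4, 5, 6])
combined = append(first, second)

print(combined.tail.tail.tail is second)  # True: the second list is shared
print(combined is first)                  # False: the first list's nodes were copied
print(combined.head is payload)           # True: the element itself is not cloned
```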
In C, you can have a pointer to the first and last element of a singly-linked list, providing constant time access to the end of a list. Thus, appending one list to another can be done in constant time.
As far as I am aware, Scheme does not provide this functionality (namely constant-time access to the end of a list) by default. To be clear, I am not looking for "pointer" functionality. I understand that is non-idiomatic in Scheme and (as I suppose) unnecessary.
Could someone either 1) demonstrate the ability to provide a way to append two lists in constant time or 2) assure me that this is already available by default in scheme or racket (e.g., tell me that append is in fact a constant operation if I am wrong to think otherwise)?
EDIT:
I should make myself clearer. I am trying to create an inspectable queue. I want to have a list that I can 1) push onto the front in constant time, 2) pop off the back in constant time, and 3) iterate over using Racket's foldr or something similar (a Lisp right fold).
Standard Lisp lists cannot be appended to in constant time.
However, if you make your own list type, you can do it. Basically, you can use a record type (or just a cons cell)---let's call this the "header"---that holds pointers to the head and tail of the list, and update it each time someone adds to the list.
However, be aware that if you do that, lists are no longer structurally inductive. i.e., a longer list isn't simply an extension of a shorter list, because of the extra "header" involved. Thus, you lose a great part of the simplicity of Lisp algorithms which involve recursing into the cdr of a list at each iteration.
In other words, the lack of easy appending is a tradeoff to enable recursive algorithms to be written much more easily. Most functional programmers will agree that this is the right tradeoff, since appending in a pure-functional sense means that you have to copy every cell in all but the last list---so it's no longer O(1), anyway.
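The "header" idea can be sketched as follows. This is Python rather than Scheme, it is deliberately mutable, and Cell, TailList, push_back and append_list are hypothetical names, not part of any Scheme or Racket library:

```python
class Cell:
    """One mutable link in the chain."""
    def __init__(self, value):
        self.value = value
        self.next = None

class TailList:
    """Hypothetical 'header' record holding pointers to both ends."""
    def __init__(self):
        self.head = None
        self.tail = None

    def push_back(self, value):
        cell = Cell(value)
        if self.tail is None:
            self.head = cell
        else:
            self.tail.next = cell
        self.tail = cell

    def append_list(self, other):
        """O(1) destructive append: splice other's cells onto our tail."""
        if other.head is None:
            return
        if self.tail is None:
            self.head = other.head
        else:
            self.tail.next = other.head
        self.tail = other.tail
        # The cells now belong to self; empty `other` to avoid aliasing.
        other.head = other.tail = None

    def __iter__(self):
        cell = self.head
        while cell is not None:
            yield cell.value
            cell = cell.next

a = TailList()
for v in (1, 2):
    a.push_back(v)
b = TailList()
for v in (3, 4):
    b.push_back(v)

a.append_list(b)
print(list(a))  # [1, 2, 3, 4]
print(list(b))  # []
```

Note that the O(1) append only works because it mutates both lists, which is exactly the tradeoff described above.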
ETA to reflect OP's edit
You can create a queue, but with the opposite behaviour: you add elements to the back, and retrieve elements in the front. If you are willing to work with that, such a data structure is easy to implement in Scheme. (And yes, it's easy to append two such queues in constant time.)
Racket also has a similar queue data structure, but it uses a record type instead of cons cells, because Racket cons cells are immutable. You can convert your queue to a list using queue->list (at O(n) complexity) for times when you need to fold.
You want a FIFO queue. user448810 mentions the standard implementation for a purely-functional FIFO queue.
Your concern about losing the "key advantage of Lisp lists" needs to be unpacked a bit:
You can write combinators for custom data structures in Lisp. If you implement a queue type, you can easily write fold, map, filter and so on for it.
Scheme, however, does lack in the area of providing polymorphic sequence functions that can work on multiple sequence types. You do often end up either (a) converting your data structures back to lists in order to use the rich library of list functions, or (b) implementing your own versions of various of these functions for your custom types.
This is very much a shame, because singly-linked lists, while they are hugely useful for tons of computations, are not a do-all data structure.
But what is worse is that there's a lot of Lisp folk who like to pretend that lists are a "universal datatype" that can and should be used to represent any kind of data. I've programmed Lisp for a living, and oh my god I hate the code that these people produce; I call it "Lisp programmer's disease," and have much too often had to go in and fix a lot of n^2 that uses lists to represent sets or dictionaries to use hash tables or search trees instead. Don't fall into that trap. Use proper data structures for the task at hand. You can always build your own opaque data types using record types and modules in Racket; you make them opaque by exporting the type but not the field accessors for the record type (you export your type's user-facing operations instead).
It sounds like you are looking for a deque, not a list. The standard idiom for a deque is to keep two lists, the front half of the list in normal order and the back half of the list in reverse order, thus giving access to both ends of the deque. If the half of the list that you want to access is empty, reverse the other half and swap the meaning of the two halves. Look here for a fuller explanation and sample code.
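A minimal sketch of the two-list idiom, restricted to the FIFO case, might look like this. It uses mutable Python lists as stacks instead of immutable cons lists, and the class and method names are invented for the example:

```python
class TwoStackQueue:
    def __init__(self):
        self._inbox = []   # newest element at the end
        self._outbox = []  # oldest element at the end

    def put(self, item):
        self._inbox.append(item)

    def get(self):
        if not self._outbox:
            # Reverse the other half only when this half runs dry,
            # so each element is moved at most once (amortized O(1)).
            while self._inbox:
                self._outbox.append(self._inbox.pop())
        if not self._outbox:
            raise IndexError("get from empty queue")
        return self._outbox.pop()

q = TwoStackQueue()
for item in (1, 2, 3):
    q.put(item)
first_out = q.get()
q.put(4)
rest = [q.get(), q.get(), q.get()]
print(first_out, rest)  # 1 [2, 3, 4]
```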
I've noticed that in functional languages such as Haskell and OCaml you can do 2 actions with lists. First, you can do x:xs, where x is an element and xs is a list, and the result is a new list where x is prepended to the beginning of xs in constant time. Second is x++y, where both x and y are lists, and the result is a new list where y gets appended to the end of x in linear time with respect to the number of elements in x.
Now I'm no expert in how languages are designed and compilers are built, but this seems to me a lot like a simple implementation of a linked list with one pointer to the first item. If I were to implement this data structure in a language like C++ I would find it to be generally trivial to add a pointer to the last element. In this case if these languages were implemented this way (assuming they do use linked lists as described) adding a "pointer" to the last item would make it much more efficient to add items to the end of a list and would allow pattern matching with the last element.
My question is are these data structures really implemented as linked lists, and if so why do they not add a reference to the last element?
Yes, they really are linked lists. But they are immutable. The advantage of immutability is that you don't have to worry about who else has a pointer to the same list. You might choose to write x++y, but some other part of the program might be relying on x remaining unchanged.
People who work on compilers for such languages (of whom I am one) don't worry about this cost because there are plenty of other data structures that provide efficient access:
A functional queue represented as two lists provides constant-time access to both ends and amortized constant time for put and get operations.
A more sophisticated data structure like a finger tree can provide several kinds of list access at very low cost.
If you just want constant-time append, John Hughes developed an excellent, simple representation of lists as functions, which provides exactly that. (In the Haskell library they are called DList.)
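The "lists as functions" representation can be sketched in a few lines. This is a Python illustration of the idea, not the API of the Haskell DList library; dlist, dappend and to_list are hypothetical names, and Python's list concatenation has a different cost model than consing onto a linked list:

```python
def dlist(items):
    """Represent a list as a function that prepends `items` to its argument."""
    items = list(items)
    return lambda rest: items + rest

def dappend(f, g):
    """O(1): appending is just function composition; nothing is copied yet."""
    return lambda rest: f(g(rest))

def to_list(f):
    """Pay for the whole construction once, at the end."""
    return f([])

xs = dlist([1, 2])
ys = dlist([3])
zs = dlist([4, 5])
joined = to_list(dappend(dappend(xs, ys), zs))
print(joined)  # [1, 2, 3, 4, 5]
```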
If you're interested in these sorts of questions you can get good info from Chris Okasaki's book Purely Functional Data Structures and from some of Ralf Hinze's less intimidating papers.
You said:
Second is x++y where both x and y are
lists and the resulting action is y
gets appended to the end of x in
linear time with respect to the number
of elements in x.
This is not really true in a functional language like Haskell; y gets appended to a copy of x, since anything holding onto x is depending on it not changing.
If you're going to copy all of x anyway, holding onto its last node doesn't really gain you anything.
Yes, they are linked lists. In languages like Haskell and OCaml, you don't add items to the end of a list, period. Lists are immutable. There is one operation to create new lists — cons, the : operator you refer to earlier. It takes an element and a list, and creates a new list with the element as the head and the list as the tail. The reason x++y takes linear time is because it must cons the last element of x with y, and then cons the second-to-last element of x with that list, and so on with each element of x. None of the cons cells in x can be reused, because that would cause the original list to change as well. A pointer to the last element of x would not be very helpful here — we still have to walk the whole list.
++ is just one of dozens of "things you can do with lists". The reality is that lists are so versatile that one rarely uses other collections. Also, we functional programmers almost never feel the need to look at the last element of a list - if we need to, there is a function last.
However, just because lists are convenient this does not mean that we do not have other data structures. If you're really interested, have a look at this book http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf (Purely Functional Data Structures). You'll find trees, queues, lists with O(1) append of an element at the tail, and so forth.
Here's a bit of an explanation on how things are done in Clojure:
The easiest way to avoid mutating state is to use immutable data structures. Clojure provides a set of immutable lists, vectors, sets and maps. Since they can't be changed, 'adding' or 'removing' something from an immutable collection means creating a new collection just like the old one but with the needed change. Persistence is a term used to describe the property wherein the old version of the collection is still available after the 'change', and that the collection maintains its performance guarantees for most operations. Specifically, this means that the new version can't be created using a full copy, since that would require linear time. Inevitably, persistent collections are implemented using linked data structures, so that the new versions can share structure with the prior version. Singly-linked lists and trees are the basic functional data structures, to which Clojure adds a hash map, set and vector both based upon array mapped hash tries.
(emphasis mine)
So basically it looks like you're mostly correct, at least as far as Clojure is concerned.