Scheme: Constant Access to the End of a List?

In C, you can have a pointer to the first and last element of a singly-linked list, providing constant time access to the end of a list. Thus, appending one list to another can be done in constant time.
As far as I am aware, Scheme does not provide this functionality (namely constant-time access to the end of a list) by default. To be clear, I am not looking for "pointer" functionality. I understand that is non-idiomatic in Scheme and (as I suppose) unnecessary.
Could someone either 1) demonstrate a way to append two lists in constant time, or 2) assure me that this is already available by default in Scheme or Racket (e.g., tell me that append is in fact a constant-time operation if I am wrong to think otherwise)?
EDIT:
I should make myself clearer. I am trying to create an inspectable queue. I want to have a list that I can 1) push onto the front in constant time, 2) pop off the back in constant time, and 3) iterate over using Racket's foldr or something similar (a Lisp right fold).

Standard Lisp lists cannot be appended to in constant time.
However, if you make your own list type, you can do it. Basically, you can use a record type (or just a cons cell), call it the "header", that holds pointers to the head and tail of the list, and update it each time someone adds to the list.
However, be aware that if you do that, lists are no longer structurally inductive: a longer list isn't simply an extension of a shorter list, because of the extra "header" involved. Thus, you lose a great deal of the simplicity of Lisp algorithms, which recurse into the cdr of a list at each step.
In other words, the lack of easy appending is a tradeoff that allows recursive algorithms to be written much more easily. Most functional programmers will agree that this is the right tradeoff, since appending in a pure-functional sense means you have to copy every cell in all but the last list, so it's no longer O(1) anyway.
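For concreteness, here is a rough sketch of that "header" idea, written in Haskell with IORefs standing in for mutable cons cells (Cell, HeadTail, pushBack and appendInto are all made-up names for illustration). The point is only that both adding at the end and appending are pointer updates, not copies:
import Data.IORef

-- A mutable cons cell: a payload plus a reference to the next cell (Nothing = end of list).
data Cell a = Cell a (IORef (Maybe (Cell a)))

-- The "header": references to the first and last cells of the list.
data HeadTail a = HeadTail
  { firstCell :: IORef (Maybe (Cell a))
  , lastCell  :: IORef (Maybe (Cell a))
  }

newHeadTail :: IO (HeadTail a)
newHeadTail = HeadTail <$> newIORef Nothing <*> newIORef Nothing

-- O(1) push onto the end: link the old last cell to the new one and move the tail pointer.
pushBack :: HeadTail a -> a -> IO ()
pushBack ht x = do
  cell <- Cell x <$> newIORef Nothing
  old  <- readIORef (lastCell ht)
  case old of
    Nothing            -> writeIORef (firstCell ht) (Just cell)  -- list was empty
    Just (Cell _ next) -> writeIORef next (Just cell)
  writeIORef (lastCell ht) (Just cell)

-- O(1) append: splice ys onto the end of xs by pointer surgery (ys is absorbed into xs).
appendInto :: HeadTail a -> HeadTail a -> IO ()
appendInto xs ys = do
  ysFirst <- readIORef (firstCell ys)
  case ysFirst of
    Nothing -> pure ()                                           -- ys is empty: nothing to do
    Just _  -> do
      xsLast <- readIORef (lastCell xs)
      case xsLast of
        Nothing            -> writeIORef (firstCell xs) ysFirst  -- xs was empty
        Just (Cell _ next) -> writeIORef next ysFirst
      readIORef (lastCell ys) >>= writeIORef (lastCell xs)
As the rest of this answer notes, you pay for this by giving up structural induction and by mutating the list in place.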
ETA to reflect OP's edit
You can create a queue, but with the opposite behaviour: you add elements to the back, and retrieve elements in the front. If you are willing to work with that, such a data structure is easy to implement in Scheme. (And yes, it's easy to append two such queues in constant time.)
Racket also has a similar queue data structure, but it uses a record type instead of cons cells, because Racket cons cells are immutable. You can convert your queue to a list using queue->list (at O(n) complexity) for times when you need to fold.

You want a FIFO queue. user448810 mentions the standard implementation for a purely-functional FIFO queue.
Your concern about losing the "key advantage of Lisp lists" needs to be unpacked a bit:
You can write combinators for custom data structures in Lisp. If you implement a queue type, you can easily write fold, map, filter and so on for it.
Scheme, however, is lacking when it comes to polymorphic sequence functions that work across multiple sequence types. You do often end up either (a) converting your data structures back to lists in order to use the rich library of list functions, or (b) implementing your own versions of these functions for your custom types.
This is very much a shame, because singly-linked lists, while they are hugely useful for tons of computations, are not a do-all data structure.
But what is worse is that there's a lot of Lisp folk who like to pretend that lists are a "universal datatype" that can and should be used to represent any kind of data. I've programmed Lisp for a living, and oh my god I hate the code that these people produce; I call it "Lisp programmer's disease," and I have far too often had to go in and fix a lot of O(n²) code that used lists to represent sets or dictionaries, replacing them with hash tables or search trees. Don't fall into that trap. Use proper data structures for the task at hand. You can always build your own opaque data types using record types and modules in Racket; you make them opaque by exporting the type but not the field accessors for the record type (you export your type's user-facing operations instead).

It sounds like you are looking for a deque, not a list. The standard idiom for a deque is to keep two lists, the front half of the list in normal order and the back half of the list in reverse order, thus giving access to both ends of the deque. If the half of the list that you want to access is empty, reverse the other half and swap the meaning of the two halves. Look here for a fuller explanation and sample code.
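To make the idiom concrete, here is a minimal sketch of the two-list representation, written in Haskell rather than Scheme; Deque, pushFront, popBack and toList are made-up names for the example. The toList helper also covers the OP's third requirement, since it hands the contents to an ordinary right fold in front-to-back order:
-- A deque as two lists: 'front' in normal order, 'back' in reverse order.
data Deque a = Deque { front :: [a], back :: [a] }

empty :: Deque a
empty = Deque [] []

-- O(1): push onto the front.
pushFront :: a -> Deque a -> Deque a
pushFront x (Deque f b) = Deque (x : f) b

-- Pop off the back; O(1) amortized (the occasional reverse is paid for
-- by the pushes that built up the front list).
popBack :: Deque a -> Maybe (a, Deque a)
popBack (Deque f (x : b)) = Just (x, Deque f b)
popBack (Deque [] [])     = Nothing
popBack (Deque f [])      = popBack (Deque [] (reverse f))

-- O(n): flatten to an ordinary list, front to back, e.g. for foldr.
toList :: Deque a -> [a]
toList (Deque f b) = f ++ reverse b
This is exactly the structure described above: pushing onto the front is a cons, and popping the back only pays for a reverse when the back half has been exhausted.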

Related

Why does Elixir have so many similar list types in the standard library?

I'm doing the Elixir koans, and already I've worked through something like five different listy data types:
List
Char list
Word list
Tuple
Keyword list
Map
MapSet
Struct
Some of these I buy, but all of them at the same time? Does anyone actually use all of these lists for strictly separated purposes?
Short answer is: yes.
Long answer is:
Lists - are a basic data structure you use everywhere. Lists are ordered and allow duplicates. The main use case is: homogeneous, variable-length collections;
Charlists - where Elixir uses strings (based on binaries), Erlang usually uses charlists (lists of integer codepoints). They are mainly a compatibility interface;
Word lists - I've never heard of those;
Tuples - are another basic data structure you use everywhere. The main use case is: heterogeneous, fixed-length collections;
Keyword lists - are very common, mainly used for options. They are a simple abstraction on top of lists and tuples (a list of two-element tuples). They allow duplicate keys and maintain key order; because they are ordered, pattern matching on them is very impractical;
Maps - are common too. They allow easy pattern matching on keys, but do not allow duplicate keys and are not ordered;
MapSet - sets are a basic data structure: an unordered collection of unique elements;
Structs - are the main mechanism for polymorphism in Elixir (through protocols); they allow creating more rigid structures with the set of keys enforced at compile time.
In functional programming, choosing the right data structure to represent your data is often half the problem; that's why you get so many different structures with different characteristics. Each one has its use cases and is useful in different ways.
#michalmuskala provided a great answer here; let me just extend it a bit.
Lists are the workhorse in Elixir. There are plenty of problems that you will solve with lists. Lists are not arrays, where random access is the best way to get values; instead, lists in Elixir are linked data structures, and you traverse them by splitting them into head and tail (if you know Lisp, Prolog or Erlang, you'll feel right at home).
Charlists are just lists, narrowed to lists of integer codepoints.
Tuples - usually they contain two to four elements. They are a common way to pass additional data while still sending a single argument. Common behaviours like GenServer use them as expected replies.
Keyword lists are lists of tuples; you can use them when you need to store more than one value for one key. There is syntactic sugar for them:
Instead of a = [{:name, "Patryk"}] you can have a = [name: "Patryk"] and access it with a[:name].
Maps are associative arrays, hashes, dicts etc. One key holds one value and keys are unique.
Sets - think of mathematical sets: an unordered collection of unique values.
Structs - as #michalmuskala wrote, they are used with protocols and are checked by the compiler. Under the hood they are maps defined for a module.
The answers are to be read from the bottom to the top :)
#michalmuskala provided a great answer here and #patnowak extended it nicely. I am mostly here to answer the question “Does anyone actually use all of these lists for strictly separated purposes?”
Elixir (as well as Erlang) is all about pattern matching. Having different types of lists makes it easy to narrow the pattern matching in each particular case:
List is used mostly in recursion; Erlang has no loops, so instead one makes recursive calls. It's highly optimized when used properly (tail recursion). Usually matched as [head | tail].
charlist is used in “string” pattern matching, whatever that means. A check for “the first letter of his name is ‘A’” would be done with a pattern match against [?A | rest] = "Aleksei" |> List.Chars.to_charlist
Tuple is used in pattern matching of different instances of the more-or-less same entity. Fail/Success would be returned as tuples {:ok, result} and {:error, message} respectively and pattern matched afterwards. GenServer simplifies handling of different messages that way as well.
Map is to be pattern matched as %{name: "Aleksei"} = generic_input to immediately extract the name. Keywords are more or less the same.
etc.

SML/NJ - Effective way or data structure to access from end towards the start

I am making a program, and an algorithm I have thought of using requires cheap backward access to a list in order to be efficient. Is there an efficient way to access a list from the last element forward? Or, since I think that may be impossible due to the structure of SML lists, is there an efficient data structure that achieves it?
The length of the data is unknown before execution, and nothing more than serial traversal of the data is needed.
I think you want a functional deque. See e.g. Okasaki's paper on the subject. Specifically, Figure 5 shows an implementation of deques.
If using a functional deque seems like overkill and you only need to traverse the list in reverse order once, then solutions that, e.g., use List.last and List.take to emulate hd and tl in reverse order are, as you seem to know, bad: they make the traversal quadratic. On the other hand, the built-in function rev is very efficient, since it is both tail-recursive and linear. If you feed a list to a function that needs to traverse that list in reverse order, an easy solution is a let binding that uses rev to create a local reversed copy of the list, and then to traverse the reversed copy in the usual way.
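A minimal sketch of that last suggestion, written in Haskell rather than SML (printBackwards is a made-up name; Haskell's reverse plays the role of SML's rev):
-- Walk a list from its last element toward its first: reverse once (linear),
-- then traverse the reversed copy in the usual head/tail fashion.
printBackwards :: Show a => [a] -> IO ()
printBackwards xs = go (reverse xs)
  where
    go []       = pure ()
    go (y : ys) = print y >> go ys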

Inconsistencies in Dart ListBase

Is there any particular reason for the inconsistent return types of the functions in Dart's ListBase class?
Some of the functions do what (as a functional programmer) I would expect, that is: List -> (apply function) -> List. These include: take, skip, reversed.
Others do not: thus l.removeLast() returns just the final element of the list; to get the List without the final element, you have to use a cascade: l..removeLast().
Others return a lazy Iterable, which requires further work to retrieve the list: newl = l.map(f).toList().
Some functions operate more like properties (l.last), as opposed to functions (l.removeLast()).
Is there some subtle reason for these choices?
mbmcavoy is right. Dart is an imperative language and many List members modify the list in-place. The most prominent is the operator []=, but sort, shuffle, add, removeLast, etc. fall into the same category.
In addition to these imperative members, List inherits some functional-style members from Iterable: skip, take, where, map, etc. These are lazy and do not modify the List in place. They are backed by the original list: modifying the backing list will change the result of iterating over the iterable. List furthermore adds a few lazy members of its own, like reversed.
To avoid confusion, lazy members always return an Iterable and not an object implementing the List interface. Some of the iterables guarantee fast length and index-operators (like take, skip and reversed) and could easily implement the List interface. However, this would inevitably lead to bugs, since they are lazy and backed by the original list.
(Disclaimer: I have not yet used Dart specifically, but hope to soon.)
Dart is not a functional programming language, which may be the source of your confusion.
Methods, such as .removeLast() are intended to change the state of the object they are called upon. The operation performed by l.removeLast() is to modify l so that it no longer contains the last item. You can access the resulting list by simply using l in your next statement.
(Note they are called "methods" rather than "functions", as they are not truly functions in the mathematical sense.)
The choice to return the removed item rather than the remaining list is a convenience. Most frequently, the program will need to do something with the removed item (like move it to a different list).
For other methods, the returned data will relate to a common usage scenario, but it isn't always necessary to capture it.

List design in functional languages

I've noticed that in functional languages such as Haskell and OCaml you can do two things with lists. First, you can write x:xs, where x is an element and xs is a list; the result is a new list with x prepended to the front of xs, in constant time. Second, you can write x++y, where both x and y are lists; the result is a new list with y appended to the end of x, in linear time with respect to the number of elements in x.
Now I'm no expert in how languages are designed and compilers are built, but this seems to me a lot like a simple implementation of a linked list with one pointer to the first item. If I were to implement this data structure in a language like C++ I would find it to be generally trivial to add a pointer to the last element. In this case if these languages were implemented this way (assuming they do use linked lists as described) adding a "pointer" to the last item would make it much more efficient to add items to the end of a list and would allow pattern matching with the last element.
My question is are these data structures really implemented as linked lists, and if so why do they not add a reference to the last element?
Yes, they really are linked lists. But they are immutable. The advantage of immutability is that you don't have to worry about who else has a pointer to the same list. You might choose to write x++y, but code somewhere else in the program might be relying on x remaining unchanged.
People who work on compilers for such languages (of whom I am one) don't worry about this cost because there are plenty of other data structures that provide efficient access:
A functional queue represented as two lists provides constant-time access to both ends and amortized constant time for put and get operations.
A more sophisticated data structure like a finger tree can provide several kinds of list access at very low cost.
If you just want constant-time append, John Hughes developed an excellent, simple representation of lists as functions, which provides exactly that. (In the Haskell library they are called DList.)
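To make that last point concrete, here is a minimal sketch of the Hughes / difference-list idea (just the core; the real DList library has a richer API):
-- A Hughes list ("difference list"): a list is represented as a function that
-- prepends its elements to whatever list it is given.
newtype DList a = DList ([a] -> [a])

fromList :: [a] -> DList a
fromList xs = DList (xs ++)

toList :: DList a -> [a]
toList (DList f) = f []

-- O(1) append: just compose the two functions; no cells are copied until
-- toList finally materialises the result.
append :: DList a -> DList a -> DList a
append (DList f) (DList g) = DList (f . g)
The final toList costs O(n), but each append along the way is a single function composition.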
If you're interested in these sorts of questions you can get good info from Chris Okasaki's book Purely Functional Data Structures and from some of Ralf Hinze's less intimidating papers.
You said:
Second is x++y where both x and y are lists and the resulting action is y gets appended to the end of x in linear time with respect to the number of elements in x.
This is not really true in a functional language like Haskell; y gets appended to a copy of x, since anything holding onto x is depending on it not changing.
If you're going to copy all of x anyway, holding onto its last node doesn't really gain you anything.
Yes, they are linked lists. In languages like Haskell and OCaml, you don't add items to the end of a list, period. Lists are immutable. There is one operation to create new lists — cons, the : operator you refer to earlier. It takes an element and a list, and creates a new list with the element as the head and the list as the tail. The reason x++y takes linear time is because it must cons the last element of x with y, and then cons the second-to-last element of x with that list, and so on with each element of x. None of the cons cells in x can be reused, because that would cause the original list to change as well. A pointer to the last element of x would not be very helpful here — we still have to walk the whole list.
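As a sketch of why that is, here is roughly how list append works in Haskell (spelled app here so it doesn't clash with the Prelude's ++):
-- Roughly how ++ works: every cell of xs is rebuilt, and only the cells of ys
-- are shared with the result. Hence the cost is O(length xs).
app :: [a] -> [a] -> [a]
app []       ys = ys
app (x : xs) ys = x : app xs ys
Every cell of the first list has to be rebuilt and only the cells of the second list are shared with the result, so a pointer to the last cell would not save any work.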
++ is just one of dozens of "things you can do with lists". The reality is that lists are so versatile that one rarely uses other collections. Also, we functional programmers almost never feel the need to look at the last element of a list - if we need to, there is a function last.
However, just because lists are convenient this does not mean that we do not have other data structures. If you're really interested, have a look at this book http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf (Purely Functional Data Structures). You'll find trees, queues, lists with O(1) append of an element at the tail, and so forth.
Here's a bit of an explanation on how things are done in Clojure:
The easiest way to avoid mutating state is to use immutable data structures. Clojure provides a set of immutable lists, vectors, sets and maps. Since they can't be changed, 'adding' or 'removing' something from an immutable collection means creating a new collection just like the old one but with the needed change. Persistence is a term used to describe the property wherein the old version of the collection is still available after the 'change', and that the collection maintains its performance guarantees for most operations. Specifically, this means that the new version can't be created using a full copy, since that would require linear time. Inevitably, persistent collections are implemented using linked data structures, so that the new versions can share structure with the prior version. Singly-linked lists and trees are the basic functional data structures, to which Clojure adds a hash map, set and vector both based upon array mapped hash tries.
(emphasis mine)
So basically it looks you're mostly correct, at least as far as Clojure is concerned.

In Clojure, when should trees of heterogenous node types be represented using records or vectors?

Which is more idiomatic Clojure practice for representing a tree made up of different node types:
A. building trees out of several different types of records, that one defines using deftype or defrecord:
(defrecord node_a [left right])
(defrecord node_b [left right])
(defrecord leaf [])
(def my-tree (node_a. (node_b. (leaf.) (leaf.)) (leaf.)))
B. building trees out of vectors, with keywords designating the types:
(def my-tree [:node-a [:node-b :leaf :leaf] :leaf])
Most clojure code that I see seems to favor the usage of the general purpose data structures (vectors, maps, etc.), rather than datatypes or records. Hiccup, to take one example, represents html very nicely using the vector + keyword approach.
When should we prefer one style over the other?
You can put as many elements into a vector as you want. A record has a fixed number of fields. If you want to constrain your nodes to have exactly N sub-nodes, records might be good, e.g. when making a binary tree, where a node has to have exactly a Left and a Right. But for something like HTML or XML, you probably want to support arbitrary numbers of sub-nodes.
Using vectors and keywords means that "extending" the set of supported node types is as simple as putting a new keyword into the vector. [:frob "foo"] is OK in Hiccup even if its author never heard of frobbing. Using records, you'd potentially have to define a new record for every node type. But then you get the benefit of catching typos and verifying subnodes. [:strnog "some bold text?"] isn't going to be caught by Hiccup, but (Strnog. "foo") would be a compile-time error.
Vectors being one of Clojure's basic data types, you can use Clojure's built-in functions to manipulate them. Want to extend your tree? Just conj onto it, or update-in, or whatever. You can build up your tree incrementally this way. With records, you're probably stuck with constructor calls, or else you have to write a ton of wrapper functions for the constructors.
Seems like this partly boils down to an argument of dynamic vs. static. Personally, I would go the dynamic (vector + keyword) route unless there was a specific need for the benefits of using records. It's probably easier to code that way, and it's more flexible for the user, at the cost of being easier for the user to end up making a mess. But Clojure users are likely used to having to handle dangerous weapons on a regular basis. Clojure being largely a dynamic language, staying dynamic is often the right thing to do.
This is a good question. I think both are appropriate for different kinds of problems. Nested vectors are a good solution if each node can contain a variable set of information - in particular templating systems are going to work well. Records are a good solution for a smallish number of fixed node types where nesting is far more constrained.
We do a lot of work with heterogeneous trees of records. Each node represents one of a handful of well-known types, each with a different set of known fixed keys. The reason records are better in this case is that you can pick the data out of the node by key which is O(1) (really a Java method call which is very fast), not O(n) (where you have to look through the node contents) and also generally easier to access.
Records in 1.2 are imho not quite "finished" but it's pretty easy to build that stuff yourself. We have a defrecord2 that adds constructor functions (new-foo), field validation, print support, pprint support, tree walk/edit support via zippers, etc.
An example of where we use this is to represent ASTs or execution plans where nodes might be things like Join, Sort, etc.
Vectors are going to be better for creating stuff like strings where an arbitrary number of things can be put in each node. If you can stuff 1+ <p>s inside a <div>, then you can't create a record that contains a :p field - that just doesn't make any sense. That's a case where vectors are far more flexible and idiomatic.