I've been trying to figure out which list a particular list representation corresponds to.
[1|[2|[3|[4|[]]]]] is equivalent to the list [1,2,3,4] and [1,2,3,4|[]] etc.
But I just can't seem to figure out what list this corresponds to:
[[[[[]|1]|2]|3]|4]
If anyone can explain this for me I would be very grateful.
[H|T] is syntactic sugar for the real representation using the . functor: .(H,T) where H is the "head" (one list element), and T is the "tail" (which is itself a list in a standard list structure). So [1|[2|[3|[4|[]]]]] is .(1,.(2,.(3,.(4,[])))). Prolog also allows a non-list value for T, so [[[[[]|1]|2]|3]|4] is .(.(.(.([],1),2),3),4). The second structure doesn't really simplify any further than that from a list notational standpoint.
If you want to think in terms of "list" then [[[[[]|1]|2]|3]|4] is a list whose head is [[[[]|1]|2]|3] and tail is 4. And since 4 is not a list (not even the empty list), the original list can be described as "improper", as @z5h indicated. Drilling down, the head [[[[]|1]|2]|3] is itself a list with head [[[]|1]|2] and tail 3. Another "improper" list. And so on. Therefore, the overall structure is an embedded list of lists, four levels deep, in which each list has a single head and a non-list tail (an "improper" list).
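To make the nesting easier to see, here is a minimal sketch in Python (not Prolog) that models each .(H,T) term as a two-field cell and prints it back in list notation. The names Cons, NIL and show are made up for this illustration and are not part of any Prolog system.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Cons:
    """Hypothetical model of a Prolog .(Head, Tail) term."""
    head: Any
    tail: Any


NIL = ()  # assumed stand-in for Prolog's empty list []


def show(term):
    """Render a term in Prolog-style list notation."""
    if term == NIL:
        return "[]"
    if not isinstance(term, Cons):
        return str(term)
    parts = [show(term.head)]
    rest = term.tail
    # Follow the tails for as long as they are cells.
    while isinstance(rest, Cons):
        parts.append(show(rest.head))
        rest = rest.tail
    body = ",".join(parts)
    if rest != NIL:
        # The chain ended in something other than []: an improper list.
        body += "|" + show(rest)
    return "[" + body + "]"


proper = Cons(1, Cons(2, Cons(3, Cons(4, NIL))))
nested = Cons(Cons(Cons(Cons(NIL, 1), 2), 3), 4)

print(show(proper))  # [1,2,3,4]
print(show(nested))  # [[[[[]|1]|2]|3]|4]
```

In the second term every cell has a non-list tail, so the printer can never collapse it into the comma form.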
It's interesting to note that some of Prolog's predicates handle this type of list. For example:
append([[]], 1, L).
Will yield:
L = [[]|1]
You can then build your oddly formed list using append:
append([[]], 1, L1), % append 1 as the tail to [[]] giving L1
append([L1], 2, L2), % append 2 as the tail to [L1] giving L2
append([L2], 3, L3), % append 3 as the tail to [L2] giving L3
append([L3], 4, L4). % append 4 as the tail to [L3] giving L4
Which yields:
L4 = [[[[[]|1]|2]|3]|4]
Each append takes a list of one element (which is itself a list from the prior append, starting with [[]]) and appends a non-list tail to it.
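The reason append accepts a non-list second argument is that its usual two-clause definition never inspects that argument. Here is a rough Python sketch of the same recursion, assuming cells are modelled as (head, tail) tuples and [] as None; cons, append and NIL are names invented for this example.

```python
NIL = None  # assumed stand-in for Prolog's []


def cons(head, tail):
    """Build one list cell as a (head, tail) tuple."""
    return (head, tail)


def append(xs, tail):
    """Mirror append([], L, L). and append([H|T], L, [H|R]) :- append(T, L, R).

    The cells of xs are copied and `tail` is attached unchanged,
    whether or not it is a list.
    """
    if xs is NIL:
        return tail
    head, rest = xs
    return cons(head, append(rest, tail))


l1 = append(cons(NIL, NIL), 1)  # [[]] with tail 1  -> [[]|1]
l2 = append(cons(l1, NIL), 2)   # [L1] with tail 2
l3 = append(cons(l2, NIL), 3)   # [L2] with tail 3
l4 = append(cons(l3, NIL), 4)   # [L3] with tail 4

print(l4)  # ((((None, 1), 2), 3), 4)
```

The printed tuple has the same shape as [[[[[]|1]|2]|3]|4], with None where Prolog has [].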
It's called an "improper list".
Let me explain.
I think a few different ideas are at play here.
In general we know what a list is: it's an ordered collection.
How we represent a list literal in a programming language is another facet.
How the list is represented internally in the language, is another facet still.
What you have realized is that there is a data structure you can represent in Prolog, with syntax similar to a list, that somehow seems "improper". (It's not clear that you have asked much beyond "what is the meaning of this confusing thing?")
It turns out that this structure is simply known as an "improper list". This data structure shows up often in languages that store lists internally as nested cons cells or similar. If you Google for that, you'll find plenty of resources for examples, usages, etc.
I've recently started learning scala, and I've come across the :: (cons) function, which prepends to a list.
In the book "Programming in Scala" it states that there is no append function because appending to a list has performance O(n) whereas prepending has a performance of O(1).
Something just strikes me as wrong about that statement.
Isn't performance dependent on implementation? Isn't it possible to simply implement the list with both forward and backward links and store the first and last element in the container?
The second question I suppose is what I'm supposed to do when I have a list, say 1,2,3 and I want to add 4 to the end of it?
The key is that x :: somelist does not mutate somelist, but instead creates a new list, which contains x followed by all elements of somelist. This can be done in O(1) time because you only need to set somelist as the successor of x in the newly created, singly linked list.
If doubly linked lists were used instead, x would also have to be set as the predecessor of somelist's head, which would modify somelist. So if we want to be able to do :: in O(1) without modifying the original list, we can only use singly linked lists.
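Here is a small sketch of that sharing, written in Python rather than Scala and using a made-up immutable Node class. It is only meant to show that prepending allocates one node and leaves the old list untouched; it is not how Scala's List is actually implemented.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical immutable singly linked list node."""
    head: Any
    tail: Optional["Node"]  # None plays the role of the empty list


def prepend(x, xs):
    """O(1): allocate one node whose successor is the existing list."""
    return Node(x, xs)


def to_pylist(xs):
    """Walk the nodes and collect the elements into a Python list."""
    out = []
    while xs is not None:
        out.append(xs.head)
        xs = xs.tail
    return out


old = prepend(2, prepend(3, None))
new = prepend(1, old)

print(to_pylist(new))   # [1, 2, 3]
print(to_pylist(old))   # [2, 3], unchanged
print(new.tail is old)  # True: the old list is shared, not copied
```

With a doubly linked node there would be a predecessor field on the old head to update, which is exactly the mutation described above.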
Regarding the second question: You can use ::: to concatenate a single-element list to the end of your list. This is an O(n) operation.
List(1,2,3) ::: List(4)
Other answers have given good explanations for this phenomenon. If you are appending many items to a list in a subroutine, or if you are creating a list by appending elements, a functional idiom is to build up the list in reverse order, cons'ing the items on the front of the list, then reverse it at the end. This gives you O(n) performance instead of O(n²).
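A rough illustration of that idiom, again in Python with (head, tail) tuples standing in for an immutable linked list. The helper names cons, reverse, from_iterable and to_pylist are invented for this sketch.

```python
def cons(head, tail):
    """One list cell; None is the empty list."""
    return (head, tail)


def reverse(xs):
    """Reverse a linked list in a single O(n) pass."""
    acc = None
    while xs is not None:
        head, xs = xs
        acc = cons(head, acc)
    return acc


def from_iterable(items):
    """Cons each item onto the front (O(1) each), then reverse once."""
    acc = None
    for item in items:
        acc = cons(item, acc)  # list is in reverse order at this point
    return reverse(acc)


def to_pylist(xs):
    """Collect the elements of a linked list into a Python list."""
    out = []
    while xs is not None:
        head, xs = xs
        out.append(head)
    return out


print(to_pylist(from_iterable([1, 2, 3, 4])))  # [1, 2, 3, 4]
```

Each element is touched twice (once when consed, once when reversed), so the total work is linear instead of quadratic.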
Since the question was just updated, it's worth noting that things have changed here.
In today's Scala, you can simply use xs :+ x to append an item at the end of any sequential collection. (There is also x +: xs to prepend. The mnemonic for many of Scala's 2.8+ collection operations is that the colon goes next to the collection.)
This will be O(n) with the default linked implementation of List or Seq, but if you use Vector or IndexedSeq, this will be effectively constant time. Scala's Vector is probably Scala's most useful list-like collection—unlike Java's Vector which is mostly useless these days.
If you are working in Scala 2.8 or higher, the collections introduction is an absolute must read.
Prepending is faster because it only requires two operations:
Create the new list node
Have that new node point to the existing list
Appending requires more operations because you have to traverse to the end of the list since you only have a pointer to the head.
I've never programmed in Scala before, but you could try a ListBuffer.
Most functional languages prominently figure a singly-linked-list data structure, as it's a handy immutable collection type. When you say "list" in a functional language, that's typically what you mean (a singly-linked list, usually immutable). For such a type, append is O(n) whereas cons is O(1).
I have to write a Union function using recursion
The output has to be the union (no duplicates) of two lists. My teacher said the implementation has to be recursive and we cannot go through the lists twice, but I don't think I can come up with a way of solving the problem without going through the lists twice.
My ideas which would solve the problem (but involve going through lists twice):
- Merge then remove duplicates
- Sorting the lists, then merge
Any hints or help would be appreciated
Edit: Well so I got to combine both lists by doing this:
union1 :: (Eq a) => [a] -> [a] -> [a]
union1 xs [] = xs
union1 [] ys = ys
union1 (x:xs)(y:ys) = x:y:union1(xs)(ys)
Then I thought I could use nub or a similar function to remove the duplicates but I got stuck thinking because then I would be going through the lists twice, right?
What is list union?
I would like to first point out that the requirements your teacher gave you are a bit vague. Moreover, union on multisets (aka sets that can have duplicates, like lists) has two different definitions in mathematics (other source). I am no mathematician, but here is what I was able to glean from various internets. Here is one definition:
λ> [1,2,2,3,3,3] `unionA` [1,2,2,2,3] --also called multiset sum
[1,1,2,2,2,2,2,3,3,3,3]
This is simply (++), if you're not worried about ordering. And here is the other:
λ> [1,2,2,3,3,3] `unionB` [1,2,2,2,3]
[1,2,2,2,3,3,3] --picks out the max number of occurrences from each list
Adding to this confusion, Data.List implements a somewhat quirky third type of union that treats its left input differently from its right input. Here is approximately the documentation found in the comments of the source code of union from Data.List:
The union function returns the list union of the two lists. Duplicates, and elements of the first list, are removed from the second list, but if the first list contains duplicates, so will the result. For example,
λ> "dog" `union` "cow"
"dogcw"
Here, you have 3 possible meanings of "union of lists" to choose from. So unless you had example input and output, I don't know which one your teacher wants, but since the goal is probably for you to learn about recursion, read on...
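To pin down the third meaning, here is a sketch of the behaviour described in that documentation, written in Python. list_union is a name made up for this example; it is not the Data.List implementation, just my reading of the quoted comment.

```python
def list_union(xs, ys):
    """Keep xs as is (duplicates included), then add the elements of ys
    that are neither in xs nor already added, in order of first appearance."""
    extra = []
    for y in ys:
        if y not in xs and y not in extra:
            extra.append(y)
    return list(xs) + extra


print("".join(list_union("dog", "cow")))  # dogcw
print(list_union([1, 1], [2, 2, 1]))      # [1, 1, 2]
```

The second call shows the asymmetry: the duplicate 1 in the first list survives, while the duplicate 2 in the second list does not.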
Removing duplicates
Removing duplicates from unordered lists in Haskell can be done in linear time, but solutions involve either random-access data structures, like Arrays, or something called "productive stable unordered discriminators", like in the discrimination package. I don't think that's what your teacher is looking for, since the first is also common in imperative programming, and the second is too complex for a beginner Haskell course. What your teacher probably meant is that you should traverse each list explicitly only once.
So now for the actual code. Here I will show you how to implement union #3, starting from what you wrote, and still traverse the lists (explicitly) only once. You already have the basic recursion scheme, but you don't need to recurse on both lists, since we opted for option #3, and therefore return the first list unchanged.
Actual code
You'll see in the code below that the first list is used as an "accumulator": while recursing on the second list, we check each element for a duplicate in the first list, and if there isn't a duplicate, we add it to the front of the first list.
unionR :: Eq a => [a] -> [a] -> [a]
unionR ls [] = ls                   -- second list exhausted: the accumulator is the result
unionR ls (r:rs)
  | r `elem` ls = unionR ls rs      -- discard head of second list if already seen
                                    -- `elem` traverses its second argument,
                                    -- but see discussion above.
  | otherwise = unionR (r:ls) rs    -- prepend head of second list
As a side note, you can make this a bit more readable by using a fold:
union xs = foldl step xs where  -- xs is our first list; no recursion on it,
                                -- we use it right away as the accumulator.
  step els x
    | x `elem` els = els
    | otherwise = x : els
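The same accumulator idea can be traced in Python with functools.reduce, which plays the role of foldl here. union_fold is a name invented for this sketch. Note that because new elements are put on the front of the accumulator, the result order differs from what Data.List's union gives.

```python
from functools import reduce


def union_fold(xs, ys):
    """Left fold over ys with xs as the initial accumulator."""
    def step(acc, y):
        # Skip y if the accumulator already holds it, otherwise prepend it.
        return acc if y in acc else [y] + acc
    return reduce(step, ys, list(xs))


print("".join(union_fold("dog", "cow")))  # wcdog
```

If the order matters for your assignment, you could append in step instead, at the cost of an extra traversal per added element.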
I have to implement the predicate cons(List, Term) that will take a list [Head|Tail] and convert it to terms, represented as next(Head, Tail). How do I do this? I don't even know where to start.
Here is the example of a successful query given in the question:
cons([a,b,c],X). /*query returns X=next(a,next(b,next(c,null))).*/
Doing most anything with lists will require that you consider two cases: the empty list and a list with a head and a sublist. Usually your base case is handling the empty list and your inductive case is handling the list with sublist.
First consider your base case:
cons([], null).
Now deal with your inductive case:
cons([X|Xs], next(X, Rest)) :- cons(Xs, Rest).
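If it helps to see the same two cases outside Prolog, here is a rough Python transliteration. The tuple ("next", Head, Rest) and the string "null" are just stand-ins chosen for this sketch to represent the terms next(Head, Rest) and null.

```python
def cons(items):
    """Convert a Python list into nested ("next", head, rest) tuples."""
    if not items:
        # Base case: the empty list becomes null.
        return "null"
    head, *tail = items
    # Inductive case: convert the tail, then wrap it with the head.
    return ("next", head, cons(tail))


print(cons(["a", "b", "c"]))
# ('next', 'a', ('next', 'b', ('next', 'c', 'null')))
```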
I was debugging an error and found out that undefined had been appended to a list, which caused a crash later on.
I expected that appending something other than a list with the ++ operator would cause a crash. But this is not true for undefined. Here is an example:
1> [1,2,3] ++ undefined.
[1,2,3|undefined]
Although it does not crash, the list is not fully functional anymore:
1> L = [1,2,3] ++ undefined.
[1,2,3|undefined]
2> L ++ [4].
** exception error: bad argument
in operator ++/2
called as [1,2,3|undefined] ++ [4]
Why does this happen?
Is this related to the underlying implementation of lists in erlang?
The reason is that ++ appends its second argument to the end of its first argument, which must be a list. It does not do any processing of its second argument; it just appends it as is. So:
1> [1,2,3] ++ undefined.
[1,2,3|undefined]
2> [1,2,3] ++ [undefined].
[1,2,3,undefined]
The reason you can do this as well as:
3> [a|b].
[a|b]
4> [a|[b]].
[a,b]
is that a list is a sequence of list cells, a singly linked list, not a single data structure as such. If the right-hand side of each cell, called the tail, is another list cell or [], then you get a proper list. The left-hand side of each cell is called the head and usually contains the elements of the list. This is what we have in 2 and 4 above. Most, if not all, library functions assume that lists are proper lists and will generate an error if they are not. Note that you have to actually step down the whole list to the end to see whether it is proper or not.
Each list cell is written as [Head|Tail] and the syntax [a,b,c] is just syntactic sugar for [a|[b|[c|[]]]]. Note that the tail of each list cell is a list or [] so this is a proper list.
There are no restrictions on what types the head and tail of a list cell can be. The system never checks; it just builds the cell. This is what we have in 1 and 3 above, where the tail of the last list cell (the only list cell in 3) is not a list or [].
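As a rough model of this (in Python, not Erlang, and not how the VM actually does it), list cells can be pictured as (head, tail) tuples with None for []. The functions plus_plus and is_proper are names made up for this sketch: the first attaches its second argument as is, and the second has to walk every cell to find out whether the list is proper.

```python
NIL = None  # assumed stand-in for Erlang's []


def plus_plus(xs, tail):
    """Copy the elements of the Python list xs into cells and attach
    `tail` unchanged, without checking what it is."""
    result = tail
    for x in reversed(xs):
        result = (x, result)
    return result


def is_proper(cell):
    """Step down the tails; the list is proper only if it ends in NIL."""
    while isinstance(cell, tuple):
        _, cell = cell
    return cell is NIL


improper = plus_plus([1, 2, 3], "undefined")        # like [1,2,3|undefined]
proper = plus_plus([1, 2, 3], ("undefined", NIL))   # like [1,2,3,undefined]

print(improper)             # (1, (2, (3, 'undefined')))
print(is_proper(improper))  # False
print(is_proper(proper))    # True
```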
Sorry getting a bit over-didactic here.
EDIT: I see that I have already described this here: Functional Programming: what is an "improper list"?
In Erlang, all terms are represented by a compact pointer-like value called Eterm. It appears that list manipulation functions are implemented as type-agnostic.
Consider from this perspective: inside the erlang VM all Eterms are equal. The head and tail list manipulation operations are billed as being very fast. Since it takes several operations to evaluate the opaque Eterm type to determine whether or not it is a list, why bother?
The expected outcome in such a situation is an error, and you do get one. Eventually.
There's something to be said for trusting the programmer, and when dealing with an operation that adds several cycles and is used frequently, the potential benefit of ignoring a bad append stacks up, and the only penalty is a strange error.
I'm trying to be a good erlanger and avoid "++". I need to add a tuple to the end of a list without creating a nested list (and hopefully without having to build it backwards and reverse it). Given tuple T and lists L0 and L1:
When I use [T|L0] I get [tuple,list0].
But when I use [L0|T], I get nested list [[list0]|tuple]. Similarly, [L0|L1] returns [[list0]|list1].
Removing the outside list brackets L0|[T] produces a syntax error.
Why is "|" not symmetric? Is there a way to do what I want using "|"?
| is not "symmetric" because a non-empty list has a head and a tail where the head is a single item and the tail is another list. In the expression [foo | bar] foo denotes the head of the list and bar is the tail. If the tail is not a proper list, the result won't be a proper list either. If the head is a list, the result will simply be a list with that list as its first element.
There is no way to append at the end of a linked list in less than O(n) time. This is why using ++ for that is generally shunned. If there were special syntax to append at the end of the list, it would still need to take O(n) time and using that syntax wouldn't make you any more of a "good erlanger" than using ++ would.
If you want to avoid the O(n) cost per insertion, you'll need to prepend and then reverse. If you're willing to pay the cost, you might just as well use ++.
A little more detail on how lists work:
[ x | y ] is something called a cons cell. In C terms it's basically a struct with two members. A proper list is either the empty list ([]) or a cons cell whose second member is a proper list (in which case the first member is called its head, and the second member is called its tail).
So when you write [1, 2, 3] this creates the following cons cells: [1 | [2 | [3 | []]]]. I.e. the list is represented as a cons cell whose first member (its head) is 1 and the second member (the tail) is another cons cell. That other cons cell has 2 as its head and yet another cons cell as its tail. That cell has 3 as its head and the empty list as its tail.
Traversing such a list is done recursively by first acting on the head of the list and then calling the traversal function on the tail of the list.
Now if you want to prepend an item to that list, this is very easy: you simply create another cons cell whose head is the new item and whose tail is the old list.
Appending an item however is much more expensive because creating a single cons cell does not suffice. You have to create a list that is the same as the old one, except the tail of the last cons cell must be a new cons cell whose head is the new element and whose tail is the empty list. So you can't append to a list without going through the whole list, which is O(n).
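To put numbers on the difference, here is a small Python sketch that counts how many cells each operation allocates. Cell, prepend and append are hypothetical names for this illustration; the counter is only there to make the cost visible.

```python
class Cell:
    """Hypothetical cons cell that counts how many cells get created."""
    created = 0

    def __init__(self, head, tail):
        Cell.created += 1
        self.head = head
        self.tail = tail


def prepend(x, xs):
    """One new cell whose tail is the old list."""
    return Cell(x, xs)


def append(xs, x):
    """Rebuild every cell of xs so that the last tail can hold x."""
    if xs is None:
        return Cell(x, None)
    return Cell(xs.head, append(xs.tail, x))


lst = prepend(1, prepend(2, prepend(3, None)))  # [1, 2, 3]

Cell.created = 0
prepend(0, lst)
cells_for_prepend = Cell.created

Cell.created = 0
append(lst, 4)
cells_for_append = Cell.created

print(cells_for_prepend)  # 1
print(cells_for_append)   # 4: three copied cells plus one new cell
```

The append count grows with the length of the list, while the prepend count stays at one.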