I'm trying to create a custom unfold function that returns its last accumulator value like:
val unfold' : generator:('State -> ('T * 'State) option) -> state:'State -> 'T list * 'State
I managed to make the following:
let unfold' generator state =
    let rec loop resultList state =
        match generator state with
        | Some (value, state) -> loop (value :: resultList) state
        | None -> (List.rev resultList), state
    loop [] state
But I wanted to avoid calling List.rev on the resulting list and instead generate it in the correct order from the start. I imagine it would be necessary to use continuations to build the list, but I'm quite new to functional programming and have not yet managed to wrap my mind around continuations; all the alternatives I can imagine would either put the accumulator inside the resulting list or not allow it to be returned by the function.
Is there some way to do this?
As this is a personal learning exercise I would prefer an answer explaining how to do it instead of simply giving the completed code.
The way to do without List.rev is to pass a function instead of the resultList parameter. Let's call that function buildResultList. At each step, this function would take the already-built tail of the list, prepend the current item, and then pass the result to the function from the previous step, which would prepend the previous item, pass it to the function from the previous-previous step, and so on. The very last function in this chain will prepend the very first item to the list. The result of the whole recursive loop would be the last function of the chain (it calls all the previous ones), which you would then call with an empty list as its argument. I'm afraid this is as clear as I can be without just writing the code.
However, the thing is, this wouldn't be any better, for any definition of "better". Since the computation is progressing "forward", and resulting list is built "backward" (head :: tail, Lisp-style), you have to accumulate the result somewhere. In your code, you're accumulating it in a temporary list, but if you modify it to use continuations, you'll be accumulating it on the heap as a series of closures that reference each other in a chain. One could argue that it would be, in essence, the same list, only obfuscated.
Another approach you could try is to use a lazy sequence instead: build a recursive seq computation, which will yield the current item and then yield! itself. You can then enumerate this sequence, and it won't require a "temporary" storage. However, if you still want to get a list at the end, you'll have to convert the sequence to a list via List.ofSeq, and guess how that's going to be implemented? Theoretically, from purely mathematical standpoint, List.ofSeq would be implemented in exactly the same way: by building a temp list first and then reversing it. But the F# library cheats here: it builds the list in a mutable way, so it doesn't have to reverse.
And finally, since this is a learning exercise, you could also implement the equivalent of a lazy sequence yourself. Now, the standard .NET sequences (aka IEnumerable<_>, which is what Seq<_> is an alias for) are inherently mutable: you're changing the internal state of the iterator every time you move to the next item. You can do that, or, in the spirit of learning, you can do an immutable equivalent. That would be almost like a list (i.e. head::tail), except that, since it's lazy, the "tail" has to be a promise rather than the actual sequence, so:
type LazySeq<'t> = LazySeq of (unit -> LazySeqStep<'t>)
and LazySeqStep<'t> = Empty | Cons of head: 't * tail: LazySeq<'t>
The way to enumerate is to invoke the function, and it will return you the next item plus the tail of the sequence. Then you can write your unfold as a function that returns current item as head and then just returns itself wrapped in a closure as tail. Turns out pretty simple, actually.
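To make the idea concrete, here is a minimal sketch of such an immutable lazy sequence, written in OCaml rather than F#. The names lazy_seq, step, unfold and take are my own choices for illustration, not part of any library:

```ocaml
(* A lazy sequence is a promise (a function) that yields the next step on demand. *)
type 'a lazy_seq = LazySeq of (unit -> 'a step)
and 'a step = Empty | Cons of 'a * 'a lazy_seq

(* unfold returns the current item as head and itself, wrapped in a closure,
   as tail. Nothing is computed until the closure is invoked. *)
let rec unfold generator state =
  LazySeq (fun () ->
    match generator state with
    | Some (value, next_state) -> Cons (value, unfold generator next_state)
    | None -> Empty)

(* take forces at most n steps and collects them into an ordinary list. *)
let rec take n (LazySeq force) =
  if n <= 0 then []
  else
    match force () with
    | Empty -> []
    | Cons (head, tail) -> head :: take (n - 1) tail

(* An infinite sequence is fine, because only the requested prefix is forced. *)
let naturals = unfold (fun n -> Some (n, n + 1)) 0
let first_three = take 3 naturals
```

Note that take is not tail-recursive; it is only there to show how enumeration works.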
Thanks to Fyodor Soikin's answer, here is the resulting function:
let unfold' generator state =
    let rec loop build state =
        match generator state with
        | Some (value, state) -> loop (fun newValue -> build (value :: newValue)) state
        | None -> (build []), state
    loop id state
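For anyone who wants to try it out, here is roughly the same function transcribed to OCaml, with a small usage example. The generator that yields squares and the names squares and final_state are made up for the illustration:

```ocaml
(* Continuation-passing unfold: `build` is the chain of closures that will
   prepend every value seen so far once it is handed the finished tail. *)
let unfold' generator state =
  let rec loop build state =
    match generator state with
    | Some (value, state) -> loop (fun rest -> build (value :: rest)) state
    | None -> (build [], state)
  in
  loop (fun x -> x) state

(* Hypothetical generator: yield n * n while n <= 3, then stop. *)
let squares, final_state =
  unfold' (fun n -> if n > 3 then None else Some (n * n, n + 1)) 1
```

Here squares is [1; 4; 9] and final_state is 4, the state at which the generator returned None.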
I understand that lists are implemented as singly linked lists, so they don't really have a fixed structure that you can pin a length on, but each node should know how many nodes remain until the last element, right? Whenever a node is added to an existing list, it could determine the length of the list it represents in constant time, provided that the existing nodes already have that info.
I can understand why that wouldn't work in Haskell, for example, due to laziness, but as far as I know F# lists aren't lazy. So, is the problem just the extra memory overhead?
Seems to me like a typical memory vs. time trade-off.
If the standard F# list had the implementation you suggest, it would need much more memory (consider a list of one million bools). And everyone using such a list would have to pay for it. There would be no simple way to opt out other than writing a completely new list implementation.
On the other hand, it seems fairly simple to build a new type on top of the F# list that stores, with each element, the length of the list that follows it. You can implement it on your own if you need it. Those who don't need it can use the standard implementation.
I don't often find myself needing to know the length of the list, it's not like you need it to exit a for loop like you would with arrays in imperative languages.
For those rare cases when you really need to know the length asap, you can go with Carsten König's suggestion from a comment and make your 'a list into a ('a * int) list, where each node keeps the length of the tail as a tuple element.
Then you can have something like this:
let push lst e =
    match lst with
    | (_, c)::_ -> (e, c + 1) :: lst
    | [] -> [e, 0]
and length and pop functions to match.
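A possible shape for all three functions, sketched in OCaml instead of F#. The pop that returns an option is my own design choice, not something from the comment:

```ocaml
(* Each cell stores its element together with the length of the list's tail. *)
let push lst e =
  match lst with
  | (_, c) :: _ -> (e, c + 1) :: lst
  | [] -> [ (e, 0) ]

(* O(1): the head cell already records how many cells follow it. *)
let length = function
  | (_, c) :: _ -> c + 1
  | [] -> 0

(* Returns the top element and the remaining list, or None if the list is empty. *)
let pop = function
  | (e, _) :: rest -> Some (e, rest)
  | [] -> None
```

The counts stay correct only as long as every list is built through push, so in real code you would probably hide the representation behind a module.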
For all the other cases I'd call it a premature optimization.
I've defined a custom list type as part of a homework exercise.
type 'a myType =
| Item of ('a * 'a myType)
| Empty;;
I've already done 'length' and now I need an 'append' function.
My length function:
let length l =
  let rec _length n = function
    | Empty -> n
    | Item(_, next) -> _length (n + 1) next
  in _length 0 l;;
But I really don't know how to make the append function.
let append list1 list2 = (* TODO *)
I can't use the list module so I can't use either :: or @.
I guess my comments are getting too lengthy to count as mere comments. I don't really want to answer, I just want to give hints. Otherwise it defeats the purpose.
To repeat my hints:
a. The second parameter will appear unchanged in your result, so you can just
spend your time worrying about the first parameter.
b. You first need to know how to append something to an empty list. I.e., you need
to know what to do when the first parameter is Empty.
c. You next need to know how to break down the non-empty case into a smaller append
problem.
If you don't know how to create an item, then you might start by writing a function that takes (say) an integer and a list of integers and returns a new list with the integer at the front. Here is a function that takes an integer and returns a list containing just that one integer:
let list1 k =
Item (k, Empty)
One way to think of this is that every time Item appears in your code, you're creating a new item. Item is called a constructor because it constructs an item.
I hope this helps.
Your structure is a list, so you should start by defining a value nil that is the empty list, and a function cons head tail that puts the head element in front of the list tail.
Another piece of advice: sometimes it helps a lot to start with a simple example and work through it manually, i.e. decompose what you want to do into simple operations that you perform yourself. Then you can generalize and write the code...
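As a starting point only (this deliberately stops short of append), nil and cons for the type from the question might look like the sketch below. The example value is made up:

```ocaml
type 'a myType =
  | Item of ('a * 'a myType)
  | Empty

(* The empty list. *)
let nil = Empty

(* Put `head` in front of the existing list `tail`. *)
let cons head tail = Item (head, tail)

(* Building the two-element list 1, 2 by hand. *)
let example = cons 1 (cons 2 nil)
```

Writing out a small list like this by hand is one way to do the manual exercise: try working out, on paper, which cons calls would produce the result of appending two short lists.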
Since (::) : 'a -> 'a list -> 'a list is used to add an element to the beginning of a list, could anyone tell me if there is a function to add an element to the end of a list? If not, I guess List.rev (element :: (List.rev list)) is the most straightforward way to do it?
Thank you!
The reason there's not a standard function to do this is that appending at the end of a list is an anti-pattern (aka a "snoc list" or a Schlemiel the Painter algorithm). Adding an element at the end of a list requires a full copy of the list. Adding an element at the front of the list requires allocating a single cell—the tail of the new list can just point to the old list.
That said, the most straightforward way to do it is
let append_item lst a = lst @ [a]
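To see where the full copy comes from, here is a hand-written sketch of what appending a single element has to do. The name snoc is just a label for this illustration, and this is not how the standard library implements @:

```ocaml
(* Appending at the end: every cell of lst is rebuilt on the way back up,
   because the last cell's tail must change and cells are immutable. *)
let rec snoc lst a =
  match lst with
  | [] -> [ a ]
  | x :: rest -> x :: snoc rest a

(* Adding at the front allocates exactly one new cell whose tail is lst itself. *)
let push_front lst a = a :: lst
```

snoc performs one allocation per element of lst, while push_front performs one allocation in total.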
list @ [element] should work. @ joins lists.
Given that this operation is linear, you should not use it in the "hot" part of your code, where performance matters. In a cold part, use list @ [element] as suggested by Adi. In a hot part, rewrite your algorithm so that you don't need to do that.
The typical way to do it is to accumulate results in the reverse order during processing, and then reverse the whole accumulated list before returning the result. If you have N processing steps (each adding an element to the list), you therefore amortize the linear cost of reverse over N elements, so you keep a linear algorithm instead of a quadratic one.
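A minimal sketch of that accumulate-then-reverse pattern. The task (collecting the squares of the even numbers) and the function name are invented for the example:

```ocaml
(* Results are consed onto `acc` in reverse order (O(1) per step),
   and a single List.rev at the end restores the original order. *)
let squares_of_evens numbers =
  let rec loop acc = function
    | [] -> List.rev acc
    | x :: rest when x mod 2 = 0 -> loop ((x * x) :: acc) rest
    | _ :: rest -> loop acc rest
  in
  loop [] numbers
```

The loop is tail-recursive, and the total cost stays linear: N constant-time conses plus one linear reversal.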
In some cases, another technique that works is to process your elements in reverse order, so that the accumulated results turn out to be in the right order without an explicit reversal step.
I'm looking for an Erlang library function that will return the index of a particular element in a list.
So, if
X = [10,30,50,70]
lists:index_of(30, X)
would return 1, etc., just like java.util.List's indexOf() method.
Does such a method exist in the Erlang standard lib? I tried looking in the lists module but no luck. Or should I write it myself?
You'll have to define it yourself, like this:
index_of(Item, List) -> index_of(Item, List, 1).
index_of(_, [], _) -> not_found;
index_of(Item, [Item|_], Index) -> Index;
index_of(Item, [_|Tl], Index) -> index_of(Item, Tl, Index+1).
Note, however, that accessing the Nth element of a list is O(N), so an algorithm that often accesses a list by index will be less efficient than one that iterates through it sequentially.
As others noted, there are more efficient ways to solve for this. But if you're looking for something quick, this worked for me:
string:str(List, [Element]).
Other solutions (note that these are 1-based):
index_of(Value, List) ->
    Map = lists:zip(List, lists:seq(1, length(List))),
    case lists:keyfind(Value, 1, Map) of
        {Value, Index} -> Index;
        false -> notfound
    end.
index_of(Value, List) ->
    Map = lists:zip(List, lists:seq(1, length(List))),
    case dict:find(Value, dict:from_list(Map)) of
        {ok, Index} -> Index;
        error -> notfound
    end.
At some point, when the lists you pass to these functions get long enough, the overhead of constructing the additional list or dict becomes too expensive. If you can avoid doing the construction every time you want to search the list by keeping the list in that format outside of these functions, you eliminate most of the overhead.
Using a dictionary will hash the values in the list and help reduce the index lookup time to O(log N), so it's better to use that for large, singly-keyed lists.
In general, it's up to you, the programmer, to organize your data into structures that suit how you're going to use them. My guess is that the absence of a built-in index_of is to encourage such consideration. If you're doing single-key lookups -- that's really what index_of() is -- use a dictionary. If you're doing multi-key lookups, use a list of tuples with lists:keyfind() et al. If your lists are inordinately large, a less simplistic solution is probably best.
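The "build the lookup structure once, outside the search function" idea is language-independent. Here is a rough sketch of it in OCaml rather than Erlang, using the standard Hashtbl module; build_index and index_of are hypothetical names, and indices are 1-based to match the code above:

```ocaml
(* Build the element -> index table once. For duplicates, the first
   occurrence wins, as with a left-to-right search. *)
let build_index items =
  let table = Hashtbl.create 16 in
  List.iteri
    (fun i x -> if not (Hashtbl.mem table x) then Hashtbl.add table x (i + 1))
    items;
  table

(* Each lookup reuses the prebuilt table instead of rescanning the list. *)
let index_of table x = Hashtbl.find_opt table x

let severities = build_index [ "debug"; "info"; "warn"; "error" ]
```

With this table, index_of severities "warn" gives Some 3 and an unknown key gives None.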
This function is very uncommon in Erlang, which may be the reason it is not in the standard library. Experienced Erlang programmers do not need it, and algorithms that rely on it are discouraged. Anyone who does need it can write it for their own purposes, but such rare occasions are no reason to include it in stdlib. Design your data structures properly instead of asking for this function. In most cases, needing this function indicates a design error.
I think the writer makes a valid case. Here is my use case from a logging application. The objective is to check the severity of an error against the actions to be performed against various levels of error response.
get_index(A,L) ->
get_index(A,L,1).
get_index(A,[A|_],N) ->
N;
get_index(A,[_|T],N) ->
get_index(A,T,N+1).
get_severity(A) ->
Severity=[debug,info,warn,error],
get_index(A,Severity).
The following function returns a list of indices of a given element in a list. Result can be used to get the index of the first or last occurrence of a duplicate element in a list.
indices_of(Element, L) ->
Indices = lists:zip(lists:seq(1,length(L)), L),
[ I || {I, E} <- Indices, E == Element ].
From what I understand, the list type in Haskell is implemented internally using a linked list. However, the user of the language does not get to see the details of the implementation, nor does he have the ability to modify the "links" that make up the linked list to allow it to point to a different memory address. This, I suppose, is done internally.
How, then, can the list type in Haskell be characterized? Is it a "data type" or an "abstract data type"? And what about the linked list type of the implementation?
Additionally, since the list type provided by the Prelude is not a linked list type, how can the basic linked list functions be implemented?
Take, for example, this piece of code designed to add an element a at index n of a list:
add [] acc _ _ = reverse acc
add (x:xs) acc 0 a = add xs (x:a:acc) (-1) a
add (x:xs) acc n a = add xs (x:acc) (n-1) a
Using a "real" linked list, adding an element would just consist of modifying a pointer to a memory address. This is not possible in Haskell (or is it?), thus the question: is my implementation of adding an element to a list the best possible one, or am I missing something? (The use of the reverse function is, I think, particularly ugly, but is it possible to do without it?)
Please, do not hesitate to correct me if anything I have said is wrong, and thank you for your time.
You're confusing mutability with data structure. It is a proper list — just not one you're allowed to modify. Haskell is purely functional, meaning values are constant — you can't change an item in a list any more than you could turn the number 2 into 3. Instead, you perform calculations to create new values with the changes you desire.
You could define that function most simply this way:
add ls idx el = take idx ls ++ el : drop idx ls
The list el : drop idx ls reuses the tail of the original list, so you only have to generate a new list up to idx (which is what the take function does). If you want to do it using explicit recursion, you could define it like so:
add ls 0 el = el : ls
add (x:xs) idx el
  | idx < 0 = error "Negative index for add"
  | otherwise = x : add xs (idx - 1) el
add [] _ el = [el]
This reuses the tail of the list in the same way (that's the el : ls in the first case).
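Haskell gives you no direct way to observe this sharing, but OCaml's physical-equality operator == does, so here is a sketch of the same explicit recursion in OCaml. The name insert_at and the sample values are mine, and OCaml is used only to make the sharing visible:

```ocaml
(* Same recursion as above: copy the cells before idx, then reuse the rest. *)
let rec insert_at xs idx el =
  if idx < 0 then invalid_arg "insert_at: negative index"
  else
    match xs with
    | _ when idx = 0 -> el :: xs
    | [] -> [ el ]
    | x :: rest -> x :: insert_at rest (idx - 1) el

let original = [ 1; 2; 3; 4 ]
let result = insert_at original 2 99

(* The cells after the inserted element are the very same cells as in
   `original`, not copies: == compares identity, not contents. *)
let shares_tail =
  List.tl (List.tl (List.tl result)) == List.tl (List.tl original)
```

Here result is [1; 2; 99; 3; 4] and shares_tail is true: only the two cells in front of the insertion point were rebuilt.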
Since you seem to be having trouble seeing how this is a linked list, let's be clear about what a linked list is: It's a data structure consisting of cells, where each cell has a value and a reference to the next item. In C, it might be defined as:
struct ListCell {
    void *value; /* This is the head */
    struct ListCell *next; /* This is the tail */
};
In Lisp, it's defined as (head . tail), where head is the value and tail is the reference to the next item.
In Haskell, it's defined as data [] a = [] | a : [a], where a is the value and [a] is the reference to the next item.
As you can see, these data structures are all equivalent. The only difference is that in C and Lisp, which are not purely functional, the head and tail values are things you can change. In Haskell, you can't change them.
Haskell is a purely functional programming language. This means no value can be changed at all.
Lists are non-abstract types; a list is just a linked list.
You can think of them defined in this way:
data [a] = a : [a] | []
which is exactly the way a linked list is defined - A head element and (a pointer to) the rest.
Note that this is not different internally. If you want more efficient types, use Sequence or Array. (But since no change is allowed, you never actually need to copy a list to keep two copies distinct, which might be a performance gain compared with imperative languages.)
In Haskell, "data type" and "abstract type" are terms of art:
A "data type" (which is not abstract) has visible value constructors which you can pattern-match on in case expressions or function definitions.
An "abstract type" does not have visible value constructors, so you cannot pattern match on values of the type.
Given a type a, [a] (list of a) is a data type because you can pattern match on the visible constructors cons (written :) and nil (written []). An example of an abstract type would be IO a, which you cannot deconstruct by pattern matching.
Your code might work, but it's definitely not optimal. Take the case where you want to insert an item at index 0. An example:
add [200, 300, 400] [] 0 100
If you follow the derivation for this, you end up with:
add [200, 300, 400] [] 0 100
add [300, 400] (200:100:[]) (-1) 100
add [400] (300:[200, 100]) (-2) 100
add [] (400:[300, 200, 100]) (-3) 100
reverse [400, 300, 200, 100]
[100, 200, 300, 400]
But we are only adding an item to the beginning of the list! Such an operation is simple! It's (:)
add [200, 300, 400] [] 0 100
100:[200, 300, 400]
[100, 200, 300, 400]
Think about how much of the list really needs to be reversed.
You ask whether the runtime modifies the pointers in the linked list. Because lists in Haskell are immutable, nobody (not even the runtime) modifies the pointers in the linked list. This is why, for example, it is cheap to add an item to the front of a list, but expensive to append an element at the back. When you add an item to the front of the list, you can re-use all of the existing list. But when you append an item at the end, a completely new linked list has to be built. The immutability of data is required in order for operations at the front of a list to be cheap.
Re: adding an element to the end of a List, I'd suggest using the (++) operator and splitAt function:
add xs a n = beg ++ (a : end)
  where
    (beg, end) = splitAt n xs
The List is a linked-list, but it's read-only. You can't modify a List in place - you instead create a new List structure which has the elements you want. I haven't read it, but this book probably gets at your underlying question.
HTH
The compiler is free to choose any internal representation it wants for a list. And in practice it does actually vary. Clearly the list "[1..]" is not implemented as a classical series of cons cells.
In fact a lazy list is stored as a thunk which evaluates to a cons cell containing the next value and the next thunk (a thunk is basically a function pointer plus the arguments for the function, which gets replaced by the actual value once the function is called). On the other hand if the strictness analyser in the compiler can prove that the entire list will always be evaluated then the compiler does just create the entire list as a series of cons cells.
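The "thunk that gets replaced by its value" behaviour can be observed in a strict language that offers explicit laziness. The following is a small OCaml sketch using the standard Lazy module; it illustrates the general mechanism only and says nothing about how GHC lays out its heap. The names evaluations and cell are made up:

```ocaml
(* Counts how many times the suspended computation actually runs. *)
let evaluations = ref 0

(* `lazy` builds a thunk; nothing is computed at this point. *)
let cell = lazy (incr evaluations; 6 * 7)

(* The first force runs the computation and caches its result;
   the second force just reads the cached value. *)
let first = Lazy.force cell
let second = Lazy.force cell
```

After both forces, first and second are 42 and evaluations is 1, showing the computation ran only once.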