Why is List.length linear in complexity?

I understand that lists are implemented as singly linked, so they don't really have a constant structure that you can pin a length on, but each node should know how many nodes remain until the last element, right? There isn't a way to add a node to some existing list and for that node not to be able to determine the length of the list it represents in constant time, provided that the existing nodes already have that info.
I can understand why that wouldn't work in Haskell, for example, due to laziness, but as far as I know F# lists aren't lazy. So, is the problem just in the extra memory overhead?

Seems to me like typical memory vs time performance consideration.
If the standard F# list had the implementation you suggest, it would need much more memory (consider a list of one million bools). And everyone using such a list would have to pay that cost. There would be no simple way to opt out of it other than writing a completely new list implementation.
On the other hand, it seems fairly simple to create a new type, based on the F# list, that stores the length of the succeeding list with each element. You can implement it on your own if you need it. Those who don't need it will use the standard implementation.

I don't often find myself needing to know the length of the list, it's not like you need it to exit a for loop like you would with arrays in imperative languages.
For those rare cases when you really need to know the length asap, you can go with Carsten König's suggestion from a comment and make your 'a list into a ('a * int) list, where each node keeps the length of the tail as a tuple element.
Then you can have something like this:
let push lst e =
    match lst with
    | (_, c)::_ -> (e, c + 1) :: lst
    | [] -> [e, 0]
and length and pop functions to match.
For all the other cases I'd call it a premature optimization.
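The same idea can be sketched outside F#. The following is a minimal illustration in Python, not the F# library's design: the names `Node`, `push`, `pop` and `length` are hypothetical, and unlike the tuple version above (which stores the length of the tail), each cell here stores the size of the list it heads.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """One immutable cell of a singly linked list that caches its own length."""
    head: object
    tail: Optional["Node"]
    size: int  # number of elements from this cell to the end, inclusive


def length(lst: Optional[Node]) -> int:
    """O(1) length: read the cached counter instead of walking the list."""
    return 0 if lst is None else lst.size


def push(lst: Optional[Node], item: object) -> Node:
    """Prepend in O(1): the new cell's size is the old size plus one."""
    return Node(item, lst, length(lst) + 1)


def pop(lst: Node) -> tuple:
    """Return (first element, remaining list); the tail is shared, not copied."""
    return lst.head, lst.tail


xs = push(push(push(None, "a"), "b"), "c")
print(length(xs))           # 3
first, rest = pop(xs)
print(first, length(rest))  # c 2
```

The price is one extra integer per cell, which is the memory overhead discussed above.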

Related

OCaml: list # [x] vs x :: list [duplicate]

I've recently started learning scala, and I've come across the :: (cons) function, which prepends to a list.
In the book "Programming in Scala" it states that there is no append function because appending to a list has performance O(n) whereas prepending has a performance of O(1).
Something just strikes me as wrong about that statement.
Isn't performance dependent on implementation? Isn't it possible to simply implement the list with both forward and backward links and store the first and last element in the container?
The second question I suppose is what I'm supposed to do when I have a list, say 1,2,3 and I want to add 4 to the end of it?
The key is that x :: somelist does not mutate somelist, but instead creates a new list, which contains x followed by all elements of somelist. This can be done in O(1) time because you only need to set somelist as the successor of x in the newly created, singly linked list.
If doubly linked lists were used instead, x would also have to be set as the predecessor of somelist's head, which would modify somelist. So if we want to be able to do :: in O(1) without modifying the original list, we can only use singly linked lists.
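To make the sharing argument concrete, here is a small language-neutral sketch in Python rather than Scala. It assumes a list is either `None` or a `(head, tail)` pair; `cons` and `to_pylist` are hypothetical helper names, not part of any library.

```python
def cons(x, lst):
    """Prepend x in O(1) by allocating one new cell that points at lst."""
    return (x, lst)


def to_pylist(lst):
    """Walk the cells and collect the elements into a Python list."""
    out = []
    while lst is not None:
        head, lst = lst
        out.append(head)
    return out


somelist = cons(2, cons(3, None))
a = cons(1, somelist)  # 1 followed by somelist
b = cons(0, somelist)  # 0 followed by the very same somelist

print(to_pylist(a))         # [1, 2, 3]
print(to_pylist(b))         # [0, 2, 3]
print(to_pylist(somelist))  # [2, 3], unchanged
print(a[1] is b[1])         # True: both new lists share the old cells
```

With a back-pointer in every cell, `somelist`'s first cell could have only one predecessor, so `a` and `b` could not both share it without copying.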
Regarding the second question: You can use ::: to concatenate a single-element list to the end of your list. This is an O(n) operation.
List(1,2,3) ::: List(4)
Other answers have given good explanations for this phenomenon. If you are appending many items to a list in a subroutine, or if you are creating a list by appending elements, a functional idiom is to build up the list in reverse order, cons'ing the items on the front of the list, then reverse it at the end. This gives you O(n) performance instead of O(n²).
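A rough sketch of that idiom, written in Python with `(head, tail)` pairs standing in for an immutable linked list. The function names are made up for illustration, and the recursive `append` is only meant for short lists.

```python
def append(lst, x):
    """Append x to an immutable cons list: rebuilds every existing cell, O(n)."""
    if lst is None:
        return (x, None)
    head, tail = lst
    return (head, append(tail, x))


def reverse(lst):
    """Reverse a cons list in one O(n) pass."""
    out = None
    while lst is not None:
        head, lst = lst
        out = (head, out)
    return out


def build_by_append(items):
    """O(n^2) overall: each step copies the whole list built so far."""
    result = None
    for x in items:
        result = append(result, x)
    return result


def build_by_cons_then_reverse(items):
    """O(n) overall: n constant-time prepends, then a single reversal."""
    acc = None
    for x in items:
        acc = (x, acc)
    return reverse(acc)


print(build_by_append(range(4)) == build_by_cons_then_reverse(range(4)))  # True
```

Both functions produce the same list; only the amount of copying differs.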
Since the question was just updated, it's worth noting that things have changed here.
In today's Scala, you can simply use xs :+ x to append an item at the end of any sequential collection. (There is also x +: xs to prepend. The mnemonic for many of Scala's 2.8+ collection operations is that the colon goes next to the collection.)
This will be O(n) with the default linked implementation of List or Seq, but if you use Vector or IndexedSeq, this will be effectively constant time. Scala's Vector is probably Scala's most useful list-like collection—unlike Java's Vector which is mostly useless these days.
If you are working in Scala 2.8 or higher, the collections introduction is an absolute must read.
Prepending is faster because it only requires two operations:
Create the new list node
Have that new node point to the existing list
Appending requires more operations because you have to traverse to the end of the list since you only have a pointer to the head.
I've never programmed in Scala before, but you could try a ListBuffer.
Most functional languages prominently figure a singly-linked-list data structure, as it's a handy immutable collection type. When you say "list" in a functional language, that's typically what you mean (a singly-linked list, usually immutable). For such a type, append is O(n) whereas cons is O(1).

Correct way to add an element to the end of a list?

I was reading on this Haskell page about adding an element to the end of a List.
Using the example, I tried it out for myself. Given the following List, I wanted to add the number 56 at the end of it.
Example:
let numbers = [4,8,15,16,23,42]
numbers ++ [56]
I was thrown off by this comment:
Adding an item to the end of a list is a fine exercise, but usually
you shouldn't do it in real Haskell programs. It's expensive, and
indicates you are building your list in the wrong order. There is
usually a better approach.
Researching, I realize that what I'm actually doing is creating a List with 56 as the only element and I'm combining it with the numbers list. Is that correct?
Is using ++ the correct way to add an element to the end of a List?
++ [x] is the correct way to add an element to the end of a list, but what the comment is saying is that you shouldn't add elements to the end of a list.
Due to the way lists are defined, adding an element at the end always requires making a copy of the list. That is,
xs ++ ys
needs to copy all of xs but can reuse ys unchanged.
If xs is just one element (i.e. we're adding to the beginning of a list), that's no problem: Copying one element takes practically no time at all.
But if xs is longer, we need to spend more time in ++.
And if we're doing this repeatedly (i.e. we're building up a big list by continually adding elements to the end), then we need to spend a lot of time making redundant copies. (Building an n-element list in this way is an O(n²) operation.)
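As a quick check of that bound, count the copies: when the k-th element is appended, the list already holds k − 1 elements, and all of them are copied. Summing over the n appends gives

```latex
\sum_{k=1}^{n} (k-1) \;=\; \frac{n(n-1)}{2} \;=\; O(n^2)
```

For n = 1000 that is 499,500 copied elements, whereas prepending every element and reversing once allocates only about 2n = 2000 cells.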
If you need to do this, there is usually a better way to structure your algorithm. For example, you can build your list in reverse order (adding elements at the beginning) and only call reverse at the end.
It's the correct way in that all ways of doing it must reduce to at least that much work. The problem is wanting to append to the end of a list at all. That's not an operation that's possible to do efficiently with immutable linked lists.
The better approach is figuring out how to solve your specific problem without doing that. There are a lot of potential approaches. Picking the right one depends on the details of what you're doing. Maybe you can get away with just using laziness correctly. Maybe you are best off generating the list backwards and then reversing it once at the end. Maybe you're best off using a different data structure. It all depends on your specific use case.

About lists:suffix/2 in Erlang

The source code:
suffix(Suffix, List) ->
Delta = length(List) - length(Suffix),
Delta >= 0 andalso nthtail(Delta, List) =:= Suffix.
How about rewriting it as follows:
suffix(Suffix, List) ->
prefix(reverse(Suffix), reverse(List)).
If Delta >= 0, the first one will traverse four times, and the second one will traverse three times. Is that correct?
The first one (from stdlib lists.erl) will traverse both lists twice each, yes. On the other hand, on the second traversal all the list cells will probably be in L2 cache, and it doesn't have to allocate any data. Your suggestion works too, but has to build two reversed temporary lists on the heap, which both has a cost in allocating and initializing data structures as well as causing garbage collection to happen more often on average.
If you think about the same problem in C (or any similar language): testing whether one singly linked list is a suffix of another singly linked list, it becomes more obvious why it's hard to do efficiently, in particular if you want to avoid allocating memory, and you aren't allowed to use tricks like reversing pointers.
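As a language-neutral illustration of the first approach, here is a sketch in Python over `(head, tail)` pairs. It is not the Erlang stdlib code: `length`, `nthtail`, `is_suffix` and `from_pylist` are hypothetical helpers written for this example. It walks each list once to count, then walks again to compare, and allocates no new cells.

```python
def length(lst):
    """Count the cells by walking the whole list: O(n)."""
    n = 0
    while lst is not None:
        n += 1
        lst = lst[1]
    return n


def nthtail(n, lst):
    """Skip the first n cells and return the rest, without copying."""
    for _ in range(n):
        lst = lst[1]
    return lst


def is_suffix(suffix, lst):
    """True if suffix equals the last length(suffix) elements of lst."""
    delta = length(lst) - length(suffix)
    if delta < 0:
        return False
    rest = nthtail(delta, lst)
    # rest and suffix now have the same length; compare cell by cell.
    while rest is not None:
        if rest[0] != suffix[0]:
            return False
        rest, suffix = rest[1], suffix[1]
    return True


def from_pylist(items):
    """Build a cons list from a Python list (used for the demo only)."""
    out = None
    for x in reversed(items):
        out = (x, out)
    return out


print(is_suffix(from_pylist([3, 4]), from_pylist([1, 2, 3, 4])))  # True
print(is_suffix(from_pylist([2, 3]), from_pylist([1, 2, 3, 4])))  # False
```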
I don't think that is correct. As far as I know, length is a built-in function that does not need to traverse the list to get the result (which is why it is allowed in guard tests), and andalso short-circuits: if the first term is false, it does not evaluate the second term and returns false directly.

Inserting into a list at a specific location using lenses

I'm trying to perform a manipulation of a nested data structure containing lists of elements. After mucking around with various approaches I finally settled on lenses as the best way to go about doing this. They work perfectly for finding and modifying specific elements of the structure, but so far I'm stumped on how to add new elements.
From what I've read, I can't technically use a Traversal as it violates the Traversal laws to insert a new element into a list, and that's assuming I could even figure out how to do that using a Traversal in the first place (I'm still fairly weak with Haskell, and the type signatures for most things in the lens package make my head spin).
Specifically what I'm trying to accomplish is, find some element in the list of elements that matches a specific selector, and then insert a new element either before, or after the matched element (different argument to the function for either before or after the match). Does Control.Lens already have something that can accomplish what I'm trying to do and my understanding of the type signatures is just too weak to see it? Is there a better way to accomplish what I'm trying to do?
It would be fairly trivial if I was just trying to add a new element either to the beginning or the end of a list, but inserting it somewhere specific in the middle is the difficult part. In some of the pre-lens code I wrote I used a fold to accomplish what I wanted, but it was starting to get gnarly on the more deeply nested parts of the structure (e.g. a fold inside of a fold inside of a fold), so I turned to Control.Lens to try to untangle some of that mess.
Using the lens package
If we start with knowing the function id can be used like a lens:
import Control.Lens
> [1,2,3,4] ^. id
[1,2,3,4]
Then we can move on to how the list can be modified:
> [1,2,3,4] & id %~ (99:)
[99,1,2,3,4]
The above allows for insertion at the start of the list. To focus on the latter parts of the list we can use _tail from the Control.Lens.Cons module.
> [1,2,3,4] ^. _tail
[2,3,4]
> [1,2,3,4] & _tail %~ (99:)
[1,99,2,3,4]
Now to generalize this for the nth position
> :{
let
  _drop 0 = id
  _drop n = _tail . _drop (n - 1)
:}
> [1,2,3,4] ^. _drop 1
[2,3,4]
> [1,2,3,4] & _drop 0 %~ (99:)
[99,1,2,3,4]
> [1,2,3,4] & _drop 1 %~ (99:)
[1,99,2,3,4]
As a last step, to generalize this over all types with a Cons instance, we can use cons or <|.
> [1,2,3,4] & _drop 1 %~ (99<|)
[1,99,2,3,4]
> import Data.Text
> :set -XOverloadedStrings
> ("h there"::Text) & _drop 1 %~ ('i'<|)
"hi there"
I think a simple approach would be to break the problem down into:
A function of type [a] -> SomeAdditionalData -> [a], which is responsible for transforming the list into another list using some specific data. This is where you add/remove elements from the list and get a new list.
Use a lens to extract the list from the nested data structure, pass that list to the function defined above, and set the returned list back into the nested data structure using the lens.
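The two steps above can be sketched without any lens machinery. The example below uses Python with plain dicts and lists purely as an illustration; `insert_relative` and the `doc` structure are invented for this sketch and do not correspond to anything in Control.Lens.

```python
def insert_relative(items, matches, new_item, before=True):
    """Return a new list with new_item placed next to the first match."""
    for i, item in enumerate(items):
        if matches(item):
            pos = i if before else i + 1
            return items[:pos] + [new_item] + items[pos:]
    return list(items)  # no match: return an unchanged copy


# Step 1: a plain list-to-list function does the actual insertion.
# Step 2: dig the list out of the nested structure, transform it, and
# put the result back without mutating the original structure.
doc = {"page": {"items": ["a", "b", "d"]}}
new_items = insert_relative(doc["page"]["items"], lambda x: x == "b", "c", before=False)
updated = {**doc, "page": {**doc["page"], "items": new_items}}

print(updated["page"]["items"])  # ['a', 'b', 'c', 'd']
print(doc["page"]["items"])      # ['a', 'b', 'd']
```

In Haskell, the second step is the part a lens would take over; the first remains an ordinary function on lists.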
Your last paragraph is the indication about what happens when you try to do too much using a generic abstraction like Lens. These generic abstractions are good for some generic purpose and everything else is specific to your problem and should be designed around plain old functions (at least initially, later on in your project you may find some general pattern across your code base which can be abstracted using type classes etc.).
Some comments on your problem:
Answer the Question:
There may be a way to do what you want to do. The Lens library is amazingly generic. What there is not is a simple or obvious way to make it happen. I think it will involve the partsOf combinator, but I'm not sure.
Comments on Lenses:
The lens library is really cool and can apply to an amazing number of problems. My initial temptation as I was learning the library was to try to fit everything into a Lens access or mutation. What I discovered was that it was better to use the lens library to dig into my complex data structures, but once I had a simple element it was better to use the more traditional functional techniques I already knew rather than stretching the Lens library past its useful limit.
Advice you didn't ask for:
Inserting an element into the middle of a list is a bad idea. Not that it cannot be done, but it can end up being an O(n^2) operation. (See also this StackOverflow answer.) Zip lists or some other functional data structure may be a better idea. As a side benefit, some of these structures could be made instances of the At class, allowing for insertion and deletion using the partial lens combinators.
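Assuming "zip lists" here refers to the zipper technique, the idea can be sketched as follows. This is a Python illustration with `(head, tail)` pairs, not Haskell and not a lens API; `from_list`, `move_right`, `insert` and `to_list` are hypothetical names. A zipper keeps the elements before the focus in reverse order, so inserting at the focus is O(1).

```python
def from_list(items):
    """Build a zipper focused at the start: (reversed prefix, remaining suffix)."""
    right = None
    for x in reversed(items):
        right = (x, right)
    return (None, right)


def move_right(z):
    """Shift the focus one element to the right in O(1)."""
    left, right = z
    if right is None:
        raise IndexError("already at the end")
    head, tail = right
    return ((head, left), tail)


def insert(z, x):
    """Insert x at the focus in O(1); nothing before the focus is copied."""
    left, right = z
    return (left, (x, right))


def to_list(z):
    """Flatten the zipper back into an ordinary Python list."""
    left, right = z
    prefix = []
    while left is not None:
        head, left = left
        prefix.append(head)
    prefix.reverse()
    while right is not None:
        head, right = right
        prefix.append(head)
    return prefix


z = from_list([1, 2, 3, 4])
z = insert(move_right(z), 99)
print(to_list(z))  # [1, 99, 2, 3, 4]
```

Repeated insertions near the same position then cost O(1) each instead of re-walking the list from the head every time.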

Erlang lists:index_of/2 function?

I'm looking for an Erlang library function that will return the index of a particular element in a list.
So, if
X = [10,30,50,70]
lists:index_of(30, X)
would return 1, etc., just like java.util.List's indexOf() method.
Does such a method exist in the Erlang standard lib? I tried looking in the lists module but no luck. Or should I write it myself?
You'll have to define it yourself, like this:
index_of(Item, List) -> index_of(Item, List, 1).
index_of(_, [], _) -> not_found;
index_of(Item, [Item|_], Index) -> Index;
index_of(Item, [_|Tl], Index) -> index_of(Item, Tl, Index+1).
Note, however, that accessing the Nth element of a list is O(N), so an algorithm that often accesses a list by index will be less efficient than one that iterates through it sequentially.
As others noted, there are more efficient ways to solve for this. But if you're looking for something quick, this worked for me:
string:str(List, [Element]).
Other solutions (note that these use 1-based indices):
index_of(Value, List) ->
Map = lists:zip(List, lists:seq(1, length(List))),
case lists:keyfind(Value, 1, Map) of
{Value, Index} -> Index;
false -> notfound
end.
index_of(Value, List) ->
Map = lists:zip(List, lists:seq(1, length(List))),
case dict:find(Value, dict:from_list(Map)) of
{ok, Index} -> Index;
error -> notfound
end.
At some point, when the lists you pass to these functions get long enough, the overhead of constructing the additional list or dict becomes too expensive. If you can avoid doing the construction every time you want to search the list by keeping the list in that format outside of these functions, you eliminate most of the overhead.
Using a dictionary will hash the values in the list and help reduce the index lookup time to O(log N), so it's better to use that for large, singly-keyed lists.
In general, it's up to you, the programmer, to organize your data into structures that suit how you're going to use them. My guess is that the absence of a built-in index_of is to encourage such consideration. If you're doing single-key lookups -- that's really what index_of() is -- use a dictionary. If you're doing multi-key lookups, use a list of tuples with lists:keyfind() et al. If your lists are inordinately large, a less simplistic solution is probably best.
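The "build the lookup structure once, then reuse it" advice can be illustrated with a short sketch. It is written in Python rather than Erlang, and `build_index` is a hypothetical helper. Python's built-in dict is a hash table, so its lookup cost differs from the O(log N) figure quoted above for Erlang's dict.

```python
def build_index(items):
    """Map each value to the 1-based position of its first occurrence."""
    index = {}
    for position, value in enumerate(items, start=1):
        index.setdefault(value, position)  # keep the first occurrence only
    return index


# Pay the O(N) construction cost once...
severity_index = build_index(["debug", "info", "warn", "error"])

# ...then every later lookup avoids rescanning the list.
print(severity_index["warn"])         # 3
print(severity_index.get("unknown"))  # None
```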
This function is very uncommon in Erlang, which may be why it is not in the standard library. Experienced Erlang programmers do not need it, and algorithms that rely on it are discouraged. Anyone who does need it can write it for their own purposes, but such rare occasions are no reason to include it in stdlib. Design your data structures properly instead of asking for this function. In most cases, needing this function indicates a design error.
I think the writer makes a valid case. Here is my use case from a logging application. The objective is to check the severity of an error against the actions to be performed against various levels of error response.
get_index(A,L) ->
get_index(A,L,1).
get_index(A,[A|_],N) ->
N;
get_index(A,[_|T],N) ->
get_index(A,T,N+1).
get_severity(A) ->
Severity=[debug,info,warn,error],
get_index(A,Severity).
The following function returns a list of indices of a given element in a list. The result can be used to get the index of the first or last occurrence of a duplicate element in a list.
indices_of(Element, L) ->
Indices = lists:zip(lists:seq(1,length(L)), L),
[ I || {I, E} <- Indices, E == Element ].