Prepend vs. Append perf in Mathematica - list

In Lisp-like systems, cons is the normal way to PREPEND an element to a list. Functions that append to a list are much more expensive because they must walk the list to the end and then replace the final null with a reference to the appended item. In other words (pseudo-Lisp):
(prepend list item) = (cons item list) = cheap!
(append list item) = (cond ((null? list) (cons item null))
                           (#t (cons (car list) (append (cdr list) item))))
The question is whether the situation is similar in Mathematica. In most regards, Mathematica's lists seem to be singly linked like Lisp's lists, and, if so, we may presume that Append[list,item] is much more expensive than Prepend[list,item]. However, I wasn't able to find anything in the Mathematica documentation to address this question. If Mathematica's lists are doubly linked or implemented more cleverly, say, in a heap or just maintaining a pointer-to-last, then insertion may have a completely different performance profile.
Any advice or experience would be appreciated.

Mathematica's lists are not singly-linked lists like in Common Lisp. It is better to think of Mathematica lists as array- or vector-like structures: insertion costs O(n), but retrieval by index is constant time.
Check out this page on Data Structures and Efficient Algorithms in Mathematica, which covers Mathematica lists in further detail.
Additionally, please check out this Stack Overflow question on linked lists and their performance in Mathematica.
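Since the question contrasts cons-style lists with array-backed lists, a small sketch may help. This is Python rather than Mathematica or Lisp, and the helper names (`cons`, `append`, `to_python_list`) are mine, purely to illustrate why prepend is O(1) but append is O(n) on a singly-linked representation:

```python
def cons(item, lst):
    """Prepend: allocate one new cell pointing at the old list -- O(1)."""
    return (item, lst)

def append(lst, item):
    """Append: rebuild every cell on the way to the end -- O(n)."""
    if lst is None:
        return (item, None)
    head, tail = lst
    return (head, append(tail, item))

def to_python_list(lst):
    """Walk the cons cells and collect the elements."""
    out = []
    while lst is not None:
        head, lst = lst
        out.append(head)
    return out

xs = cons(1, cons(2, cons(3, None)))                  # the list (1 2 3)
assert to_python_list(cons(0, xs)) == [0, 1, 2, 3]    # cheap prepend
assert to_python_list(append(xs, 4)) == [1, 2, 3, 4]  # expensive append
```

Prepend shares the old list unchanged; append must copy every cell, which is exactly the asymmetry the question asks about.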

As a small add-on, here is an efficient alternative to AppendTo in Mathematica:
myBag = Internal`Bag[]
Do[Internal`StuffBag[myBag, i], {i, 10}]
Internal`BagPart[myBag, All]

Since, as already mentioned, Mathematica lists are implemented as arrays, operations like Append and Prepend cause the list to be copied every time an element is added. A more efficient method is to preallocate a list and fill it; however, my experiment below didn't show as great a difference as I expected. Better still, apparently, is the linked-list method, which I shall have to investigate.
Needs["PlotLegends`"]
test[n_] := Module[{startlist = Range[1000]},
datalist = RandomReal[1, n*1000];
appendlist = startlist;
appendtime =
First[AbsoluteTiming[AppendTo[appendlist, #] & /@ datalist]];
preallocatedlist = Join[startlist, Table[Null, {Length[datalist]}]];
count = -1;
preallocatedtime =
First[AbsoluteTiming[
Do[preallocatedlist[[count]] = datalist[[count]];
count--, {Length[datalist]}]]];
{{n, appendtime}, {n, preallocatedtime}}];
results = test[#] & /@ Range[26];
ListLinePlot[Transpose[results], Filling -> Axis,
PlotLegend -> {"Appending", "Preallocating"},
LegendPosition -> {1, 0}]
Timing chart comparing AppendTo against preallocating. (Run time: 82 seconds)
Edit
Using nixeagle's suggested modification improved the preallocation timing considerably, i.e. with preallocatedlist = Join[startlist, ConstantArray[0, {Length[datalist]}]];
Second Edit
A linked-list of the form {{{startlist},data1},data2} works even better, and has the great advantage that the size does not need to be known in advance, as it does for preallocating.
Needs["PlotLegends`"]
test[n_] := Module[{startlist = Range[1000]},
datalist = RandomReal[1, n*1000];
linkinglist = startlist;
linkedlisttime =
First[AbsoluteTiming[
Do[linkinglist = {linkinglist, datalist[[i]]}, {i,
Length[datalist]}];
linkedlist = Flatten[linkinglist];]];
preallocatedlist =
Join[startlist, ConstantArray[0, {Length[datalist]}]];
count = -1;
preallocatedtime =
First[AbsoluteTiming[
Do[preallocatedlist[[count]] = datalist[[count]];
count--, {Length[datalist]}]]];
{{n, preallocatedtime}, {n, linkedlisttime}}];
results = test[#] & /@ Range[26];
ListLinePlot[Transpose[results], Filling -> Axis,
PlotLegend -> {"Preallocating", "Linked-List"},
LegendPosition -> {1, 0}]
Timing comparison of linked-list vs preallocating. (Run time: 6 seconds)

If you know how many elements your result will have, and you can compute each element, then the whole Append/AppendTo/linked-list machinery is unnecessary. In Chris's speed test the preallocation only works because he knows the number of elements in advance; the access into datalist stands in for the actual calculation of the current element.
In that situation I would never use such an approach. A simple Table combined with a Join is far faster. Let me reuse Chris's code: I include the preallocation in the time measurement, because when using Append or the linked list the memory allocation is measured too. Furthermore, I actually use the resulting lists and check whether they are equal, because a clever interpreter might recognize simple, useless commands and optimize them away.
Needs["PlotLegends`"]
test[n_] := Module[{
startlist = Range[1000],
datalist, joinResult, linkedResult, linkinglist, linkedlist,
preallocatedlist, linkedlisttime, preallocatedtime, count,
joinTime, preallocResult},
datalist = RandomReal[1, n*1000];
linkinglist = startlist;
{linkedlisttime, linkedResult} =
AbsoluteTiming[
Do[linkinglist = {linkinglist, datalist[[i]]}, {i,
Length[datalist]}];
linkedlist = Flatten[linkinglist]
];
count = -1;
preallocatedtime = First@AbsoluteTiming[
(preallocatedlist =
Join[startlist, ConstantArray[0, {Length[datalist]}]];
Do[preallocatedlist[[count]] = datalist[[count]];
count--, {Length[datalist]}]
)
];
{joinTime, joinResult} =
AbsoluteTiming[
Join[startlist,
Table[datalist[[i]], {i, 1, Length[datalist]}]]];
PrintTemporary[
Equal @@@ Tuples[{linkedResult, preallocatedlist, joinResult}, 2]];
{preallocatedtime, linkedlisttime, joinTime}];
results = test[#] & /@ Range[40];
ListLinePlot[Transpose[results], PlotStyle -> {Black, Gray, Red},
PlotLegend -> {"Prealloc", "Linked", "Joined"},
LegendPosition -> {1, 0}]
In my opinion, the interesting situations are when you don't know the number of elements in advance and you have to decide ad hoc whether or not to append/prepend something. In those cases Reap[] and Sow[] may be worth a look. In general I would say AppendTo is evil, and before using it, have a look at the alternatives:
n = 10.^5 - 1;
res1 = {};
t1 = First@AbsoluteTiming@Table[With[{y = Sin[x]},
If[y > 0, AppendTo[res1, y]]], {x, 0, 2 Pi, 2 Pi/n}
];
{t2, res2} = AbsoluteTiming[With[{r = Release@Table[
With[{y = Sin[x]},
If[y > 0, y, Hold@Sequence[]]], {x, 0, 2 Pi, 2 Pi/n}]},
r]];
{t3, res3} = AbsoluteTiming[Flatten@Table[
With[{y = Sin[x]},
If[y > 0, y, {}]], {x, 0, 2 Pi, 2 Pi/n}]];
{t4, res4} = AbsoluteTiming[First@Last@Reap@Table[With[{y = Sin[x]},
If[y > 0, Sow[y]]], {x, 0, 2 Pi, 2 Pi/n}]];
{res1 == res2, res2 == res3, res3 == res4}
{t1, t2, t3, t4}
Gives {5.151575, 0.250336, 0.128624, 0.148084}. The construct
Flatten@Table[With[{y = Sin[x]}, If[y > 0, y, {}]], ...]
is happily both readable and fast.
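For readers following along outside Mathematica, the two styles can be sketched in Python. Note that Python's `list.append` is amortized O(1), unlike Mathematica's `AppendTo`, so the dramatic timing gap above will not reproduce here; the sketch only mirrors the shape of the loop-and-append style versus the build-and-filter style:

```python
import math

n = 10**4
xs = [2 * math.pi * i / n for i in range(n + 1)]

# AppendTo-style: grow a result list one element at a time in a loop.
def with_append():
    res = []
    for x in xs:
        y = math.sin(x)
        if y > 0:
            res.append(y)
    return res

# Table/Flatten-style: build and filter in a single comprehension.
def with_comprehension():
    return [y for x in xs if (y := math.sin(x)) > 0]

assert with_append() == with_comprehension()
```

In a language where append really does copy, the second form wins by a large margin, which is the point of the timings above.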
Remark
Be careful trying this last example at home. Here, on my 64-bit Ubuntu machine with Mathematica 8.0.4, the AppendTo version with n=10^5 takes 10 GB of memory; n=10^6 takes all of my 32 GB of RAM to create an array containing 15 MB of data. Funny.


Smallest sub-list that contains all numbers

I am trying to write a program in SML that takes in the length of a list, the maximum number that will appear in the list, and the list itself. It then calculates the length of the smallest "sub-list" that contains all the numbers.
I have tried to use the sliding-window approach, with two indices, front and tail. The front scans first, and when it finds a number it records in a map how many times that number has been seen. Once the program has found all the numbers, it calls the tail: the tail scans the list, and if it finds that a number has been seen more than once it drops it from the window.
The code I have tried so far is the following:
structure Key=
struct
type ord_key=int
val compare=Int.compare
end
fun min x y = if x>y then y else x;
structure mymap = BinaryMapFn ( Key );
fun smallest_sub(n,t,listall,map)=
let
val k=0
val front=0
val tail=0
val minimum= n;
val list1=listall;
val list2=listall;
fun increase(list1,front,k,ourmap)=
let
val number= hd list1
val elem=mymap.find(ourmap,number)
val per=getOpt(elem,0)+1
fun decrease(list2,tail,k,ourmap,minimum)=
let
val number=hd list2
val elem=mymap.find(ourmap,number)
val per=getOpt(elem,0)-1
val per1=getOpt(elem,0)
in
if k>t then
if (per1=1) then decrease(tl list2,tail+1,k-1,mymap.insert(ourmap,number,per),min minimum (front-tail))
else decrease(tl list2,tail+1,k,mymap.insert(ourmap,number,per),min minimum (front-tail))
else increase (list1, front,k,ourmap)
end
in
if t>k then
if (elem<>NONE) then increase (tl list1,front+1,k,mymap.insert(ourmap,number,per))
else increase(tl list1,front+1,k+1,mymap.insert(ourmap,number,per))
else (if (n>front) then decrease(list2,tail,k,ourmap,minimum) else minimum)
end
in
increase(list1,front,k,map)
end
fun solve (n,t,acc)= smallest_sub(n,t,acc,mymap.empty)
But when I call it with smallest_sub(10,3,[1,3,1,3,1,3,3,2,2,1]); it does not work. What have I done wrong?
Example: if the input is 1,3,1,3,1,3,3,2,2,1, the program should recognize that the smallest parts of the list that contain all the numbers are 1,3,3,2 and 3,2,2,1, so the output should be 4.
This problem of "smallest sub-list that contains all values" seems to recur in
new questions without a successful answer, largely because the questions don't
include a minimal, complete, and verifiable example.
Because you use a "sliding window" approach, indexing the front and the back
of your input, a list taking O(n) time to index elements is not ideal. You
really do want to use arrays here. If your input function must have a list, you
can convert it to an array for the purpose of the algorithm.
I'd like to perform a cleanup of the code before answering, because running
your current code by hand is a bit hard because it's so condensed. Here's an
example of how you could abstract out the book-keeping of whether a given
sub-list contains at least one copy of each element in the original list:
Edit: I changed the code below after originally posting it.
structure CountMap = struct
structure IntMap = BinaryMapFn(struct
type ord_key = int
val compare = Int.compare
end)
fun count (m, x) =
Option.getOpt (IntMap.find (m, x), 0)
fun increment (m, x) =
IntMap.insert (m, x, count (m, x) + 1)
fun decrement (m, x) =
let val c' = count (m, x)
in if c' <= 1
then NONE
else SOME (IntMap.insert (m, x, c' - 1))
end
fun flip f (x, y) = f (y, x)
val fromList = List.foldl (flip increment) IntMap.empty
end
That is, a CountMap is an int IntMap.map: the IntMap part fixes the key
type of the map to int, and the int type parameter in front of it is the
value type of the map, a count of how many times each value occurred.
When building the initialCountMap below, you use CountMap.increment, and
when you use the "sliding window" approach, you use CountMap.decrement to
produce a new countMap that you can test on recursively.
If you decrement the occurrence below 1, you're looking at a sub-list that
doesn't contain every element at least once; we rule out any solution by
letting CountMap.decrement return NONE.
With all of this machinery abstracted out, the algorithm itself becomes much
easier to express. First, I'd like to convert the list to an array so that
indexing becomes O(1), because we'll be doing a lot of indexing.
fun smallest_sublist_length [] = 0
| smallest_sublist_length (xs : int list) =
let val arr = Array.fromList xs
val initialCountMap = CountMap.fromList xs
fun go countMap i j =
let val xi = Array.sub (arr, i)
val xj = Array.sub (arr, j)
val decrementLeft = CountMap.decrement (countMap, xi)
val decrementRight = CountMap.decrement (countMap, xj)
in
case (decrementLeft, decrementRight) of
(SOME leftCountMap, SOME rightCountMap) =>
Int.min (
go leftCountMap (i+1) j,
go rightCountMap i (j-1)
)
| (SOME leftCountMap, NONE) => go leftCountMap (i+1) j
| (NONE, SOME rightCountMap) => go rightCountMap i (j-1)
| (NONE, NONE) => j - i + 1
end
in
go initialCountMap 0 (Array.length arr - 1)
end
This appears to work, but...
Doing Int.min (go left..., go right...) incurs a cost of O(n^2) stack
memory (in the case where you cannot rule out either being optimal). This is a
good use-case for dynamic programming because your recursive sub-problems have a
common sub-structure, i.e.
go initialCountMap 0 10
|- go leftCountMap 1 10
| |- ...
| `- go rightCountMap 1 9 <-.
`- go rightCountMap 0 9 | possibly same sub-problem!
|- go leftCountMap 1 9 <-'
`- ...
So maybe there's a way to store the recursive sub-problem in a memory array and not
perform a recursive lookup if you know the result to this sub-problem. How to
do memoization in SML is a good question in and of itself. How to do purely
functional memoization in a non-lazy language is an even better one.
Another optimization you could make is that if you ever find a sub-list the
size of the number of unique elements, you need to look no further. This number
is incidentally the number of elements in initialCountMap, and IntMap
probably has a function for finding it.
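As a point of comparison, the classic two-pointer version of this algorithm runs in O(n) with no recursion at all. Here is a sketch in Python rather than SML (the function name `smallest_cover` is mine, not from the question):

```python
from collections import Counter

def smallest_cover(xs):
    """Length of the shortest contiguous sub-list of xs containing every
    distinct value of xs at least once. Two-pointer sweep, O(n)."""
    need = len(set(xs))          # number of distinct values to cover
    have = Counter()             # counts inside the current window
    best = len(xs)
    tail = 0
    for front, x in enumerate(xs):
        have[x] += 1
        # Shrink from the left while the leftmost element is duplicated.
        while have[xs[tail]] > 1:
            have[xs[tail]] -= 1
            tail += 1
        if len(have) == need:    # window covers every distinct value
            best = min(best, front - tail + 1)
    return best if xs else 0

assert smallest_cover([1, 3, 1, 3, 1, 3, 3, 2, 2, 1]) == 4
```

The window only ever moves forward at both ends, so each element is visited at most twice, versus the exponential branching of the double-recursive `go`.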

Prolog - How to determine if all elements in a string list are equal?

I'm working on a Prolog assignment where I must parse a user-inputted list of string characters (specifically "u") and determine whether all the elements are equal to the string "u". If they are, it returns the number of elements; if not, it returns false. For example:
uA(-Length,+String,+Leftover) % prototype
?- uA(L,["u","u","u"],[]).
L = 3 .
?- uA(L,["u","u","d"],[]).
false.
I have a decent grasp on how prolog works, but I'm confused about how lists operate. Any help would be greatly appreciated. Thanks!
Edit: I made some headway with the sort function (thank you!) but I've run into a separate problem.
uA(Length, String) :-
sort(String, [_]),
member("u", String),
length(String, Length).
This does mostly what I need it to, however, when I run it:
?- uA(L, ["u", "u", "u"]).
L = 3 ;
L = 3 ;
L = 3.
Is there any way to make it such that it only prints L = 3 once? Thanks!
If you want to state that all list items are equal, there is no need to sort the list first.
Simply use library predicate maplist/2 together with the builtin predicate (=)/2:
?- maplist(=(X), Xs).
Xs = []
; Xs = [X]
; Xs = [X, X]
; Xs = [X, X, X]
; Xs = [X, X, X, X]
… % ... and so on ...
First of all, be careful with double-quoted terms in Prolog. Their interpretation depends on the value of the standard double_quotes flag. The most portable value of this flag is codes, which makes e.g. "123" being interpreted as [49,50,51]. Other possible values of this flag are atom and chars. Some Prolog systems, e.g. SWI-Prolog, also support a string value.
But back to your question. A quick way to check that all elements in a ground list are equal is to use the standard sort/2 predicate (which eliminates duplicated elements). For example:
| ?- sort(["u","u","u"], [_]).
yes
| ?- sort(["u","u","d"], [_]).
no
As [_] unifies with any singleton list, the call only succeeds if the sorting results in a list with a single element, which happens for a non-empty ground list only if all its elements are equal. Note that this solution is independent of the value of the double_quotes flag. Note also that you need to deal with an empty list separately.
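The same trick translates to most languages with a sort or set primitive. A Python sketch (assuming hashable elements), where collapsing duplicates plays the role of `sort/2` producing `[_]`:

```python
def all_equal(xs):
    """True iff a non-empty list has exactly one distinct element --
    the analogue of sort/2 collapsing the list to [_]."""
    return len(set(xs)) == 1

assert all_equal(["u", "u", "u"])
assert not all_equal(["u", "u", "d"])
assert not all_equal([])   # the empty list is rejected, as with [_]
```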
My approach is to check whether every element in the list is the same, by comparing the head of the list with its adjacent element. If all adjacent pairs match, the predicate succeeds; otherwise it fails. Then, for a list whose elements are all the same, calculate its length.
isEqual([X,Y]):- X == Y , !.
isEqual([H,H1|T]):- H == H1 , isEqual([H1|T]).
len([],0).
len([_|T],L):- len(T,L1) , L is L1+1.
goal(X):- isEqual(X) , len(X,Length) , write('Length = ') , write(Length).
OUTPUT
?- goal(["u","u","u"]).
Length = 3
true
?- goal(["u","u","a"]).
false
You can do it this way. Hope this helps.

What is the idiomatic way to compare two lists for equality?

I have two lists and I need to check whether their elements are equal (not a shallow check; for elements I might rely on Kernel.==/2).
Currently, I have:
[l1 -- l2] ++ [l2 -- l1] == []
It looks a bit overcomplicated and not quite idiomatic to me. Am I missing something? Is there a better way to compare two lists for equality?
The shortest way I can think of is to sort the lists and compare them:
Enum.sort(l1) == Enum.sort(l2)
This will run in O(n log n) time instead of O(n ^ 2) for your Kernel.--/2 based solution.
We can't use a plain Set data structure here since the lists can contain duplicates and their counts must be tracked. We can instead use a Map which counts the frequency of each element, and then compare the maps:
iex(1)> l1 = ~w|a a b|a
[:a, :a, :b]
iex(2)> l2 = ~w|a b|a
[:a, :b]
iex(3)> m1 = l1 |> Enum.reduce(%{}, fn x, acc -> Map.update(acc, x, 1, & &1 + 1) end)
%{a: 2, b: 1}
iex(4)> m2 = l2 |> Enum.reduce(%{}, fn x, acc -> Map.update(acc, x, 1, & &1 + 1) end)
%{a: 1, b: 1}
iex(5)> m1 == m2
false
iex(6)> l2 = ~w|a b a|a
[:a, :b, :a]
iex(7)> m2 = l2 |> Enum.reduce(%{}, fn x, acc -> Map.update(acc, x, 1, & &1 + 1) end)
%{a: 2, b: 1}
iex(8)> m1 == m2
true
This is also O(n log n) so you may want to benchmark the two solutions with the kind of data you'll have to see which performs better.
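The frequency-map idea is the same in any language with a hash map. In Python, for comparison, it is a one-liner with `collections.Counter`, which builds the counts in a single O(n) average-time pass:

```python
from collections import Counter

l1 = ["a", "a", "b"]
l2 = ["a", "b"]
assert Counter(l1) != Counter(l2)   # counts differ: not a permutation

l2 = ["a", "b", "a"]
assert Counter(l1) == Counter(l2)   # same elements, same multiplicities
assert sorted(l1) == sorted(l2)     # the O(n log n) sort-based check agrees
```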
Dogbert's solution is apt, but it is still in Orcish.
In Erlang this would be:
lists:sort(List1) == lists:sort(List2)
A deep comparison looks very nearly the same, but of course must traverse the structures within each list.
One factor to consider is that very often the ordering of a list is its meaning. Strings are a good example, so are player ranks, start positions, game hotkey locations, and last-used-longest-kept cache lists. So do keep in mind the meaning of the comparison: "Do these two lists contain the same number and type of things?" is not the same as "Are these two lists semantically identical?"
Consider comparing anagrams:
lists:sort("AT ERR GET VICE") == lists:sort("IT ERR ETC GAVE")
That works fine as an anagram comparison, but not at all as a semantic one.

F#: Efficiently get last state from List.scan

I'm running List.scan over a very large list in order to compute a running total. When I'm done I need the total in addition to the scan output in order to partition the list non-uniformly. The total is in the last state output by scan, and I would really like to avoid an additional traversal of the list in order to get the final state. The only way I can think to do this is to pass a mutable reference to accumulate the total. Is there a better way to approach this?
let l = <very large list of int64>
let runningTotal=List.scan (fun s x -> x+s) 0L l
let total= <last element of runningTotal- very inefficient>
doSomething total runningTotal
In F# 4.0, List.mapFold is being added, which enables this nicely.
[1;2;3;4] |> List.mapFold (fun state elem -> let nxt = state + elem in (nxt,nxt)) 0
// > val it : int list * int = ([1; 3; 6; 10], 10)
List.last is also added in 4.0, though its perf is still O(n). If you want to pluck the last element from a list in F# 3.1 and earlier, you can do it with fold, but again, this is O(n).
let last lst =
lst |> List.fold (fun _ x -> x) Unchecked.defaultof<_>
@John's solution is probably fastest and simplest.
Here is one way. Since we can define the lambda to do anything, just make it always store the result in a ref cell. Since scan works from start to end, the result will be the last value.
let last = ref 0L
let l = <very large list of int64>
let runningTotal = List.scan (fun s x -> let t = x + s in last := t; t) 0L l
let total= !last
doSomething total runningTotal
Cheaply accessing the last element of a list is indeed not possible; it requires a full traversal. That said, you say your list is very large, and when it comes to very large inputs a list might not be the optimal data structure to begin with. You could certainly use an array instead of a list in this case. Arrays are also more memory-efficient than lists, because a list allocates a cell with a reference for each element (somewhere around 12 bytes per item), whereas an array stores its elements in one contiguous block.
If an array works for you, then that would be the solution, as you can access the last element of an array without the O(n) overhead.
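For comparison, here is the scan-then-take-last pattern in Python: `itertools.accumulate` plays the role of `List.scan` (with `initial=0` matching the F# seed), and because Python lists are array-backed, reading the last element is O(1), which is exactly the point about arrays above:

```python
from itertools import accumulate

xs = [1, 2, 3, 4]
running_total = list(accumulate(xs, initial=0))  # like List.scan (+) 0L
total = running_total[-1]                        # O(1) on an array-backed list

assert running_total == [0, 1, 3, 6, 10]
assert total == 10
```

One traversal produces both the running totals and the final total, with no mutable reference needed.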

How can I fold the nth and (n+1)th elements into a new list in Scala?

Let's say I have List(1,2,3,4,5) and I want to get
List(3,5,7,9), that is, the sums of each adjacent pair of elements (1+2, 2+3, 3+4, 4+5).
I tried to do this by making two lists:
val list1 = List(1,2,3,4,5)
val list2 = (list1.tail ::: List(0)) // 2,3,4,5,0
for (n0_ <- list1; n1_ <- list2) yield (n0_ + n1_)
But that combines all the elements with each other like a cross product, and I only want to combine the elements pairwise. I'm new to functional programming and I thought I'd use map() but can't seem to do so.
List(1, 2, 3, 4, 5).sliding(2).map(_.sum).to[List] does the job.
Docs:
def sliding(size: Int): Iterator[Seq[A]]
Groups elements in fixed size blocks by passing a "sliding window" over them (as opposed to partitioning them, as is done in grouped.)
You can combine the lists with zip and use map to add the pairs.
val list1 = List(1,2,3,4,5)
list1.zip(list1.tail).map(x => x._1 + x._2)
res0: List[Int] = List(3, 5, 7, 9)
Personally I think using sliding as Infinity has is the clearest, but if you want to use a zip-based solution then you might want to use the zipped method:
( list1, list1.tail ).zipped map (_+_)
In addition to being arguably clearer than using zip, it is more efficient in that the intermediate data structure (the list of tuples) created by zip is not created with zipped. However, don't use it with infinite streams, or it will eat all of your memory.
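Both styles translate directly to other languages; a Python sketch of the zip-based and sliding-window versions, for comparison:

```python
xs = [1, 2, 3, 4, 5]

# zip-based, mirroring list1.zip(list1.tail).map(...)
pair_sums = [a + b for a, b in zip(xs, xs[1:])]
assert pair_sums == [3, 5, 7, 9]

# sliding-window style, mirroring sliding(2).map(_.sum)
windows = [xs[i:i + 2] for i in range(len(xs) - 1)]
assert [sum(w) for w in windows] == [3, 5, 7, 9]
```

As in the Scala discussion, the zip version builds an intermediate list of pairs, while the window version yields each two-element slice directly.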