I am creating a service that returns lists of articles from an XML feed. I am trying to find a better way to update the list of articles currently stored in my program whenever the client code calls for an update.
I have a variable named articleHistory which is the list of articles I want to update. I have the new list of articles, aptly named newArticles, which my article retrieval function returns. There will be articles which are present in both lists. But the newArticles list will contain articles which are not in the articleHistory list. I am currently using a temporary variable, appending newArticles to articleHistory, and then returning that after calling the distinct method.
def updateArticleHistory: List[Article] = {
val newArticles = getArticles
val temp = newArticles ::: articleHistory
articleHistory = temp.distinct
articleHistory // return the updated list; the assignment alone evaluates to Unit
}
Assume there is a case class named Article available. I feel there has to be a better way to do this, but I can't figure it out.
It seems like what you want is ListSet. A ListSet is a Set (since it only allows one of each element), but it maintains the insertion order. The one weird thing is that since it inserts elements at the head, the list is backwards.
import collection.immutable.ListSet
val a = ListSet.empty[Int]
val b = a ++ List(3,4,5) // ListSet(5, 4, 3)
val c = b ++ List(1,2,3,4,5,6) // ListSet(6, 2, 1, 5, 4, 3)
c.toList.reverse // List(3, 4, 5, 1, 2, 6)
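Applied to the question's case, another way to get "a set that remembers insertion order" is scala.collection.mutable.LinkedHashSet, which iterates in insertion order and so needs no reverse. This is only a sketch: the Article field and the merge helper are made up for illustration, and it assumes Article is a case class so that equality is structural.

```scala
import scala.collection.mutable.LinkedHashSet

// Hypothetical stand-in for the Article case class mentioned in the question.
final case class Article(title: String)

object ArticleHistory {
  // Merge freshly fetched articles into the history, dropping duplicates
  // and keeping the order in which each article was first seen.
  def merge(history: List[Article], fresh: List[Article]): List[Article] = {
    val seen = LinkedHashSet.empty[Article]
    seen ++= history
    seen ++= fresh
    seen.toList
  }
}
```

With a history of a, b and new articles b, c, merge returns a, b, c.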
Why can't you just do something like this:
val lstA: List[Int] = List[Int](1, 2, 3, 4)
val lstB: List[Int] = List[Int](5, 6, 7, 1, 2)
println((lstA ::: lstB).distinct)
>> List(1, 2, 3, 4, 5, 6, 7)
I think it's nice:)
Let's say I have List(1,2,3,4,5) and I want to get
List(3,5,7,9), that is, the sum of each element and its neighbour (1+2, 2+3, 3+4, 4+5).
I tried to do this by making two lists:
val list1 = List(1,2,3,4,5)
val list2 = (list1.tail ::: List(0)) // 2,3,4,5,0
for (n0_ <- list1; n1_ <- list2) yield (n0_ + n1_)
But that combines all the elements with each other like a cross product, and I only want to combine the elements pairwise. I'm new to functional programming and I thought I'd use map() but can't seem to do so.
List(1, 2, 3, 4, 5).sliding(2).map(_.sum).toList does the job.
Docs:
def sliding(size: Int): Iterator[Seq[A]]
Groups elements in fixed size blocks by passing a "sliding window" over them (as opposed to partitioning them, as is done in grouped.)
You can combine the lists with zip and use map to add the pairs.
val list1 = List(1,2,3,4,5)
list1.zip(list1.tail).map(x => x._1 + x._2)
res0: List[Int] = List(3, 5, 7, 9)
Personally I think using sliding as Infinity has is the clearest, but if you want to use a zip-based solution then you might want to use the zipped method:
( list1, list1.tail ).zipped map (_+_)
In addition to being arguably clearer than using zip, it is more efficient in that the intermediate data structure (the list of tuples) created by zip is not created with zipped. However, don't use it with infinite streams, or it will eat all of your memory.
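One thing the one-liners above gloss over is short input: list1.tail throws on an empty list, and sliding(2) hands back a single one-element window when the list has only one element. Here is a sketch of both approaches with those cases handled (the object and method names are mine, not from any library):

```scala
object PairwiseSums {
  // Pair each element with its successor. drop(1) is used instead of tail
  // so that an empty list yields an empty result rather than an exception.
  def viaZip(xs: List[Int]): List[Int] =
    xs.zip(xs.drop(1)).map { case (a, b) => a + b }

  // sliding(2) produces a window shorter than 2 for a one-element list,
  // so only full two-element windows are summed.
  def viaSliding(xs: List[Int]): List[Int] =
    xs.sliding(2).collect { case List(a, b) => a + b }.toList
}
```

Both give List(3, 5, 7, 9) for List(1, 2, 3, 4, 5), and an empty list for inputs with fewer than two elements.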
In Lisp-like systems, cons is the normal way to PREPEND an element to a list. Functions that append to a list are much more expensive because they walk the list to the end and then replace the final null with a reference to the appended item. In other words (pseudo-Lisp):
(prepend list item) = (cons item list) = cheap!
(append list item) = (cond ((null? list) (cons item null))
(#t (cons (car list) (append (cdr list) item))))
The question is whether the situation is similar in Mathematica. In most regards, Mathematica's lists seem to be singly linked like Lisp's lists, and, if so, we may presume that Append[list,item] is much more expensive than Prepend[list,item]. However, I wasn't able to find anything in the Mathematica documentation to address this question. If Mathematica's lists are doubly linked or implemented more cleverly, say, in a heap or just maintaining a pointer-to-last, then insertion may have a completely different performance profile.
Any advice or experience would be appreciated.
Mathematica's lists are not singly linked lists like in Common Lisp. It is better to think of Mathematica lists as array- or vector-like structures. Insertion is O(n), but retrieval is constant time.
Check out this page of Data structures and Efficient Algorithms in Mathematica, which covers Mathematica lists in further detail.
Additionally, please check out this Stack Overflow question on linked lists and their performance in Mathematica.
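For contrast with the array-like behaviour described above, here is what the cons-list costs from the question look like in a language whose built-in list really is singly linked. This is a sketch in Scala, not Mathematica, and the helper names are made up; it only illustrates why prepending is cheap and appending is not on that kind of structure.

```scala
object ConsListCosts {
  // Prepending allocates one new cell and shares the existing list: O(1).
  def prepend[A](xs: List[A], x: A): List[A] = x :: xs

  // Appending has to rebuild every cell in front of the new last element: O(n).
  // This mirrors the recursive pseudo-Lisp definition in the question.
  def append[A](xs: List[A], x: A): List[A] = xs match {
    case Nil          => x :: Nil
    case head :: tail => head :: append(tail, x)
  }

  // Usual idiom on cons lists: accumulate by prepending, reverse once at the end.
  def collectInOrder[A](items: Iterator[A]): List[A] = {
    var acc = List.empty[A]
    items.foreach(i => acc = i :: acc)
    acc.reverse
  }
}
```

None of this carries over to Mathematica's List directly; it is the nested {{{...}, a}, b} construction discussed elsewhere in this thread that plays the role of the cons cell there.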
As a small add-on, here is an efficient alternative to AppendTo in Mathematica:
myBag = Internal`Bag[]
Do[Internal`StuffBag[myBag, i], {i, 10}]
Internal`BagPart[myBag, All]
Since, as already mentioned, Mathematica lists are implemented as arrays, operations like Append and Prepend cause the list to be copied every time an element is added. A more efficient method is to preallocate a list and fill it; however, my experiment below didn't show as great a difference as I expected. Better still, apparently, is the linked-list method, which I shall have to investigate.
Needs["PlotLegends`"]
test[n_] := Module[{startlist = Range[1000]},
datalist = RandomReal[1, n*1000];
appendlist = startlist;
appendtime =
First[AbsoluteTiming[AppendTo[appendlist, #] & /@ datalist]];
preallocatedlist = Join[startlist, Table[Null, {Length[datalist]}]];
count = -1;
preallocatedtime =
First[AbsoluteTiming[
Do[preallocatedlist[[count]] = datalist[[count]];
count--, {Length[datalist]}]]];
{{n, appendtime}, {n, preallocatedtime}}];
results = test[#] & /@ Range[26];
ListLinePlot[Transpose[results], Filling -> Axis,
PlotLegend -> {"Appending", "Preallocating"},
LegendPosition -> {1, 0}]
Timing chart comparing AppendTo against preallocating. (Run time: 82 seconds)
Edit
Using nixeagle's suggested modification improved the preallocation timing considerably, i.e. with preallocatedlist = Join[startlist, ConstantArray[0, {Length[datalist]}]];
Second Edit
A linked-list of the form {{{startlist},data1},data2} works even better, and has the great advantage that the size does not need to be known in advance, as it does for preallocating.
Needs["PlotLegends`"]
test[n_] := Module[{startlist = Range[1000]},
datalist = RandomReal[1, n*1000];
linkinglist = startlist;
linkedlisttime =
First[AbsoluteTiming[
Do[linkinglist = {linkinglist, datalist[[i]]}, {i,
Length[datalist]}];
linkedlist = Flatten[linkinglist];]];
preallocatedlist =
Join[startlist, ConstantArray[0, {Length[datalist]}]];
count = -1;
preallocatedtime =
First[AbsoluteTiming[
Do[preallocatedlist[[count]] = datalist[[count]];
count--, {Length[datalist]}]]];
{{n, preallocatedtime}, {n, linkedlisttime}}];
results = test[#] & /@ Range[26];
ListLinePlot[Transpose[results], Filling -> Axis,
PlotLegend -> {"Preallocating", "Linked-List"},
LegendPosition -> {1, 0}]
Timing comparison of linked-list vs preallocating. (Run time: 6 seconds)
If you know how many elements your result will have and if you can calculate your elements, then the whole Append, AppendTo, linked-list, etc. business is not necessary. In Chris's speed test, the preallocation only works because he knows the number of elements in advance. The access operation on datalist stands in for the actual calculation of the current element.
If the situation is like that, I would never use such an approach. A simple Table combined with a Join is a hell of a lot faster. Let me reuse Chris's code: I add the preallocation to the time measurement, because when using Append or the linked list, the memory allocation is measured too. Furthermore, I really use the resulting lists and check whether they are equal, because a clever interpreter might recognize simple, useless commands and optimize them out.
Needs["PlotLegends`"]
test[n_] := Module[{
startlist = Range[1000],
datalist, joinResult, linkedResult, linkinglist, linkedlist,
preallocatedlist, linkedlisttime, preallocatedtime, count,
joinTime, preallocResult},
datalist = RandomReal[1, n*1000];
linkinglist = startlist;
{linkedlisttime, linkedResult} =
AbsoluteTiming[
Do[linkinglist = {linkinglist, datalist[[i]]}, {i,
Length[datalist]}];
linkedlist = Flatten[linkinglist]
];
count = -1;
preallocatedtime = First@AbsoluteTiming[
(preallocatedlist =
Join[startlist, ConstantArray[0, {Length[datalist]}]];
Do[preallocatedlist[[count]] = datalist[[count]];
count--, {Length[datalist]}]
)
];
{joinTime, joinResult} =
AbsoluteTiming[
Join[startlist,
Table[datalist[[i]], {i, 1, Length[datalist]}]]];
PrintTemporary[
Equal @@@ Tuples[{linkedResult, preallocatedlist, joinResult}, 2]];
{preallocatedtime, linkedlisttime, joinTime}];
results = test[#] & /@ Range[40];
ListLinePlot[Transpose[results], PlotStyle -> {Black, Gray, Red},
PlotLegend -> {"Prealloc", "Linked", "Joined"},
LegendPosition -> {1, 0}]
In my opinion, the interesting situations are the ones where you don't know the number of elements in advance and you have to decide ad hoc whether or not you have to append/prepend something. In those cases Reap[] and Sow[] may be worth a look. In general I would say AppendTo is evil, and before using it, have a look at the alternatives:
n = 10.^5 - 1;
res1 = {};
t1 = First@AbsoluteTiming@Table[With[{y = Sin[x]},
If[y > 0, AppendTo[res1, y]]], {x, 0, 2 Pi, 2 Pi/n}
];
{t2, res2} = AbsoluteTiming[With[{r = Release@Table[
With[{y = Sin[x]},
If[y > 0, y, Hold@Sequence[]]], {x, 0, 2 Pi, 2 Pi/n}]},
r]];
{t3, res3} = AbsoluteTiming[Flatten@Table[
With[{y = Sin[x]},
If[y > 0, y, {}]], {x, 0, 2 Pi, 2 Pi/n}]];
{t4, res4} = AbsoluteTiming[First@Last@Reap@Table[With[{y = Sin[x]},
If[y > 0, Sow[y]]], {x, 0, 2 Pi, 2 Pi/n}]];
{res1 == res2, res2 == res3, res3 == res4}
{t1, t2, t3, t4}
Gives {5.151575, 0.250336, 0.128624, 0.148084}. The construct
Flatten@Table[ With[{y = Sin[x]}, If[y > 0, y, {}]], ...]
is luckily really readable and fast.
Remark
Be careful trying this last example at home. Here, on my Ubuntu 64bit and Mma 8.0.4 the AppendTo with n=10^5 takes 10GB of Memory. n=10^6 takes all of my RAM which is 32GB to create an array containing 15MB of data. Funny.
OK, I wouldn't be coming to you for help if I knew what to do. Anyway, I'm still having problems with my "program".
class Mark(val name: String, val style_mark: Double, val other_mark: Double) {}
object Test extends Application
{
val m1 = new Mark("Smith", 18, 16);
val m2 = new Mark("Cole", 14, 7);
val m3 = new Mark("James", 13, 15);
val m4 = new Mark("Jones", 14, 16);
val m5 = new Mark("Richardson", 20, 19);
val m6 = new Mark("James", 4, 18);
val marks = List[Mark](m1, m2, m3, m4, m5, m6);
def avg(xs: List[Double]) = xs.sum / xs.length
val marksGrouped = marks.groupBy(_.name).map { kv => new Mark(kv._1, avg(kv._2.map(_.style_mark)), avg(kv._2.map(_.other_mark))) }
val marksSorted = marksGrouped.sortWith((m1, m2) => m1.style_mark < m2.style_mark)
}
And this is the error I get: error: value sortWith is not a member of scala.collection.immutable.Iterable[Mark]
You'll have to call toList on marksGrouped first. Iterable does not have a sortWith method, but List does.
Basic collection hierarchy:
TraversableOnce: might only be traversable once, like iterators.
Traversable: May be traversed, with foreach, but no other guarantees provided.
Iterable: Can produce iterator, which enables lazy traversal.
Seq: contents have a fixed traversal order.
IndexedSeq: contents can be efficiently retrieved by position number.
Set: contents contain no duplicate elements.
Map: contents can be efficiently retrieved by a key.
So the problem you face is that Iterable does not provide support to define the traversal order, which is what sortWith does. Only collections derived from Seq can -- List, Vector, ArrayBuffer, etc.
The method toSeq will return a Seq out of an Iterable. Or you may choose a more specific collection, with characteristics well matched to your algorithm.
For extra higher-order programming goodness, use sortBy, rather than sortWith
val marksSorted = marksGrouped.toList.sortBy(_.style_mark)
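Putting the two suggestions together, here is a minimal self-contained sketch of the whole pipeline. I've made Mark a case class and wrapped things in a MarksDemo object purely so the result is easy to print and compare; the original used a plain class, and averagedAndSorted is a made-up name.

```scala
object MarksDemo {
  // Same fields as the question's Mark, as a case class for readable output.
  final case class Mark(name: String, style_mark: Double, other_mark: Double)

  def avg(xs: List[Double]): Double = xs.sum / xs.length

  // groupBy gives a Map; mapping it to non-pair values yields an Iterable,
  // which has no defined order, so convert to a List before sorting.
  def averagedAndSorted(marks: List[Mark]): List[Mark] =
    marks
      .groupBy(_.name)
      .map { case (name, ms) =>
        Mark(name, avg(ms.map(_.style_mark)), avg(ms.map(_.other_mark)))
      }
      .toList
      .sortBy(_.style_mark)
}
```

For the six marks in the question, the two "James" entries collapse into one with a style mark of 8.5, which sorts first; "Richardson" with 20.0 sorts last.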
I would like a List, Seq, or even an Iterable that is a read-only view of a part of a List, in my specific case, the view will always start at the first element.
List.slice is O(n), as is filter. Is there any way of doing better than this? I don't need any operations like +, - etc., just apply, map, flatMap, etc. to provide for list comprehension syntax on the sublist.
Is the answer to write my own class whose iterators keep a count to know where the end is?
How about Stream? Stream is Scala's way of doing laziness. Because Stream is lazy, Stream.take(), which is what you need in this case, is O(1). The only caveat is that if you want to get back a List after doing a list comprehension on a Stream, you need to convert it back to a List. List.projection gets you a Stream which has most of the operations of a List.
scala> val l = List(1, 2, 3, 4, 5)
l: List[Int] = List(1, 2, 3, 4, 5)
scala> val s = l.projection.take(3)
s: Stream[Int] = Stream(1, ?)
scala> s.map(_ * 2).toList
res0: List[Int] = List(2, 4, 6)
scala> (for (i <- s) yield i * 2).toList
res1: List[Int] = List(2, 4, 6)
List.slice and List.filter both return Lists, which are by definition immutable. The + and - methods return a different List; they do not change the original List. Also, it is hard to do better than O(N). A List is not random access; it is a linked list. So imagine that the sublist you want is the last element of the List. The only way to access that element is to iterate over the entire List.
Well, you can't get better than O(n) for drop on a List. As for the rest:
def constantSlice[T](l: List[T], start: Int, end: Int): Iterator[T] =
l.drop(start).elements.take(end - start)
Both elements and take (on Iterator) are O(1).
Of course, an Iterator is not an Iterable, as it is not reusable. On Scala 2.8 a preference is given to returning Iterable instead of Iterator. If you need reuse on Scala 2.7, then Stream is probably the way to go.
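On Scala 2.8 and later, a view is another option that might fit: taking a prefix of a view is lazy, so nothing is copied up front, and unlike an Iterator the result can be traversed more than once. A minimal sketch, where firstN is a hypothetical helper name:

```scala
object PrefixView {
  // A lazy, read-only window over the first n elements of the list.
  // Elements are only visited when the result is actually traversed.
  def firstN[A](xs: List[A], n: Int): Iterable[A] = xs.view.take(n)
}
```

Usage looks much like the Stream example: PrefixView.firstN(l, 3).map(_ * 2).toList, or (for (i <- PrefixView.firstN(l, 3)) yield i * 2).toList. Each traversal still walks the first n cells of the underlying list, so this avoids the copy, not the walk.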