Mutable, indexed sequence for a large number of integers in Scala?

So, if I want to store 5 zeros in a sequence, and access them later through their index numbers, what type of a sequence should I use in Scala?
In Python I would do something like this:
listTest = [0, 0, 0, 0, 0]
listTest[1] = 3
print(listTest)
-> [0, 3, 0, 0, 0]
I realize a similar question has probably been answered already; I may just not know the right keywords to find it.

Collections performance characteristics lists, among others, the following sequences that are mutable and indexable in constant time:
ArrayBuffer
ArraySeq
Array
Note how the document refers to indexing as "apply". The reason is that in Scala an element is accessed via the apply method, like so:
import scala.collection.mutable.ArrayBuffer
val arr = ArrayBuffer(11, 42, -1)
arr.apply(1) // 42
arr(1) // sugar for arr.apply(1), so it also evaluates to 42
arr(1) = 3 // sugar for arr.update(1, 3); arr is now ArrayBuffer(11, 3, -1)
To decide which one to use, consider:
What is the difference between ArrayBuffer and Array
Array vs ArraySeq
As a side note, Python's list is conceptually different from Scala's List: the former is an array-based indexed collection with constant-time indexing, whilst the latter is a linked list with linear-time indexing.
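To make the difference concrete, here is a minimal Python sketch (Python rather than Scala, purely for illustration). The names build_linked and linked_index are hypothetical helpers, and the cells are modelled as plain (head, tail) tuples; this only has the same shape as a linked list such as Scala's List, it is not how that class is implemented.

```python
def build_linked(items):
    """Build a singly linked list of (head, tail) cells; None is the empty list."""
    cell = None
    for item in reversed(items):
        cell = (item, cell)
    return cell


def linked_index(cell, index):
    """Return (element, links_followed) for a non-negative index."""
    if index < 0:
        raise IndexError("negative index")
    steps = 0
    while index > 0 and cell is not None:
        cell = cell[1]  # follow one link towards the tail
        index -= 1
        steps += 1
    if cell is None:
        raise IndexError("index out of range")
    return cell[0], steps


array_based = [0] * 5  # Python's list: array-backed, constant-time indexing
array_based[1] = 3     # in-place update by index

linked = build_linked(array_based)
element, links = linked_index(linked, 4)  # must follow 4 links to reach index 4
```

The array-backed list reaches any index directly, while the linked version follows one link per position, which is why an indexed, mutable workload favours ArrayBuffer or Array over List.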

Related

What’s the point of mixed-type lists

Some languages like Python allow you to define mixed type lists:
mixed = [1, 'a', 2, 'b']
Other languages require that all elements of a list be of the same type:
numbers = [1, 2]
letters = ['a', 'b']
My question is: what’s a valid use case of mixed type lists? I’ve only seen them used in examples.
It seems to me that mixing types in a list is not a good practice and if you have need for it, you’d probably be better off refactoring your code.
What am I missing?
An example of a heterogeneous list that you have probably seen thousands of times without realizing it is the argument list of a subroutine call or message send. For example, in
add_element_to_list(list, element)
The argument list list :: element :: Nil is a heterogeneous list of type List[T] :: T :: Nil.
You could also represent an argument list as a tuple, i.e. in the above example, the argument list would be the 2-tuple (aka "pair") (list, element) of type List[T] × T, but tuples have some limitations. For example, tuples are generally thought of as a single unit, not a collection. But, there are legitimate reasons you might want to iterate over the arguments in an arguments list.
Some languages / collections libraries do allow you to iterate over a tuple, for example, in Python, tuples are considered sequence types, and thus are iterable. Scala also has the productIterator method that lets you iterate over any product type (not just tuples).
But now we're back to square one: if we want to iterate over a tuple, then we need an iterator where the elements it yields can have different types. And an iterator is essentially the same thing as a (lazy) list, so now we have essentially re-introduced heterogeneous lists!
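As a small illustration of iterating over an argument list, here is a Python sketch. The function describe_arguments is a hypothetical name; it relies only on the fact that Python collects positional arguments into a tuple, which is iterable.

```python
def describe_arguments(*args):
    """Iterate over the argument tuple and report each element's runtime type."""
    return [(type(arg).__name__, arg) for arg in args]


# Mirrors add_element_to_list(list, element): a list and a single element.
described = describe_arguments([1, 2, 3], 4)
```

The iterator over args yields first a list and then an int, so the sequence being iterated is heterogeneous in exactly the sense described above.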
However, there is a probably better-known use of heterogeneous lists, or more generally, heterogeneous collections: database access.
The paper Strongly Typed Heterogeneous Collections by Oleg Kiselyov, Ralf Lämmel, and Keean Schupke contains not only an implementation of heterogeneous lists in Haskell, but also a motivating example of when, why and how you would use HLists. In particular, they use them for type-safe, compile-time checked database access. (Think LINQ; in fact, the paper they are referencing is the Haskell paper by Erik Meijer et al. that led to LINQ.)
Quoting from the introductory paragraph of the HLists paper:
Here is an open-ended list of typical examples that call for heterogeneous collections:
A symbol table that is supposed to store entries of different types is heterogeneous. It is a finite map, where the result type depends on the argument value.
An XML element is heterogeneously typed. In fact, XML elements are nested collections that are constrained by regular expressions and the 1-ambiguity property.
Each row returned by an SQL query is a heterogeneous map from column names to cells. The result of a query is a homogeneous stream of heterogeneous rows.
Adding an advanced object system to a functional language requires heterogeneous collections of a kind that combine extensible records with subtyping and an enumeration interface.
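The SQL example from the list above can be sketched with Python's standard sqlite3 module. The table name, columns, and the helper fetch_rows are made up for illustration; the point is only that every row has the same shape, while the cells inside one row have different types.

```python
import sqlite3


def fetch_rows():
    """Return query results as a list of column-name -> cell dictionaries."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    conn.execute("CREATE TABLE symbols (name TEXT, arity INTEGER, weight REAL)")
    conn.executemany(
        "INSERT INTO symbols VALUES (?, ?, ?)",
        [("add", 2, 1.5), ("neg", 1, 0.5)],
    )
    cursor = conn.execute("SELECT name, arity, weight FROM symbols ORDER BY name")
    rows = [dict(row) for row in cursor]
    conn.close()
    return rows


rows = fetch_rows()
cell_types = [type(value).__name__ for value in rows[0].values()]
```

Each element of rows is a map with a str, an int, and a float cell: a homogeneous stream of heterogeneous rows.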
Note that the Python example you gave in your question is really not a heterogeneous list in the sense that the word is commonly used, since Python doesn't have (static) types. It is a weakly typed or untyped list. In fact, in some sense, it is actually a homogeneous list, since all elements are of the same type: depending on your interpretation, all elements have no type, all elements have the type Object, or all elements have the type Dynamic – but in all cases, they all have the same type. You are then forced to perform casts or unchecked isinstance tests or something like that in order to work meaningfully with the elements, which makes them weakly typed.
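A short sketch of what those runtime tests look like in practice, using the mixed list from the question. The function total_of_numbers is a hypothetical helper, not part of any library.

```python
def total_of_numbers(items):
    """Sum only the numeric elements, deciding per element at runtime."""
    total = 0
    for item in items:
        # Nothing is known statically about item, so its type is tested here.
        # bool is excluded explicitly because it is a subclass of int.
        if isinstance(item, (int, float)) and not isinstance(item, bool):
            total += item
    return total


mixed = [1, 'a', 2, 'b']
numeric_total = total_of_numbers(mixed)
```

Nothing stops a caller from passing elements the function was never written for; they are silently skipped here, and that check happens only when the code runs.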

Why does the shuffle' function require an Int parameter?

In System.Random.Shuffle,
shuffle' :: RandomGen gen => [a] -> Int -> gen -> [a]
The Hackage page describes this Int argument as
..., its length,...
However, it seems that a simple wrapper function like
shuffle'' x = shuffle' x (length x)
should've sufficed.
shuffle operates by building a tree form of its input list, including the tree size. The buildTree function performs this task using Data.Function.fix in a manner I haven't quite wrapped my head around. Somehow (I think due to the recursion of inner, not the fix magic), it produces a balanced tree, which then has logarithmic lookup. Then it consumes this tree, rebuilding it for every extracted item. The advantage of the data structure would be that it only holds remaining items in an immutable form; lazy updates work for it. But the size of the tree is required data during the indexing, so there's no need to pass it separately to generate the indices used to build the permutation. System.Random.Shuffle.shuffle indeed has no random element - it is only a permutation function. shuffle' exists to feed it a random sequence, using its internal helper rseq. So the reason shuffle' takes a length argument appears to be because they didn't want it to touch the list argument at all; it's only passed into shuffle.
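The point that the random sequence depends only on the length can be sketched in Python. This is an assumption-laden illustration, not the package's code: it assumes that the i-th index is drawn uniformly from 0..n-i for i = 1..n-1, and the name rseq is borrowed from the text above.

```python
import random


def rseq(n, rng):
    """Return n-1 random indices; the i-th is drawn from 0..n-i (inclusive)."""
    # The upper bound shrinks because one element is removed at each step.
    return [rng.randint(0, upper) for upper in range(n - 1, 0, -1)]


indices = rseq(5, random.Random(2024))  # needs only the length, never the list
```

Nothing in rseq inspects the elements being shuffled, which fits the explanation that shuffle' takes the length so that the list itself is only handed on to shuffle.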
The task doesn't seem terribly suitable for singly linked lists in the first place. I'd probably consider using VectorShuffling instead. And I'm baffled as to why rseq isn't among the exported functions, being the one that uses a random number generator to build a permutation... which in turn might have been better handled using Data.Permute. The reasons probably have to do with history, such as Data.Permute being written later and System.Random.Shuffle being based on a paper on immutable random access queues.
Data.Random.Extras seems to have a more straightforward Seq-based shuffle function.
It might be that the length of the given list is already known and doesn't need to be calculated again, so the argument can be seen as an optimisation.
Besides, in general, the resulting list doesn't need to have the same size as the original one. Thus, this argument could be used for setting this length.
This is true for the original idea of Oleg (source - http://okmij.org/ftp/Haskell/perfect-shuffle.txt):
-- examples
t1 = shuffle1 ['a','b','c','d','e'] [0,0,0,0]
-- "abcde"
-- Note, that rseq of all zeros leaves the sequence unperturbed.
t2 = shuffle1 ['a','b','c','d','e'] [4,3,2,1]
-- "edcba"
-- The rseq of (n-i | i<-[1..n-1]) reverses the original sequence of elements
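How an index sequence drives the permutation can be sketched in Python. This version deliberately swaps the balanced tree for a plain list with pop, so each extraction is linear rather than logarithmic; it only mirrors the behaviour of the two examples above and is not a port of the original code.

```python
def shuffle1(elements, rseq):
    """Permute elements: each index in rseq extracts one remaining element."""
    remaining = list(elements)
    result = [remaining.pop(r) for r in rseq]  # r indexes what is still left
    result.extend(remaining)                   # whatever was not extracted goes last
    return result


t1 = "".join(shuffle1("abcde", [0, 0, 0, 0]))
t2 = "".join(shuffle1("abcde", [4, 3, 2, 1]))
```

Always taking index 0 keeps the original order, and always taking the last remaining index reverses it, matching t1 and t2 above. The sketch does no length validation of its own.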
However, it's not the same for the 'random-shuffle' package implementation:
> shuffle [0..10] [0,0,0,0]
[0,1,2,3random-shuffle.hs: [shuffle] called with lists of different lengths
I think it is worth following up with the package's maintainers to understand the contract of this function.

Elixir: Select multiple elements from a list based on index

Assuming I have a list, is there a built-in operator or function to select elements based on a list of indices?
For example, an operator something like this ["a", "b", "z"] = alphabet[0, 1, 25]
A naive implementation of this could be:
def select(list, indices) do
  Enum.map(indices, &(Enum.at(list, &1)))
end
If it doesn't exist, is this a deliberate omission to avoid lists being treated like arrays?
An example of what I'm attempting that made me want this, in case I'm asking the wrong question: Given a list, I want to select the first, middle, and last elements, then calculate the median of the three. I was doing length(list) to calculate the length, then I wanted to use this operator/function to select the three elements I'm interested in.
As far as I know, no such built-in operator exists. Each time I have to fetch several elements from a list, I use the same implementation as yours. It is short and simple to recreate, which I suspect is why there is no off-the-shelf solution in Elixir.
Another reason I can think of is, as you pointed out, that lists aren't arrays: to access one element you have to traverse all the elements before it, so accessing elements by a list of indices is not a natural operation, because lists are not optimized to be used that way.
Still, I often access a list of elements with a list of indices, which may mean I am not using Elixir the right way.
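Because each lookup in a linked list walks from the head, the naive version traverses the list once per index. A single-pass alternative can be sketched as follows (in Python for illustration, not Elixir). The names select and median_of_three are hypothetical, only non-negative indices are handled, and out-of-range indices yield None, similar to Enum.at returning nil.

```python
def select(items, indices):
    """Pick the items at the given non-negative indices in one traversal."""
    slots_by_index = {}
    for slot, index in enumerate(indices):
        slots_by_index.setdefault(index, []).append(slot)

    result = [None] * len(indices)
    pending = len(indices)
    for index, item in enumerate(items):
        for slot in slots_by_index.get(index, ()):
            result[slot] = item
            pending -= 1
        if pending == 0:
            break  # everything requested has been found
    return result


def median_of_three(items):
    """Median of the first, middle, and last elements of a non-empty list."""
    n = len(items)
    first, middle, last = select(items, [0, n // 2, n - 1])
    return sorted([first, middle, last])[1]
```

For the median-of-three use case this walks the list once after the length is known, instead of three separate walks.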

C++ last element of a structure field

I get a structure, and I don't know the size of it (every time it's different). I would like to set the last place in one of the fields of this structure to a certain value. In pseudocode, I mean something like this:
structureA.fieldB[end] = cert_value;
I could do this in Matlab, but I cannot find the proper syntax in C++. Can you help me?
In Matlab, a structure data type holds key-value pairs where the "value" may be of different types. In C++, there are some key-value containers available (associative containers like set, map, multimap), but they usually store elements of a single type. What you need if I understood it right is something like
"one" : 1
"two" : [1,2,5]
"three" : "name"
Which means that your structure resembles a Python dictionary.
In C++, the only way I have heard of using containers with truly different types is by using boost::any, which is accepted as the answer to this question.
If you pack a container with elements of different types, then you can use the back() member function of the container to get the last element. (end() returns an iterator one past the last element, so it must not be dereferenced.)
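You need sizeof, which gives you the size of the array in bytes. Since you want the index of the last element, divide this number by the number of bytes in one element to get the element count, then subtract one. This only works when fieldB is a true array whose size is known at compile time, not a pointer. You end up with:
// element count minus one is the index of the last element
int index_end = sizeof(structureA.fieldB) / sizeof(structureA.fieldB[0]) - 1;
structureA.fieldB[index_end] = cert_value;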

List design in functional languages

I've noticed that in functional languages such as Haskell and OCaml you can do two things with lists. First, you can write x:xs, where x is an element and xs is a list, and the result is a new list with x prepended to xs in constant time. Second, you can write x++y, where both x and y are lists, and the result is a new list with y appended to the end of x in linear time with respect to the number of elements in x.
Now I'm no expert in how languages are designed and compilers are built, but this seems to me a lot like a simple implementation of a linked list with one pointer to the first item. If I were to implement this data structure in a language like C++ I would find it to be generally trivial to add a pointer to the last element. In this case if these languages were implemented this way (assuming they do use linked lists as described) adding a "pointer" to the last item would make it much more efficient to add items to the end of a list and would allow pattern matching with the last element.
My question is: are these data structures really implemented as linked lists, and if so, why do they not add a reference to the last element?
Yes, they really are linked lists. But they are immutable. The advantage of immutability is that you don't have to worry about who else has a pointer to the same list. You might choose to write x++y, but code somewhere else in the program might be relying on x remaining unchanged.
People who work on compilers for such languages (of whom I am one) don't worry about this cost because there are plenty of other data structures that provide efficient access:
A functional queue represented as two lists provides constant-time access to both ends and amortized constant time for put and get operations.
A more sophisticated data structure like a finger tree can provide several kinds of list access at very low cost.
If you just want constant-time append, John Hughes developed an excellent, simple representation of lists as functions, which provides exactly that. (In the Haskell library they are called DList.)
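The first of these, the two-list queue, can be sketched in Python as follows. Immutable (head, tail) tuples stand in for cons cells; the names Queue, put, and get are chosen for illustration and do not refer to any particular library.

```python
from typing import Any, NamedTuple


class Queue(NamedTuple):
    front: Any  # cons list; its head is the next element to dequeue
    back: Any   # cons list; its head is the most recently enqueued element


EMPTY = Queue(None, None)


def reverse(cells):
    """Reverse a cons list of (head, tail) tuples; None is the empty list."""
    acc = None
    while cells is not None:
        head, cells = cells
        acc = (head, acc)
    return acc


def put(queue, item):
    """Return a new queue with item added at the back; constant time."""
    return Queue(queue.front, (item, queue.back))


def get(queue):
    """Return (item, new_queue); amortized constant time."""
    front, back = queue
    if front is None:  # front exhausted: reverse the back list once
        front, back = reverse(back), None
    if front is None:
        raise IndexError("get from an empty queue")
    head, tail = front
    return head, Queue(tail, back)
```

Every operation returns a new queue and leaves the old one usable. The occasional reversal costs linear time but is paid for by the constant-time puts that preceded it.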
If you're interested in these sorts of questions you can get good info from Chris Okasaki's book Purely Functional Data Structures and from some of Ralf Hinze's less intimidating papers.
You said:
Second is x++y where both x and y are
lists and the resulting action is y
gets appended to the end of x in
linear time with respect to the number
of elements in x.
This is not really true in a functional language like Haskell; y gets appended to a copy of x, since anything holding onto x is depending on it not changing.
If you're going to copy all of x anyway, holding onto its last node doesn't really gain you anything.
Yes, they are linked lists. In languages like Haskell and OCaml, you don't add items to the end of a list, period. Lists are immutable. There is one operation to create new lists — cons, the : operator you refer to earlier. It takes an element and a list, and creates a new list with the element as the head and the list as the tail. The reason x++y takes linear time is because it must cons the last element of x with y, and then cons the second-to-last element of x with that list, and so on with each element of x. None of the cons cells in x can be reused, because that would cause the original list to change as well. A pointer to the last element of x would not be very helpful here — we still have to walk the whole list.
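The consing process described above can be sketched in Python, with immutable (head, tail) tuples standing in for cons cells. The helper names from_items and append are hypothetical, and the loop is an iterative rendering of what would be a recursive definition in Haskell or OCaml.

```python
def from_items(items):
    """Build a cons list of (head, tail) tuples; None is the empty list."""
    cells = None
    for item in reversed(list(items)):
        cells = (item, cells)
    return cells


def append(xs, ys):
    """Return xs ++ ys: new cells for every element of xs, with ys shared."""
    heads = []
    while xs is not None:  # walk all of xs: this is the linear cost
        head, xs = xs
        heads.append(head)
    result = ys
    for head in reversed(heads):  # cons the last element of xs first
        result = (head, result)
    return result


x = from_items([1, 2])
y = from_items([3, 4])
z = append(x, y)
```

After the call, x is unchanged, the cells of y are reused as the tail of z, and every element of x has been visited once. A pointer to the last cell of x would not have avoided that walk.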
++ is just one of dozens of "things you can do with lists". The reality is that lists are so versatile that one rarely uses other collections. Also, we functional programmers almost never feel the need to look at the last element of a list - if we need to, there is a function last.
However, just because lists are convenient this does not mean that we do not have other data structures. If you're really interested, have a look at this book http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf (Purely Functional Data Structures). You'll find trees, queues, lists with O(1) append of an element at the tail, and so forth.
Here's a bit of an explanation on how things are done in Clojure:
The easiest way to avoid mutating state is to use immutable data structures. Clojure provides a set of immutable lists, vectors, sets and maps. Since they can't be changed, 'adding' or 'removing' something from an immutable collection means creating a new collection just like the old one but with the needed change. Persistence is a term used to describe the property wherein the old version of the collection is still available after the 'change', and that the collection maintains its performance guarantees for most operations. Specifically, this means that the new version can't be created using a full copy, since that would require linear time. Inevitably, persistent collections are implemented using linked data structures, so that the new versions can share structure with the prior version. Singly-linked lists and trees are the basic functional data structures, to which Clojure adds a hash map, set and vector both based upon array mapped hash tries.
(emphasis mine)
So basically it looks like you're mostly correct, at least as far as Clojure is concerned.