Why are arbitrary sized tuples useful? (Template Haskell) - list

In the introductory text for Template Haskell one of the examples for why Template Haskell is useful is working with arbitrary sized tuples.
What is the purpose of arbitrary sized tuples? If the data type is the same why not use a list? And if the data types in the tuple are different, how can it be expanded to arbitrary sizes?

Here, "arbitrary" means arbitrary at compile time. If you want tuples with fifteen elements, Template Haskell will generate a function for a tuple with fifteen elements. After compilation, however, the number of elements is fixed. The advantage of a tuple is that you can access each element in constant time, O(1), and the type system enforces that the tuple contains exactly that fixed number of elements.
Furthermore, the sel in the example works with tuples whose elements have arbitrary types. For instance, sel 2 3 and sel 5 5 generate the following functions:
$(sel 2 3) :: (a,b,c) -> b
$(sel 5 5) :: (a,b,c,d,e) -> e
whereas a list [a] can only contain elements of a single type:
(!!3) :: [a] -> a
so all items have type a. Furthermore, in this case there is no guarantee that the list has enough elements for the index to be valid. The more you can check at compile time, the safer your code is (and in many cases the more efficient as well).
A list, on the other hand, has an arbitrary size at runtime. The same type, for instance [Int], can describe a list with two, five, a hundred, or thousands of integers. Furthermore, accessing the k-th element of a list requires O(k) time. Data structures such as arrays can, of course, access elements in constant time.

Related

What’s the point of mixed-type lists

Some languages like Python allow you to define mixed type lists:
mixed = [1, 'a', 2, 'b']
While other languages would require that all elements of a list be of the same type.
numbers = [1, 2]
letters = ['a', 'b']
My question is: what’s a valid use case of mixed type lists? I’ve only seen them used in examples.
It seems to me that mixing types in a list is not a good practice and if you have need for it, you’d probably be better off refactoring your code.
What am I missing?
An example of a heterogeneous list that you have probably seen thousands of times without realizing it is the argument list of a subroutine call or message send. For example, in
add_element_to_list(list, element)
The argument list list :: element :: Nil is a heterogeneous list of type List[T] :: T :: Nil.
You could also represent an argument list as a tuple, i.e. in the above example, the argument list would be the 2-tuple (aka "pair") (list, element) of type List[T] × T, but tuples have some limitations. For example, tuples are generally thought of as a single unit, not a collection. But there are legitimate reasons you might want to iterate over the arguments in an argument list.
Some languages / collections libraries do allow you to iterate over a tuple, for example, in Python, tuples are considered sequence types, and thus are iterable. Scala also has the productIterator method that lets you iterate over any product type (not just tuples).
But now we're back to square one: if we want to iterate over a tuple, then we need an iterator where the elements it yields can have different types. And an iterator is essentially the same thing as a (lazy) list, so now we have essentially re-introduced heterogeneous lists!
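As a small illustration of this point, Python collects extra positional arguments into a tuple that can be iterated, and each element it yields may have a different runtime type. The function name describe_args is made up for this sketch.

```python
def describe_args(*args):
    """Return the runtime type name of each positional argument, in order."""
    # args is a tuple; iterating over it yields elements of differing types.
    return [type(arg).__name__ for arg in args]


print(describe_args([1, 2, 3], 4))  # prints ['list', 'int']
```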
However, there is a probably better-known use of heterogeneous lists or, more generally, heterogeneous collections: database access.
The paper Strongly Typed Heterogeneous Collections by Oleg Kiselyov, Ralf Lämmel, and Keean Schupke contains not only an implementation of heterogeneous lists in Haskell, but also a motivating example of when, why and how you would use HLists. In particular, they use them for type-safe, compile-time checked database access. (Think LINQ; in fact, the paper they reference is the Haskell paper by Erik Meijer et al. that led to LINQ.)
Quoting from the introductory paragraph of the HLists paper:
Here is an open-ended list of typical examples that call for heterogeneous collections:
A symbol table that is supposed to store entries of different types is heterogeneous. It is a finite map, where the result type depends on the argument value.
An XML element is heterogeneously typed. In fact, XML elements are nested collections that are constrained by regular expressions and the 1-ambiguity property.
Each row returned by an SQL query is a heterogeneous map from column names to cells. The result of a query is a homogeneous stream of heterogeneous rows.
Adding an advanced object system to a functional language requires heterogeneous collections of a kind that combine extensible records with subtyping and an enumeration interface.
Note that the Python example you gave in your question is not really a heterogeneous list in the sense in which the word is commonly used, since Python doesn't have (static) types. It is a weakly typed or untyped list. In fact, in some sense, it is actually a homogeneous list, since all elements are of the same type: depending on your interpretation, all elements have no type, all elements have the type Object, or all elements have the type Dynamic; but in all cases, they all have the same type. You are then forced to perform casts or unchecked isinstance tests or something like that in order to work meaningfully with the elements, which makes them weakly typed.
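To make the last point concrete, here is a minimal sketch of the kind of runtime check such a list forces on the caller. The helper total_of_numbers is a hypothetical name, not something from the question.

```python
def total_of_numbers(items):
    """Sum only the int/float entries of a mixed list, skipping everything else."""
    total = 0
    for item in items:
        # Nothing is known about item statically, so a runtime check is needed.
        # bool is excluded explicitly because it is a subclass of int in Python.
        if isinstance(item, (int, float)) and not isinstance(item, bool):
            total += item
    return total


mixed = [1, 'a', 2, 'b']
print(total_of_numbers(mixed))  # prints 3
```

Nothing stops a caller from forgetting the isinstance test; the mistake would only show up when the code runs.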

How can I get multiple nth elements in a list?

I want to know how I can get multiple elements in a list in one function
For example, if I wanted to get elements 1, 2, 3 and 4 from list a, I would have to type a!!0 ++ a!!1 ++ a!!2 ++ a!!3. This takes up a lot of space, especially if the list name is more than one character. So I'm wondering if I can do something like a!![0,1,2,3] instead and get all of those elements in a much shorter way. Thank you.
You can map the lookup over a list of indices:
map (a !!) [0,1,2,3]
If you are however interested in the first four items, you can work with take :: Int -> [a] -> [a]:
take 4 a
This is usually preferable, since looking up by index (with (!!)) is not a common operation in Haskell: it is unsafe because the index is not guaranteed to be in bounds. Most list processing is done with functions like take, drop, sum, foldr, etc.

Numba Creating a tuple from a list

I have a very simple problem I can't solve.
I'm using Numba and Cuda.
I have a list T=[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0] and I want a tuple with the elements of the list, like this:
C=(1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0).
In Python I would code C=tuple(T), but I can't with Numba.
I tried these solutions but none of them work because you can't modify the type of a variable inside a loop with Numba.
Important: my list has a length that is a multiple of 3, and I use this knowledge in my algorithms.
Code
First algorithm (recursive): it takes L=[(1.0,),(2.0,),(3.0,),(4.0,),(5.0,),...,(9.0,)] as input
#njit
def list2tuple(L):
    n=len(L)
    if n == 1:
        return L[0]
    else:
        C=[]
        for i in range(0,n-2,3):
            c=L[i]+L[i+1]+L[i+2]
            C.append(c)
        return list2tuple(C)
Problem: It enters an infinite loop and I must stop the kernel. It works in basic Python.
Second algorithm: it takes T=[1.0,2.0,3.0,...,9.0] as input
#njit
def list2tuple2(T):
    L=[]
    for i in range(len(T)):
        a=(T[i],)
        L.append(a)
    for k in range(len(T)//3-1):
        n=len(L)
        C=[]
        for j in range(0,n-2,3):
            c=L[j]+L[j+1]+L[j+2]
            C.append(c)
        L=C
    return L[0]
Problem: when C=[(1.0,2.0,3.0),(4.0,5.0,6.0),(7.0,8.0,9.0)], you can't say L=C because L = [(1.0,),(2.0,),(3.0,),....(9.0,)] is List(Unituple(float64x1)) and can't be unified with List(Unituple(float64x3)).
I can't find a solution for this problem.
Tuples and lists are very different in Numba: a list is an unbounded sequence of items of the same type, while a tuple is a fixed-length sequence of items of possibly different types. As a result, the type of a list of 2 elements can be written as List[ItemType], while a tuple of 2 items is Tuple[ItemType1, ItemType2] (where ItemType1 and ItemType2 may be the same). A list of 3 items still has the same type (List[ItemType]). However, a tuple of 3 elements is a completely different type: Tuple[ItemType1, ItemType2, ItemType3]. Numba defines a UniTuple type to easily create an N-ary tuple where each item is of the same type, but this is only for convenience. Internally, Numba (and more specifically the JIT compiler: LLVM-Lite) needs to iterate over all the types and generate specific functions for each tuple type.
As a result, creating a recursive function that works on growing tuples is not possible: Numba would have to compile one function per tuple type, and it cannot enumerate all the possible tuple types because the maximum length of the tuple is only known at runtime.
More generally, you cannot generate an N-ary tuple where N is variable in a Numba function. However, you can instead generate and compile a function for a specific N. If N is very small (e.g. <15), this is fine. Otherwise, it is really a bad idea to write or generate such a function. Indeed, for the JIT, this is equivalent to generating a function with N independent parameters, and when N is big, the compilation time can quickly become huge and actually be much longer than the actual computation (many compiler algorithms have a high theoretical complexity, for example register allocation, which is known to be NP-complete, although heuristics are relatively fast in most practical cases). Not to mention that the memory required to generate such a function can also be huge.
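One way to "generate a function for a specific N" is to build its source text with every element spelled out. The sketch below does this in plain Python; the helper name make_list_to_tuple is hypothetical, and whether the generated function can then be handed to Numba's njit is an assumption that is not verified here.

```python
def make_list_to_tuple(n):
    """Build a function that converts a length-n sequence into an n-tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Spell out every element, e.g. "(T[0], T[1], T[2],)" for n == 3, so the
    # tuple length is fixed in the function body instead of found in a loop.
    elements = ", ".join(f"T[{i}]" for i in range(n))
    source = f"def list_to_tuple(T):\n    return ({elements},)\n"
    namespace = {}
    exec(source, namespace)
    return namespace["list_to_tuple"]


list_to_tuple9 = make_list_to_tuple(9)
T = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(list_to_tuple9(T))
```

Each distinct N yields a distinct function, which mirrors the one-function-per-tuple-type cost described above.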

Mutable, indexed sequence for a large amount of integers in Scala?

So, if I want to store 5 zeros in a sequence, and access them later through their index numbers, what type of a sequence should I use in Scala?
In Python I would do something like this:
listTest = [0, 0, 0, 0, 0]
listTest[1] = 3
print(listTest)
-> [0, 3, 0, 0, 0]
I realize a similar question has likely been answered already. It might be that I just don't know the right keywords to find it.
Collections performance characteristics lists among others the following sequences that are mutable, and indexable in constant time
ArrayBuffer
ArraySeq
Array
Note how the document refers to indexing as applying. The reason is that in Scala an element is accessed via the apply method, like so
import scala.collection.mutable.ArrayBuffer
val arr = ArrayBuffer(11, 42, -1)
arr.apply(1) // 42
arr(1) // sugared version of arr.apply(1) so also evaluates to 42
To decide which one to use consider
What is the difference between ArrayBuffer and Array
Array vs ArraySeq
As a side note, Python's list is conceptually different from Scala's List: the former is an array-based indexed collection with constant-time indexing, whilst the latter is a linked-list collection with linear-time indexing.
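The difference can be sketched in Python with a hand-rolled singly linked list next to the built-in list. The Cons class and nth function are made up for illustration and are not Scala's actual implementation; nth does no bounds checking.

```python
class Cons:
    """One cell of a singly linked list: a value plus a link to the rest."""

    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail


def nth(cell, k):
    """Return the k-th element (0-based) and the number of links followed."""
    steps = 0
    while k > 0:
        cell = cell.tail
        k -= 1
        steps += 1
    return cell.head, steps


linked = Cons(0, Cons(3, Cons(0, Cons(0, Cons(0)))))
array_based = [0, 3, 0, 0, 0]

print(array_based[1])  # direct indexed access
print(nth(linked, 4))  # follows four links to reach index 4
```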

Why does the shuffle' function require an Int parameter?

In System.Random.Shuffle,
shuffle' :: RandomGen gen => [a] -> Int -> gen -> [a]
The hackage page mentions this Int argument as
..., its length,...
However, it seems that a simple wrapper function like
shuffle'' x = shuffle' x (length x)
should've sufficed.
shuffle operates by building a tree form of its input list, including the tree size. The buildTree function performs this task using Data.Function.fix in a manner I haven't quite wrapped my head around. Somehow (I think due to the recursion of inner, not the fix magic), it produces a balanced tree, which then has logarithmic lookup. Then it consumes this tree, rebuilding it for every extracted item. The advantage of the data structure would be that it only holds remaining items in an immutable form; lazy updates work for it. But the size of the tree is required data during the indexing, so there's no need to pass it separately to generate the indices used to build the permutation. System.Random.Shuffle.shuffle indeed has no random element - it is only a permutation function. shuffle' exists to feed it a random sequence, using its internal helper rseq. So the reason shuffle' takes a length argument appears to be because they didn't want it to touch the list argument at all; it's only passed into shuffle.
The task doesn't seem terribly suitable for singly linked lists in the first place. I'd probably consider using VectorShuffling instead. And I'm baffled as to why rseq isn't among the exported functions, being the one that uses a random number generator to build a permutation... which in turn might have been better handled using Data.Permute. The reasons probably have to do with history, such as Data.Permute being written later and System.Random.Shuffle being based on a paper on immutable random access queues.
Data.Random.Extras seems to have a more straightforward Seq-based shuffle function.
It might be that the length of the given list is already known and doesn't need to be calculated again. Thus, the argument can be seen as an optimisation.
Besides, in general, the resulting list doesn't need to have the same size as the original one. Thus, this argument could be used for setting this length.
This is true for the original idea of Oleg (source - http://okmij.org/ftp/Haskell/perfect-shuffle.txt):
-- examples
t1 = shuffle1 ['a','b','c','d','e'] [0,0,0,0]
-- "abcde"
-- Note, that rseq of all zeros leaves the sequence unperturbed.
t2 = shuffle1 ['a','b','c','d','e'] [4,3,2,1]
-- "edcba"
-- The rseq of (n-i | i<-[1..n-1]) reverses the original sequence of elements
However, it's not the same for the 'random-shuffle' package implementation:
> shuffle [0..10] [0,0,0,0]
[0,1,2,3random-shuffle.hs: [shuffle] called with lists of different lengths
I think it is worth following up with the package maintainers in order to understand the contract of this function.
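For reference, the meaning of the index sequence in Oleg's examples above can be sketched naively in Python. This only illustrates the semantics and is not the tree-based algorithm; the name shuffle_by_rseq is made up, and the length check is an assumption modelled on the error shown in the session above.

```python
def shuffle_by_rseq(elements, rseq):
    """Permute elements by repeatedly removing the item at the next index."""
    remaining = list(elements)
    if len(rseq) != max(len(remaining) - 1, 0):
        raise ValueError("rseq must have exactly one entry fewer than elements")
    result = []
    for r in rseq:
        if not 0 <= r < len(remaining):
            raise ValueError("index out of range for the remaining elements")
        # list.pop(r) is linear time, unlike the logarithmic tree lookup.
        result.append(remaining.pop(r))
    # Once every index is consumed, at most one element is left over.
    result.extend(remaining)
    return result


print("".join(shuffle_by_rseq("abcde", [0, 0, 0, 0])))  # prints abcde
print("".join(shuffle_by_rseq("abcde", [4, 3, 2, 1])))  # prints edcba
```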