Why does the shuffle' function require an Int parameter? - list

In System.Random.Shuffle,
shuffle' :: RandomGen gen => [a] -> Int -> gen -> [a]
The hackage page mentions this Int argument as
..., its length,...
However, it seems that a simple wrapper function like
shuffle'' x = shuffle' x (length x)
should've sufficed.

shuffle operates by building a tree form of its input list, including the tree size. The buildTree function performs this task using Data.Function.fix in a manner I haven't quite wrapped my head around. Somehow (I think due to the recursion of inner, not the fix magic), it produces a balanced tree, which then has logarithmic lookup. Then it consumes this tree, rebuilding it for every extracted item. The advantage of the data structure would be that it only holds remaining items in an immutable form; lazy updates work for it. But the size of the tree is required data during the indexing, so there's no need to pass it separately to generate the indices used to build the permutation. System.Random.Shuffle.shuffle indeed has no random element - it is only a permutation function. shuffle' exists to feed it a random sequence, using its internal helper rseq. So the reason shuffle' takes a length argument appears to be because they didn't want it to touch the list argument at all; it's only passed into shuffle.
The task doesn't seem terribly suitable for singly linked lists in the first place. I'd probably consider using VectorShuffling instead. And I'm baffled as to why rseq isn't among the exported functions, being the one that uses a random number generator to build a permutation... which in turn might have been better handled using Data.Permute. Probably the reasons have to with history, such as Data.Permute being written later and System.Random.Shuffle being based on a paper on immutable random access queues.
Data.Random.Extras seems to have a more straight forward Seq-based shuffle function.

It might be a case when length of the given list is already known, and doesn't need to be calculated again. Thus, it might be considered as an optimisation.
Besides, in general, the resulting list doesn't need to have the same size as the original one. Thus, this argument could be used for setting this length.
This is true for the original idea of Oleg (source - http://okmij.org/ftp/Haskell/perfect-shuffle.txt):
-- examples
t1 = shuffle1 ['a','b','c','d','e'] [0,0,0,0]
-- "abcde"
-- Note, that rseq of all zeros leaves the sequence unperturbed.
t2 = shuffle1 ['a','b','c','d','e'] [4,3,2,1]
-- "edcba"
-- The rseq of (n-i | i<-[1..n-1]) reverses the original sequence of elements
However, it's not the same for the 'random-shuffle' package implementation:
> shuffle [0..10] [0,0,0,0]
[0,1,2,3random-shuffle.hs: [shuffle] called with lists of different lengths
I think it worth to follow-up with the packages maintainers in order to understand the contract of this function.

Related

Numba Creating a tuple from a list

I have a very simple problem I can't solve.
I'm using Numba and Cuda.
I have a list T=[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0] and I want a tuple with the elements of the list, like this:
C=(1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0).
In Python I would code C=tuple(T), but I can't with Numba.
I tried these solutions but none of them work because you can't modify the type of a variable inside a loop with Numba.
Important My list has a length which is a multiple of 3 and I use this knowledge for my algorithms.
Code
First algorithm, recursively, it works by giving the algorithm L=[(1.0,),(2.0,),(3.0,),(4.0,),(5.0,),...,(9.0,)]
#njit
def list2tuple(L):
n=len(L)
if n == 1:
return L[0]
else:
C=[]
for i in range(0,n-2,3):
c=L[i]+L[i+1]+L[i+2]
C.append(c)
return list2tuple(C)
Problem: It enters an infinite loop and I must stop the kernel. It works in basic Python.
algorithm 2: it works by giving T=[1.0,2.0,3.0,...,9.0]
#njit
def list2tuple2(T):
L=[]
for i in range(len(T)):
a=(T[i],)
L.append(a)
for k in range(len(T)//3-1):
n=len(L)
C=[]
for j in range(0,n-2,3):
c=L[j]+L[j+1]+L[j+2]
C.append(c)
L=C
return L[0]
Problem When C=[(1.0,2.0,3.0),(4.0,5.0,6.0),(7.0,8.0,9.0)], you can't say L=C because L = [(1.0,),(2.0,),(3.0,),....(9.0,)] is List(Unituple(float64x1)) and can't be unified with List(Unituple(float64x3)).
I can't find a solution for this problem.
Tuples and list are very different in Numba: a list is an unbounded sequence of items of the same type while a tuple is a fixed-length sequence of items of possibly different type. As a result, the type of a list of 2 element can be defined as List[ItemType] while a tuple of 2 items can be Tuple[ItemType1, ItemType2] (where ItemType1 and ItemType2 may be the same). A list of 3 items still have the same type (List[ItemType]). However, a tuple of 3 element is a completely different type: Tuple[ItemType1, ItemType2, ItemType3]. Numba defines a UniTuple type to easily create a N-ary tuple where each item is of the same type, but this is only for convenience. Internally, Numba (and more specifically the JIT compiler: LLVM-Lite) needs to iterates over all the types and generate specific functions for each tuple type.
As a result, creating a recursive function that works on growing tuples is not possible because Numba cannot generate all the possible tuple type so to be able to just compile all the functions (one by tuple type). Indeed, the maximum length of the tuple is only known at runtime.
More generally, you cannot generate a N-ary tuple where N is variable in a Numba function. However, you can instead to generate and compile a function for a specific N. If N is very small (e.g. <15), this is fine. Otherwise, it is really a bad idea to write/generate such a function. Indeed, for the JIT, this is equivalent to generate a function with N independent parameters and when N is big, the compilation time can quickly become huge and actually be much slower than the actual computation (many compiler algorithms have theoretically a pretty big complexity, for example the register allocation which is known to be NP-complete although heuristics are relatively fast in most practical cases). Not to mention the required memory to generate such a function can also be huge.

Correct way to add an element to the end of a list?

I was reading on this Haskell page about adding an element to the end of a List.
Using the example, I tried it out for my self. Given the following List I wanted to add the number 56 at the end of it.
Example:
let numbers = [4,8,15,16,23,42]
numbers ++ [56]
I was thrown off by this comment:
Adding an item to the end of a list is a fine exercise, but usually
you shouldn't do it in real Haskell programs. It's expensive, and
indicates you are building your list in the wrong order. There is
usually a better approach.
Researching, I realize that what I'm actually doing is creating a List with 56 as the only element and I'm combining it with the numbers list. Is that correct?
Is using ++ the correct way to add an element to the end of a List?
++ [x] is the correct way to add an element to the end of a list, but what the comment is saying is that you shouldn't add elements to the end of a list.
Due to the way lists are defined, adding an element at the end always requires making a copy of the list. That is,
xs ++ ys
needs to copy all of xs but can reuse ys unchanged.
If xs is just one element (i.e. we're adding to the beginning of a list), that's no problem: Copying one element takes practically no time at all.
But if xs is longer, we need to spend more time in ++.
And if we're doing this repeatedly (i.e. we're building up a big list by continually adding elements to the end), then we need to spend a lot of time making redundant copies. (Building an n-element list in this way is an O(n2) operation.)
If you need to do this, there is usually a better way to structure your algorithm. For example, you can build your list in reverse order (adding elements at the beginning) and only call reverse at the end.
It's the correct way in that all ways of doing it must reduce to at least that much work. The problem is wanting to append to the end of a list at all. That's not an operation that's possible to do efficiently with immutable linked lists.
The better approach is figuring out how to solve your specific problem without doing that. There are a lot of potential approaches. Picking the right one depends on the details of what you're doing. Maybe you can get away with just using laziness correctly. Maybe you are best off generating the list backwards and then reversing it once at the end. Maybe you're best off using a different data structure. It all depends on your specific use case.

What container really mimics std::vector in Haskell?

The problem
I'm looking for a container that is used to save partial results of n - 1 problems in order to calculate the nth one. This means that the size of the container, at the end, will always be n.
Each element, i, of the container depends on at least 2 and up to 4 previous results.
The container have to provide:
constant time insertions at either beginning or end (one of the two, not necessarily both)
constant time indexing in the middle
or alternatively (given a O(n) initialization):
constant time single element edits
constant time indexing in the middle
What is std::vector and why is it relevant
For those of you who don't know C++, std::vector is a dynamically sized array. It is a perfect fit for this problem because it is able to:
reserve space at construction
offer constant time indexing in the middle
offer constant time insertion at the end (with a reserved space)
Therefore this problem is solvable in O(n) complexity, in C++.
Why Data.Vector is not std::vector
Data.Vector, together with Data.Array, provide similar functionality to std::vector, but not quite the same. Both, of course, offer constant time indexing in the middle, but they offer neither constant time modification ((//) for example is at least O(n)) nor constant time insertion at either beginning of end.
Conclusion
What container really mimics std::vector in Haskell? Alternatively, what is my best shot?
From reddit comes the suggestion to use Data.Vector.constructN:
O(n) Construct a vector with n elements by repeatedly applying the generator function to the already constructed part of the vector.
constructN 3 f = let a = f <> ; b = f <a> ; c = f <a,b> in f <a,b,c>
For example:
λ import qualified Data.Vector as V
λ V.constructN 10 V.length
fromList [0,1,2,3,4,5,6,7,8,9]
λ V.constructN 10 $ (1+) . V.sum
fromList [1,2,4,8,16,32,64,128,256,512]
λ V.constructN 10 $ \v -> let n = V.length v in if n <= 1 then 1 else (v V.! (n - 1)) + (v V.! (n - 2))
fromList [1,1,2,3,5,8,13,21,34,55]
This certainly seems to qualify to solve the problem as you've described it above.
The first data structures that come to my mind are either Maps from Data.Map or Sequences from Data.Sequence.
Update
Data.Sequence
Sequences are persistent data structures that allow most operations efficient, while allowing only finite sequences. Their implementation is based on finger-trees, if you are interested. But which qualities does it have?
O(1) calculation of the length
O(1) insert at front/back with the operators <| and |> respectively.
O(n) creation from a list with fromlist
O(log(min(n1,n2))) concatenation for sequences of length n1 and n2.
O(log(min(i,n-i))) indexing for an element at position i in a sequence of length n.
Furthermore this structure supports a lot of the known and handy functions you'd expect from a list-like structure: replicate, zip, null, scans, sort, take, drop, splitAt and many more. Due to these similarities you have to do either qualified import or hide the functions in Prelude, that have the same name.
Data.Map
Maps are the standard workhorse for realizing a correspondence between "things", what you might call a Hashmap or associave array in other programming languages are called Maps in Haskell; other than in say Python Maps are pure - so an update gives you back a new Map and does not modify the original instance.
Maps come in two flavors - strict and lazy.
Quoting from the Documentation
Strict
API of this module is strict in both the keys and the values.
Lazy
API of this module is strict in the keys, but lazy in the values.
So you need to choose what fits best for your application. You can try both versions and benchmark with criterion.
Instead of listing the features of Data.Map I want to pass on to
Data.IntMap.Strict
Which can leverage the fact that the keys are integers to squeeze out a better performance
Quoting from the documentation we first note:
Many operations have a worst-case complexity of O(min(n,W)). This means that the operation can become linear in the number of elements with a maximum of W -- the number of bits in an Int (32 or 64).
So what are the characteristics for IntMaps
O(min(n,W)) for (unsafe) indexing (!), unsafe in the sense that you will get an error if the key/index does not exist. This is the same behavior as Data.Sequence.
O(n) calculation of size
O(min(n,W)) for safe indexing lookup, which returns a Nothing if the key is not found and Just a otherwise.
O(min(n,W)) for insert, delete, adjust and update
So you see that this structure is less efficient than Sequences, but provide a bit more safety and a big benefit if you actually don't need all entries, such the representation of a sparse graph, where the nodes are integers.
For completeness I'd like to mention a package called persistent-vector, which implements clojure-style vectors, but seems to be abandoned as the last upload is from (2012).
Conclusion
So for your use case I'd strongly recommend Data.Sequence or Data.Vector, unfortunately I don't have any experience with the latter, so you need to try it for yourself. From the stuff I know it provides a powerful thing called stream fusion, that optimizes to execute multiple functions in one tight "loop" instead of running a loop for each function. A tutorial for Vector can be found here.
When looking for functional containers with particular asymptotic run times, I always pull out Edison.
Note that there's a result that in a strict language with immutable data structures, there's always a logarithmic slowdown to implementing mutable data structure on top of them. It's an open problem whether the limited mutation hidden behind laziness can avoid that slowdown. There also the issue of persistent vs. transient...
Okasaki is still a good read for background, but finger trees or something more complex like an RRB-tree should be available "off-the-shelf" and solve your problem.
I'm looking for a container that is used to save partial results of n - 1 problems in order to calculate the nth one.
Each element, i, of the container depends on at least 2 and up to 4 previous results.
Lets consider a very small program. that calculates fibonacci numbers.
fib 1 = 1
fib 2 = 1
fib n = fib (n-1) + fib (n-2)
This is great for small N, but horrible for n > 10. At this point, you stumble across this gem:
fib n = fibs !! n where fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
You may be tempted to exclaim that this is dark magic (infinite, self referential list building and zipping? wth!) but it is really a great example of tying the knot, and using lazyness to ensure that values are calcuated as-needed.
Similarly, we can use an array to tie the knot too.
import Data.Array
fib n = arr ! 10
where arr :: Arr Int Int
arr = listArray (1,n) (map fib' [1..n])
fib' 1 = 1
fib' 2 = 1
fib' n = arr!(n-1) + arr!(n-2)
Each element of the array is a thunk that uses other elements of the array to calculate it's value. In this way, we can build a single array, never having to perform concatenation, and call out values from the array at will, only paying for the calculation up to that point.
The beauty of this method is that you don't only have to look behind you, you can look in front of you as well.

About lists:suffix/2 in Erlang

The source code:
suffix(Suffix, List) ->
Delta = length(List) - length(Suffix),
Delta >= 0 andalso nthtail(Delta, List) =:= Suffix.
How about rewriting it like the follow:
suffix(Suffix, List) ->
prefix(reverse(Suffix), reverse(List)).
If Delta >=0, the first one will traverse four times, and the second one will traverse three times, is it correct?
The first one (from stdlib lists.erl) will traverse both lists twice each, yes. On the other hand, on the second traversal all the list cells will probably be in L2 cache, and it doesn't have to allocate any data. Your suggestion works too, but has to build two reversed temporary lists on the heap, which both has a cost in allocating and initializing data structures as well as causing garbage collection to happen more often on average.
If you think about the same problem in C (or any similar language): testing whether one singly linked list is a suffix of another singly linked list, it becomes more obvious why it's hard to do efficiently, in particular if you want to avoid allocating memory, and you aren't allowed to use tricks like reversing pointers.
I don't think it is correct. As far as I know, length is a build in function which does not need to traverse the list to get the result (it is the reason why it is allowed in guard test), and the andalso is a kind of shortcut. if the first term is false, it does not evaluate the second term and directly return false.

Finding a path between two lists

I've been tasked with the portion of the code that reads through two lists and finds a path between the two lists.
My problem is however with reading and returning a TRUE case if I find the required node in the first list. I've been writing my code as below.
%declare the list
circle_line([cc1,cc2,cc3,cc4,cc5,cc6,cc7,cc8,cc9,cc10,cc11,cc12,cc13,cc14,cc15,cc16]).
%the predicate that finds the station i need
check_for_station(A) :- circle_line(X),
member(A,X),
write(A),nl.
Then in the cosole I type: check_for_station(cc9).
But the answer I get is "no."
I have a feeling that I'm declaring the list wrong, because in the debugger the value of X comes out to be "H135" or something and it clearly does not loop through every element to find the one that I need.
As whythehack found, in Amzi! Prolog the classic (and nondeterministic) member/2 predicate is not a "built-in" but is provided (along with other useful list manipulation predicates like length/2) in the list library. See here for some context about the built-in versus library predicates for lists.
If check_for_station/1 is only to be called with its argument A bound (as the example suggests), then Amzi!'s built-in (and deterministic) predicate is_member/2 will do the job (and a bit more quickly).
The thinking seems to be that nondeterministic member/2 is a two-line user-definition, and providing a faster deterministic version (not permitting backtracking over members of a list) is something the user could not easily provide for themselves.