I'm trying to perform a manipulation of a nested data structure containing lists of elements. After mucking around with various approaches I finally settled on lenses as the best way to go about doing this. They work perfectly for finding and modifying specific elements of the structure, but so far I'm stumped on how to add new elements.
From what I've read, I can't technically use a Traversal as it violates the Traversal laws to insert a new element into a list, and that's assuming I could even figure out how to do that using a Traversal in the first place (I'm still fairly weak with Haskell, and the type signatures for most things in the lens package make my head spin).
Specifically, what I'm trying to accomplish is to find some element in the list that matches a specific selector, and then insert a new element either before or after the matched element (an argument to the function selects before or after the match). Does Control.Lens already have something that can accomplish this, and my understanding of the type signatures is just too weak to see it? Is there a better way to accomplish what I'm trying to do?
It would be fairly trivial if I were just trying to add a new element to the beginning or the end of a list, but inserting it somewhere specific in the middle is the difficult part. In some of the pre-lens code I wrote, I used a fold to accomplish what I wanted, but it was starting to get gnarly on the more deeply nested parts of the structure (e.g. a fold inside of a fold inside of a fold), so I turned to Control.Lens to try to untangle some of that mess.
Using the lens package
If we start with knowing the function id can be used like a lens:
import Control.Lens
> [1,2,3,4] ^. id
[1,2,3,4]
Then we can move on to how the list can be modified:
> [1,2,3,4] & id %~ (99:)
[99,1,2,3,4]
The above allows for insertion at the start of the list. To focus on the latter parts of the list we can use _tail from the Control.Lens.Cons module.
> [1,2,3,4] ^. _tail
[2,3,4]
> [1,2,3,4] & _tail %~ (99:)
[1,99,2,3,4]
Now to generalize this for the nth position
> :{
let
  _drop 0 = id
  _drop n = _tail . _drop (n - 1)
:}
> [1,2,3,4] ^. _drop 1
[2,3,4]
> [1,2,3,4] & _drop 0 %~ (99:)
[99,1,2,3,4]
> [1,2,3,4] & _drop 1 %~ (99:)
[1,99,2,3,4]
As a last step, to generalize this over all types with a Cons instance, we can use cons or <|.
> [1,2,3,4] & _drop 1 %~ (99<|)
[1,99,2,3,4]
> import Data.Text
> :set -XOverloadedStrings
> ("h there"::Text) & _drop 1 %~ ('i'<|)
"hi there"
I think a simple approach would be to break the problem down into two parts:
A function of type [a] -> SomeAdditionalData -> [a], which is responsible for transforming the list into another list using some specific data. This is where you add or remove elements and get a new list.
Use a lens to extract the list from the nested data structure, pass that list to the function defined above, and set the returned list back into the nested data structure using the lens.
Your last paragraph is an indication of what happens when you try to do too much with a generic abstraction like Lens. These generic abstractions are good for generic purposes; everything else is specific to your problem and should be designed around plain old functions (at least initially; later in your project you may find some general pattern across your code base that can be abstracted using type classes etc.).
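The two-step split described above can be sketched with a plain list function. This is only an illustration: Position and insertNear are hypothetical names, not part of the lens package, and the choice to leave the list unchanged when nothing matches is an assumption.

```haskell
-- Hypothetical helper type: which side of the match the new element goes on.
data Position = Before | After
  deriving (Eq, Show)

-- Insert `new` next to the first element that satisfies the predicate `p`.
-- Assumption: if no element matches, the list is returned unchanged.
insertNear :: Position -> (a -> Bool) -> a -> [a] -> [a]
insertNear pos p new xs =
  case break p xs of
    (_, [])              -> xs
    (before, hit : rest) ->
      case pos of
        Before -> before ++ new : hit : rest
        After  -> before ++ hit : new : rest
```

With lens, such a function could then be applied to a nested list through whatever optic reaches it, for example structure & someListLens %~ insertNear After (== 3) 99, where someListLens stands for your own lens or traversal.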
Some comments on your problem:
Answer the Question:
There may be a way to do what you want to do. The Lens library is amazingly generic. What there is not is a simple or obvious way to make it happen. I think it will involve the partsOf combinator, but I'm not sure.
Comments on Lenses:
The lens library is really cool and can apply to an amazing number of problems. My initial temptation as I was learning the library was to try to fit everything into a Lens access or mutation. What I discovered was that it was better to use the lens library to dig into my complex data structures, but once I had a simple element it was better to use the more traditional functional techniques I already knew rather than stretching the Lens library past its useful limit.
Advice you didn't ask for:
Inserting an element into the middle of a list is a bad idea. Not that it cannot be done, but it can end up being an O(n^2) operation. (See also this StackOverflow answer.) Zip lists or some other functional data structure may be a better idea. As a side benefit, some of these structures could be made an instance of the At class, allowing for insertion and deletion using the partial lens combinators.
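As one possible reading of the "zip lists" suggestion, here is a minimal list zipper sketch. It assumes that a zipper is what was meant; all names (Zipper, fromList, right, insertBefore, insertAfter, zipperToList) are made up for illustration and do not come from any particular library.

```haskell
-- A list zipper: elements before the focus (nearest first), the focused
-- element, and the elements after it.
data Zipper a = Zipper [a] a [a]
  deriving (Eq, Show)

-- Focus on the first element; an empty list has nothing to focus on.
fromList :: [a] -> Maybe (Zipper a)
fromList []       = Nothing
fromList (x : xs) = Just (Zipper [] x xs)

-- Rebuild the ordinary list.
zipperToList :: Zipper a -> [a]
zipperToList (Zipper ls x rs) = reverse ls ++ x : rs

-- Move the focus one step to the right, if there is an element there.
right :: Zipper a -> Maybe (Zipper a)
right (Zipper ls x (r : rs)) = Just (Zipper (x : ls) r rs)
right _                      = Nothing

-- Constant-time inserts next to the focus; the focus itself does not move.
insertBefore, insertAfter :: a -> Zipper a -> Zipper a
insertBefore new (Zipper ls x rs) = Zipper (new : ls) x rs
insertAfter  new (Zipper ls x rs) = Zipper ls x (new : rs)
```

Walking to the match is still linear, but once the focus is there, each insertion next to it is constant time.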
I was reading on this Haskell page about adding an element to the end of a List.
Using the example, I tried it out for myself. Given the following List, I wanted to add the number 56 at the end of it.
Example:
let numbers = [4,8,15,16,23,42]
numbers ++ [56]
I was thrown off by this comment:
Adding an item to the end of a list is a fine exercise, but usually
you shouldn't do it in real Haskell programs. It's expensive, and
indicates you are building your list in the wrong order. There is
usually a better approach.
Researching, I realize that what I'm actually doing is creating a List with 56 as the only element and I'm combining it with the numbers list. Is that correct?
Is using ++ the correct way to add an element to the end of a List?
++ [x] is the correct way to add an element to the end of a list, but what the comment is saying is that you shouldn't add elements to the end of a list.
Due to the way lists are defined, adding an element at the end always requires making a copy of the list. That is,
xs ++ ys
needs to copy all of xs but can reuse ys unchanged.
If xs is just one element (i.e. we're adding to the beginning of a list), that's no problem: Copying one element takes practically no time at all.
But if xs is longer, we need to spend more time in ++.
And if we're doing this repeatedly (i.e. we're building up a big list by continually adding elements to the end), then we need to spend a lot of time making redundant copies. (Building an n-element list in this way is an O(n^2) operation.)
If you need to do this, there is usually a better way to structure your algorithm. For example, you can build your list in reverse order (adding elements at the beginning) and only call reverse at the end.
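A small sketch of the two strategies, for comparison. The function names are hypothetical and the input list just stands in for whatever produces the elements.

```haskell
import Data.List (foldl')

-- Adds every element at the end. The appends nest to the left, so fully
-- consuming the result costs O(n^2) for n elements.
buildByAppending :: [a] -> [a]
buildByAppending = foldl' (\acc x -> acc ++ [x]) []

-- Adds every element at the front, then reverses once: O(n) overall.
buildByPrepending :: [a] -> [a]
buildByPrepending = reverse . foldl' (\acc x -> x : acc) []
```

Both functions return the elements in their original order; only the amount of copying differs.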
It's the correct way in that all ways of doing it must reduce to at least that much work. The problem is wanting to append to the end of a list at all. That's not an operation that's possible to do efficiently with immutable linked lists.
The better approach is figuring out how to solve your specific problem without doing that. There are a lot of potential approaches. Picking the right one depends on the details of what you're doing. Maybe you can get away with just using laziness correctly. Maybe you are best off generating the list backwards and then reversing it once at the end. Maybe you're best off using a different data structure. It all depends on your specific use case.
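As one example of the "different data structure" option, Data.Sequence from the containers package supports appending on the right in constant time. This is a sketch of that one option only; collect and collectToList are hypothetical names, and whether Seq fits depends on the use case.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

-- Collect items in arrival order by appending each one on the right.
collect :: [a] -> Seq a
collect = foldl (|>) Seq.empty

-- Convert back to an ordinary list once, at the end.
collectToList :: [a] -> [a]
collectToList = toList . collect
```

For the example in the question, toList (Seq.fromList [4,8,15,16,23,42] |> 56) appends 56 without copying the rest of the sequence.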
In System.Random.Shuffle,
shuffle' :: RandomGen gen => [a] -> Int -> gen -> [a]
The hackage page mentions this Int argument as
..., its length,...
However, it seems that a simple wrapper function like
shuffle'' x = shuffle' x (length x)
should've sufficed.
shuffle operates by building a tree form of its input list, including the tree size. The buildTree function performs this task using Data.Function.fix in a manner I haven't quite wrapped my head around. Somehow (I think due to the recursion of inner, not the fix magic), it produces a balanced tree, which then has logarithmic lookup. Then it consumes this tree, rebuilding it for every extracted item. The advantage of the data structure would be that it only holds remaining items in an immutable form; lazy updates work for it.
But the size of the tree is required data during the indexing, so there's no need to pass it separately to generate the indices used to build the permutation. System.Random.Shuffle.shuffle indeed has no random element - it is only a permutation function. shuffle' exists to feed it a random sequence, using its internal helper rseq. So the reason shuffle' takes a length argument appears to be because they didn't want it to touch the list argument at all; it's only passed into shuffle.
The task doesn't seem terribly suitable for singly linked lists in the first place. I'd probably consider using VectorShuffling instead. And I'm baffled as to why rseq isn't among the exported functions, being the one that uses a random number generator to build a permutation... which in turn might have been better handled using Data.Permute. Probably the reasons have to do with history, such as Data.Permute being written later and System.Random.Shuffle being based on a paper on immutable random access queues.
Data.Random.Extras seems to have a more straightforward Seq-based shuffle function.
It might be the case that the length of the given list is already known and doesn't need to be calculated again, so the argument can be seen as an optimisation.
Besides, in general, the resulting list doesn't need to have the same size as the original one, so this argument could be used to set its length.
This is true for the original idea of Oleg (source - http://okmij.org/ftp/Haskell/perfect-shuffle.txt):
-- examples
t1 = shuffle1 ['a','b','c','d','e'] [0,0,0,0]
-- "abcde"
-- Note, that rseq of all zeros leaves the sequence unperturbed.
t2 = shuffle1 ['a','b','c','d','e'] [4,3,2,1]
-- "edcba"
-- The rseq of (n-i | i<-[1..n-1]) reverses the original sequence of elements
However, it's not the same for the 'random-shuffle' package implementation:
> shuffle [0..10] [0,0,0,0]
[0,1,2,3random-shuffle.hs: [shuffle] called with lists of different lengths
I think it's worth following up with the package maintainers in order to understand the contract of this function.
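To make the contract in Oleg's examples above concrete, here is a naive list-based sketch. It is neither Oleg's tree-based algorithm nor the random-shuffle implementation; extract and shuffleNaive are hypothetical names, and the up-front length check is an assumption about how a mismatch should be handled.

```haskell
-- Remove the element at index i, returning it together with the rest.
extract :: Int -> [a] -> (a, [a])
extract i xs = case splitAt i xs of
  (before, x : after) -> (x, before ++ after)
  _                   -> error "extract: index out of range"

-- Permute xs according to rs: the k-th index picks one of the elements
-- that are still left, and the final remaining element goes last.
-- Assumption: for n elements, rs must contain exactly n - 1 indices.
shuffleNaive :: [a] -> [Int] -> [a]
shuffleNaive [] _ = []
shuffleNaive xs rs
  | length rs /= length xs - 1 = error "shuffleNaive: lists of different lengths"
  | otherwise                  = go xs rs
  where
    go [y] _        = [y]
    go ys  (i : is) = let (y, rest) = extract i ys in y : go rest is
    go _   _        = []
```

Under these assumptions, an index sequence of all zeros leaves the list unchanged and [4,3,2,1] reverses a five-element list, matching t1 and t2. Each step is linear, so the whole function is quadratic, which is what the tree representation avoids.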
Is there an exposed function for removing an item from a list?
I do not see any operation for removing an item from a list.
I'm sure I can implement this functionality on my own. However, I kind of expected this operation to be supported in FSharp.Core.
Am I missing something?
If you mean creating a new list with some items removed based on their value, then you could do this:
[1; 2; 3; 1] |> List.filter ((<>) 1)
// Returns [2; 3]
This uses the <> (not equal) operator in prefix form by wrapping it in parentheses, and then partially applies it by providing only the first argument.
Note that all instances of this value are excluded.
Short answer - the library designers didn't think it warranted inclusion.
Designing a library of any sort, but in particular a core library like collection modules in F#, is always about finding the right balance between complexity and usefulness. You have to carefully consider if your new feature brings enough to the table to offset the cost of having a larger library.
For removing all instances of an item, you can use List.filter with a negated predicate. The designers could have included a List.remove function that does the negation internally. It's not something unthinkable, in fact Lisps tend to have both filter and remove. In Haskell and OCaml, you only have filter though - and F# designers probably followed suit here.
If you want to remove only a single instance of an item, you have to write something yourself. This is a somewhat non-standard use case for a list - lists are "about" accumulating elements in sequence; removing particular elements from the middle of the list (as opposed to removing the head or removing all undesirable elements) is seldom useful. If your focus is on adding or removing elements without a need to preserve order, sets or maps (used as multisets) are a better fit for the job.
I'm not sure why you would expect this. For example the same functionality (AFAIK) is not available for Arrays in C#.
However if you want you can use the generic List:
open System.Collections.Generic
let xs = [1..3]
let xs' = List(xs)
xs'.Remove(2) |> ignore // Remove returns a bool, which is discarded here
xs'
//val it : List<int> = seq [1; 3]
The Generic List has .Remove, .RemoveAt and .RemoveAll methods.
I am writing a program, and an algorithm I am thinking of using requires a cheap way of accessing a list backwards to be efficient. Is there an efficient way to access a list from the last element forward? Or, because I think that might be impossible due to the structure of SML lists, is there an efficient data structure to achieve it?
The length of the data is unknown before execution, and the data only needs to be traversed sequentially.
I think you want a functional deque. See e.g. Okasaki's paper on the subject. Specifically, Figure 5 shows an implementation of deques.
If using a functional deque seems like overkill and you need to traverse the list in reverse order just once, then solutions that e.g. use List.last and List.take to emulate hd and tl but in reverse order are, as you seem to know, bad because they would make the list traversal quadratic. On the other hand, the built in function rev is very efficient since it is both tail-recursive and linear. If you feed a list to a function that needs to traverse that list in reverse order, an easy solution is to use a let binding using rev to create a local copy of the list in reverse order and then traverse the reversed list in the usual way.
I understand that lists are implemented as singly linked so they don't really have a constant structure that you can pin a length on, but each node should know how many nodes till the last element right? There isn't a way to add a node to some existing list and for that node not to be able to determine the length of the list it represents in constant time provided that the existing nodes already have that info.
I can understand why that wouldn't work in Haskell, for example, due to laziness, but as far as I know F# lists aren't lazy. So, is the problem just in the extra memory overhead?
Seems to me like a typical memory vs. time performance consideration.
If the standard F# list had the implementation you suggest, it would need much more space in memory (consider a list of one million bools). And everyone using such a list would have to deal with it. There would be no simple way to opt out of this other than writing a completely new implementation of list.
On the other hand, it seems fairly simple to create a new type, based on the F# list, that stores the length of the succeeding list with each element. You can implement it on your own if you need it. Those who don't need it will use the standard implementation.
I don't often find myself needing to know the length of the list, it's not like you need it to exit a for loop like you would with arrays in imperative languages.
For those rare cases when you really need to know the length asap, you can go with Carsten König's suggestion from a comment and make your 'a list into a ('a * int) list, where each node keeps the length of the tail as a tuple element.
Then you can have something like this:
let push lst e =
    match lst with
    | (_, c)::_ -> (e, c + 1) :: lst
    | [] -> [(e, 0)]
and length and pop functions to match.
For all the other cases I'd call it a premature optimization.