Traversing a list until a certain criterion is met

I would like to create a simple SML program that traverses a list from left to right. Let's say I have a list of N items of K different types. For example, the list 1 3 1 3 1 3 3 2 2 1 has 10 numbers of 3 types (1, 2, 3).
What I would like to do is go through this list from left to right and stop when I have found all K different numbers. In this case I would stop right after stumbling upon the first 2.
This could be done by splitting the list into head and tail in each step and processing the head element. However, how could I keep track of the different numbers I have found?
This could be done in C/C++ by simply holding a counter and a boolean array with K elements. If I stumble upon an element i with bool[i]=false, I make it true and set counter=counter+1.
It is stated, though, that arrays are not the best option for SML, so I was wondering whether I have to use another data structure or create a new function to check each time whether I have seen this element before (this would cost in time complexity).

how could I keep track of the different numbers I have found?
[...] in C/C++ by [...] a boolean array with K elements
Abstractly I would call the data structure you want a bit set.
I'll give you two answers, one using a sparse container and one using a bit set.
Sparse
I'd use a list to keep track of the elements you've already seen:
fun curry f x y = f (x, y)
val empty = []
fun add x set = curry op:: x set
fun elem x set = List.exists (curry op= x) set
fun seen k xs =
    let fun seen_ 0 _ _ = true
          | seen_ _ [] _ = false
          | seen_ k (x::xs) set =
              if elem x set
              then seen_ k xs set
              else seen_ (k-1) xs (add x set)
    in seen_ k xs empty end
You could also use a balanced binary tree as the set type; this would reduce lookup to O(lg n). The advantage of using an actual container (list or tree) rather than a bit array is the same as for sparse arrays/matrices: space is only used for the elements actually seen. This works for ''a lists.
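For comparison with the counter-plus-array idea from the question, here is a minimal sketch of the same traversal in Python rather than SML, using a hash set as the container. The name stop_index is made up for this illustration, and k >= 1 is assumed; it reports where the traversal would stop instead of just true/false.

```python
def stop_index(k, xs):
    """Return the index at which the k-th distinct element first appears, else None."""
    found = set()              # elements seen so far
    for i, x in enumerate(xs):
        found.add(x)           # adding an element that is already present is a no-op
        if len(found) == k:
            return i           # stop here; the rest of the list is never inspected
    return None                # the list ran out before k distinct elements showed up


print(stop_index(3, [1, 3, 1, 3, 1, 3, 3, 2, 2, 1]))  # 7, the position of the first 2
```

Membership tests on a hash set are expected constant time, so this avoids the linear lookup of the list-based set above.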
Bit set
[...] boolean array with K elements [...]
If i stumble upon an element i [...]
Until this point, you haven't said that elements are always unsigned integers from 0 to K-1, which would be a requirement if they should be representable by a unique index in an array of length K.
SML has a module/type called Word / word for unsigned integers (words). Adding this constraint, the input list should have type word list rather than ''a list.
When you make an array of primitive types in many imperative, compiled languages, you get mutable, unboxed arrays. SML's Array type is also mutable, but each bool in such an array would be boxed.
An easy way to get an immutable, unboxed array of bits would be to use bitwise operations on an IntInf (SML/NJ; implementations vary); it would automatically grow as a bit is flipped. This could look like:
fun bit x = IntInf.<< (1, x)
val empty = IntInf.fromInt 0
fun add x set = IntInf.orb (set, bit x)
fun elem x set = IntInf.> (IntInf.andb (set, bit x), 0)
The function seen would be the same.
The fact that k is decreased recursively and that set grows dynamically means that you're not restricted to elements in the range [0,K-1], which would have been the case with an array of size K.
Example use:
- seen 5 [0w4, 0w2, 0w1, 0w9];
val it = false : bool
- seen 5 [0w1, 0w2, 0w3, 0w4, 0w8];
val it = true : bool
This solution uses a lot of memory if the elements are large:
- seen 1 [0w100000000];
*eats my memory slowly*
val it = true : bool
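The same bit-set idea can be sketched in Python, whose built-in integers are arbitrary precision much like IntInf. The names below mirror the SML ones but are otherwise assumptions of this sketch; elements are assumed to be non-negative integers.

```python
EMPTY = 0                      # no bits set


def bit(x):
    return 1 << x              # integer with only bit number x set


def add(x, s):
    return s | bit(x)          # returns a new set; s itself is unchanged


def elem(x, s):
    return (s & bit(x)) != 0


s = add(9, add(2, EMPTY))
print(elem(2, s), elem(3, s))  # True False
```

As in the SML version, the integer grows with the largest element stored, so a single very large element still costs memory proportional to its value.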
Additional things you could do:
Create a module, structure BitSet = struct ... end that encapsulates an abstract type with the operations empty, add and elem, hiding the particular implementation (whether it's an IntInf.int, or a bool Array.array or an ''a list).
Create a function, fun fold_until f e xs = ... that extracts the recursion scheme of seen_ so that you avoid manual recursion; a regular foldl is not enough since it continues until the list is empty. You could build this using an error-aware return type or using exceptions.
Consider Bloom filters.
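The fold_until suggestion from the list above could look roughly like this in Python. The convention that the step function returns a (done, new_accumulator) pair is an assumption of this sketch, standing in for the error-aware return type mentioned there.

```python
from itertools import count


def fold_until(f, acc, xs):
    """Left fold that stops as soon as f reports it is done.

    f(acc, x) must return a pair (done, new_acc).
    """
    for x in xs:
        done, acc = f(acc, x)
        if done:
            break
    return acc


def seen(k, xs):
    """True if xs contains at least k distinct elements."""
    def step(acc, x):
        remaining, found = acc
        if x not in found:
            found = found | {x}       # new frozenset, the old one is untouched
            remaining -= 1
        return remaining == 0, (remaining, found)

    if k == 0:
        return True
    remaining, _ = fold_until(step, (k, frozenset()), xs)
    return remaining == 0


print(seen(3, count()))  # True; terminates even though count() is infinite
```

The last line only terminates because the fold stops early, which is exactly what a plain left fold cannot do.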

Related

Finding two max in list

How do I find two max value in a list and sum up, not using rec, only can use List.fold_left or right and List.map?
I used filter, but it's not allowed, anyways I can replace the filter?
let max a b =
if b = 0 then a
else if a > b then a
else b;;
let maxl2 lst =
match lst with
| [] -> 0
| h::t ->
let acc = h in
List.fold_left max acc lst +
List.fold_left
max acc
(List.filter (fun x -> (x mod List.fold_left max acc lst) != 0) lst);;
List.fold_left is very powerful and can be used to implement List.filter, List.map, List.rev and so on. So it's not much of a restriction. I would assume the purpose of the exercise is for you to learn about the folds and what they can do.
If your solution with List.filter actually works, you should be able to replace List.filter by one you wrote yourself using List.fold_left. The basic idea of a fold is that it builds up a result (of any type you choose) by looking at one element of the list at a time. For filter, you would add the current element to the result if it passes the test.
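To illustrate the idea of building filter out of a left fold, here is a small sketch in Python using functools.reduce, which plays the role of List.fold_left. The name filter_with_fold is made up for this example.

```python
from functools import reduce


def filter_with_fold(keep, xs):
    # The accumulator is the result list built so far; an element is
    # appended only when it passes the test.
    return reduce(lambda acc, x: acc + [x] if keep(x) else acc, xs, [])


print(filter_with_fold(lambda x: x % 2 == 0, [1, 2, 3, 4]))  # [2, 4]
```

Appending with acc + [x] copies the accumulator each time, which is fine for a demonstration; in OCaml one would more likely cons onto the accumulator and reverse at the end.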
However I have to wonder whether your solution will work even with List.filter. I don't see why you're using mod. It doesn't make a lot of sense. You seem to need an equality test (= in OCaml). You can't use mod as an equality test. For example 28 mod 7 = 0 but 28 <> 7.
Also your idea of filtering out the largest value doesn't seem like it would work if the two largest values were equal.
My advice is to use List.fold_left to maintain the two largest values you've seen so far. Then add them up at the end.
To build on what Jeffrey has said, List.fold_left looks at one element in a list at a time and an accumulator. Let's consider a list [1; 3; 7; 0; 6; 2]. An accumulator that makes sense is a tuple with the first element being the largest and the second element representing the second largest. We can initially populate these with the first two elements.
The first two elements of this list are [1; 3]. Finding the max of that we can turn this into the tuple (3, 1). The remainder of the list is [7; 0; 6; 2].
First we consider 7. It's bigger than 3, so we change the accumulator to (7, 3). Next we consider 0. This is smaller than both elements of the accumulator, so we make no changes. Next: 6. This is bigger than 3 but smaller than 7, so we updated the accumulator to (7, 6). Next: 2 which is smaller than both, so no change. The resulting accumulator is (7, 6).
Actually writing the code for this is your job.
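For readers who only want to check the trace above, here is a sketch of the same accumulator in Python rather than OCaml, so the exercise itself stays open. The names step and two_largest are invented here, and the list is assumed to have at least two elements.

```python
from functools import reduce


def step(acc, x):
    largest, second = acc
    if x > largest:
        return (x, largest)        # x becomes the new maximum
    if x > second:
        return (largest, x)        # x only displaces the runner-up
    return acc                     # x is smaller than both, no change


def two_largest(xs):
    first, second, *rest = xs      # assumes len(xs) >= 2
    init = (max(first, second), min(first, second))
    return reduce(step, rest, init)


print(two_largest([1, 3, 7, 0, 6, 2]))  # (7, 6)
```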
Often, functions called by fold use an accumulator that is simple enough to be stored as an anonymous tuple. But this can become hard to understand when you are dealing with complex behaviors: you have to consider different corner cases. What is the initial accumulator value? What is the regular behavior of the function, i.e. once the accumulator has encountered enough values? What happens before that?
For example, here you have to keep track of two maximal values (the general case), but your code has a build-up phase where only one element of the list has been visited, and it starts with no known max value at all. These intermediate states are IMO the hardest part of using fold (the more pleasant cases are when the accumulator and the list elements have the same type).
I'd recommend making it very clear what type the accumulator is, and write as many helper functions as possible to clear things up.
To that effect, let's define the accumulator type as follows, with all different cases treated explicitly:
type max_of_acc =
| SortedPair of int * int (* invariant: fst <= snd *)
| Single of int
| Empty
Note that this isn't the only way to do it: you could keep a list of maximum values, initially empty, always sorted, and of size at most N, for some N (and then you would solve a more general case, i.e. a list of the N highest values). But as an exercise, it helps to cover the different cases as above.
For example, at some point you will need to compute the sum of the max values.
let sum_max_of m = match m with
| Empty -> 0
| Single v -> v
| SortedPair (u,v) -> u+v;;
I would also define the following helper function:
let sorted_pair u v = if u <= v then SortedPair (u,v) else SortedPair (v, u)
Finally, your function would look like this:
let fold_max_of acc w = match acc with
| Empty -> ...
| Single v -> ...
| SortedPair (u, v) -> ...
And could be used in the following way:
# List.fold_left fold_max_of Empty [1;2;3;5;4];;
- : max_of_acc = SortedPair (4, 5)

Filtering lists which have the same number of different elements in them in Haskell

I am pretty new to Haskell and I have the data data Instruction = Add | Sub | Mul | Div | Dup | Pop deriving (Eq,Ord,Show,Generic) and I am generating lists with all possible combinations of Mul and Dup with mapM (const [Mul, Dup]) [1..n] of size n.
I wanted only the lists starting with Dup and ending with Mul so I used filter((== Mul) . last)(filter((== Dup) . head) (mapM (const [Mul, Dup]) [1..n])) but I also want only the lists with the same number of Mul and Dup in them but I can't seem to come up with a way of doing this. How do I filter this and is there a more efficient way of doing this as there may be a huge amount of combinations as lists get bigger?
A sample list would look like this: [Dup,Mul,Dup,Mul] and [Dup,Dup,Mul,Mul] for lists of size 4.
While your approach is correct, I think it's not the most efficient one. You generate 2^N lists and then filter out many of them. Forgetting the other requirements to keep the counting simple, by requiring that we have as many Muls as Dups, we end up with only choose(N, N/2) lists (the number of subsets of size N/2 of 1..N), which is a much smaller figure.
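To get a feel for the gap between the two counts, the following Python snippet compares them for N = 24 (an arbitrary even length chosen for this illustration).

```python
from math import comb

n = 24
all_lists = 2 ** n             # every Mul/Dup list of length n
balanced = comb(n, n // 2)     # lists with as many Muls as Dups

print(all_lists, balanced)     # 16777216 2704156
```

So at this length, generate-then-filter builds roughly six times as many lists as are kept, before the head and last conditions are even applied.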
We can instead try to avoid the filtering and generate the wanted lists, only, in the first place. I suggest the following approach, which you can modify as needed to satisfy the other requirements.
We define a function sameMulDup which takes two integers m and d and generates all the lists with m Muls and d Dups.
sameMulDup :: Int -> Int -> [[Instruction]]
sameMulDup 0 d = [replicate d Dup]
sameMulDup m 0 = [replicate m Mul]
sameMulDup m d = do
   -- generate the first element
   x <- [Dup, Mul]
   -- compute how many m and d we have left
   let (m', d') = case x of
          Dup -> (m  , d-1)
          Mul -> (m-1, d  )
   -- generate the other elements
   xs <- sameMulDup m' d'
   return (x:xs)
Intuitively, if d=0 or m=0 there is only one possible list to include in our list-of-lists result. Otherwise, we nondeterministically pick the first element, decrement the corresponding counter d or m, and generate the rest.
Alternatively, the last equation can be replaced by the following more basic one:
sameMulDup m d =
   map (Dup:) (sameMulDup m (d-1))
   ++
   map (Mul:) (sameMulDup (m-1) d)
Anyway, given sameMulDup, you should be able to solve your full task.
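As a rough cross-check of the recursion, here is the same scheme as a Python generator, with strings standing in for the Instruction constructors. The helper dup_first_mul_last is one possible way to apply it to the full task and is an assumption of this sketch, not part of the answer above.

```python
def same_mul_dup(m, d):
    """Yield every list containing exactly m "Mul" and d "Dup" entries."""
    if m == 0:
        yield ["Dup"] * d
    elif d == 0:
        yield ["Mul"] * m
    else:
        for rest in same_mul_dup(m, d - 1):
            yield ["Dup"] + rest
        for rest in same_mul_dup(m - 1, d):
            yield ["Mul"] + rest


def dup_first_mul_last(n):
    """Balanced lists of even length n >= 2 that start with Dup and end with Mul."""
    half = n // 2
    # Fix the first and last element, then balance what is left in the middle.
    return [["Dup"] + mid + ["Mul"] for mid in same_mul_dup(half - 1, half - 1)]


print(dup_first_mul_last(4))
```

For n = 4 this produces the two sample lists from the question, without ever generating a list that has to be thrown away.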
It should be possible to define a function countPred :: Eq a => a -> [a] -> Int, which counts the number of items in the list which are equal to the first argument; you can then do filter (\l -> countPred Mul l == countPred Dup l) (or alternatively filter ((==) <$> countPred Mul <*> countPred Dup) if you prefer point-free form). Another approach I suppose might be to do (==0) . sum . map (\case { Mul -> 1; Dup -> (-1) }) (this needs the LambdaCase extension), but that strikes me as being slightly more complex than necessary.
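The counting filter can be sketched in Python as follows, with itertools.product standing in for the mapM call from the question. The name balanced_by_filter is invented for this example.

```python
from itertools import product


def balanced_by_filter(n):
    candidates = product(["Mul", "Dup"], repeat=n)   # all 2**n lists
    # Keep only the candidates with as many Muls as Dups.
    return [list(c) for c in candidates if c.count("Mul") == c.count("Dup")]


print(len(balanced_by_filter(4)))  # 6
```

This is the simplest version to write, but it still enumerates every candidate before discarding most of them.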
I like chi's answer, but in a comment, I mentioned that it doesn't achieve as much sharing as it could. I speculated that the sharing would be beneficial if you iterate over the list of instructions multiple times, but worse if you iterate just once. Empirically, the sharing version appears to be faster no matter how many times you iterate, but the memory tradeoff is as predicted: worse for one iteration, better for multiple. So I thought it might be interesting to show it.
Here's how it looks. We're going to make an infinite list of answers. The first index will be how long the list of instructions will be; the second is how many Muls there are (though I'll use True and False instead of Mul and Dup). So:
bits :: [[[[Bool]]]]
bits = iterate extend [[[]]] where
    extend bsss = zipWith (++)
        (map (map (False:)) bsss ++ [[]])
        ([[]] ++ map (map (True:)) bsss)
For completeness, here's how you write a function with the same signature as chi's sameMulDup, and computing the same answer (up to the swap to Bool):
sameMulDup' :: Int -> Int -> [[Bool]]
sameMulDup' m d = bits !! (m+d) !! m
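The indexing scheme (first by length, then by number of Trues) can be illustrated with a strict Python version. This sketch only shows how each row is built from the previous one; it does not reproduce the laziness or the sharing of list tails that the Haskell version relies on, and all names are assumptions.

```python
def extend(row):
    """From all sequences of length n, grouped by number of Trues, build length n + 1."""
    with_false = [[(False,) + bs for bs in group] for group in row] + [[]]
    with_true = [[]] + [[(True,) + bs for bs in group] for group in row]
    # Group k of the new row: False-prefixed old group k plus True-prefixed old group k - 1.
    return [f + t for f, t in zip(with_false, with_true)]


def bits(n):
    """bits(n)[k] lists every length-n tuple of bools with exactly k Trues."""
    row = [[()]]               # length 0: one empty sequence with zero Trues
    for _ in range(n):
        row = extend(row)
    return row


def same_mul_dup_table(m, d):
    # Hypothetical counterpart of sameMulDup'; True plays the role of Mul.
    return bits(m + d)[m]


print(len(same_mul_dup_table(2, 2)))  # 6
```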
Some timings on my machine, for m=d=12, when compiled -O2:
version      iterations  time   memory
sameMulDup   one         1.35s  6480Kb
sameMulDup'  one         1.11s  226476Kb
sameMulDup   two         4.26s  2135368Kb
sameMulDup'  two         1.97s  620880Kb
Here is the driver code I used for acquiring these numbers:
import System.Environment (getArgs)
main :: IO ()
main = do
    [sharing, twice, m, d] <- getArgs
    let answer = (if read sharing then sameMulDup' else sameMulDup) (read m) (read d)
    if read twice
        then do
            print . sum . map (sum . map fromEnum) $ answer
            print . sum . map (sum . map (fromEnum . not)) $ answer
        else print . sum . map (sum . map fromEnum) $ answer
There are some subtle points here:
To iterate over the list twice, we must have a way of referring to the same list in both iterations. This is answer in the above code.
We must use an iteration that actually forces all the values for it to be useful. I do this by counting up how many Trues there are, but there are other ways. (Just printing the whole list doesn't work well: the calculation's runtime is then dwarfed by the production of the String to print and the work done in transferring it to the terminal.)
Although the first iteration uses the same code in both branches of the if, it is important that this code not be shared and moved out of the if. We want the compiler to know in the else branch that answer will not be used again, so that it may garbage collect. If you write print answer >> if twice then print answer else pure (), it is not as obvious statically when the prefix of answer may be garbage collected.
In the then branch, I used two different calculations in the two loops, so that the compiler did not attempt to get clever and do the calculation just once and then print the calculated result twice.

Function which outputs a list of factors

For an assignment I need to create a function which takes a list of Ints and outputs all of a number's factors in a new list. Thing is, I have absolutely no idea how to do this. I know its signature needs to be like this though :
factors :: [Int] -> [[Int]]
factors xs = ???
So when you take a list like this : [2,5,7,8]
It outputs [[],[],[],[2,4]]
I have tried things with map, filter, mod, list comprehension or higher order functions, but since this is the first language I am learning, it's very hard for me to come up with any sort of solution.
So the first thing to do if we get stuck is to skip the programming part of the problem and start by solving the actual problem. We want to take 1 number, get the factors of that number, wrap the factors inside a list, and keep going until there are no more numbers to factor.
So how do we get the factors of a number? A number x is a factor of y if we can write y as a product of x and some other integer z. Therefore, 2 is a factor of 8 because 8 can be written as 2*4.
Using this information we also know that 8 must be divisible by 2 without remainder, which it is. Great! So now we know that for any two integers x and y, if y is divisible by x without remainder, x is a factor of y.
Let's go to Haskell and try an approach based on this: "x is a factor of y if y divided by x leaves no remainder"
factors :: Int -> [Int]
factors y = [ x | x <- [1..y], y `mod` x == 0]
So, using a list comprehension we collect every x from [1..y] into a list, but if and only if y `mod` that specific x equals 0.
If we have a function to create a list with all the factors of one number, we can just map that function over a list of numbers, which wraps the resulting lists in a new list, and return that list:
listFactors :: [Int] -> [[Int]]
listFactors xs = map factors xs
If we do not want to show the multiplicative identity 1 or the number itself, we can just change the interval to [2..y-1].
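With the narrower interval, the approach reproduces the output the question asked for. Here is a sketch of that variant in Python, where range(2, y) corresponds to [2..y-1]; the function names follow the Haskell ones.

```python
def factors(y):
    # Every x in 2 .. y-1 that divides y without remainder.
    return [x for x in range(2, y) if y % x == 0]


def list_factors(ys):
    return [factors(y) for y in ys]


print(list_factors([2, 5, 7, 8]))  # [[], [], [], [2, 4]]
```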

Java list: get amount of Pairs with pairwise different Keys using lambda expressions

I have a list of key-value-pairs and I want to filter a list where every key parameter only occurs once.
So that a list of e.g. {Pair(1,2), Pair(1,4), Pair(2,2)} becomes {Pair(1,2), Pair(2,2)}.
It doesn't matter which Pair gets filtered out as I only need the size
(maybe there's a different way to get the amount of pairs with pairwise different key values?).
This all is again happening in another stream of an array of lists (of key-value-pairs) and they're all added up.
I basically want the amount of collisions in a hashmap.
I hope you understand what I mean; if not please ask.
public int collisions() {
return Stream.of(t)
.filter(l -> l.size() > 1)
.filter(/*Convert l to list of Pairs with pairwise different Keys*/)
.mapToInt(l -> l.size() - 1)
.sum();
}
EDIT:
public int collisions() {
return Stream.of(t)
.forEach(currentList = stream().distinct().collect(Collectors.toList())) //Compiler Error, how do I do this?
.filter(l -> l.size() > 1)
.mapToInt(l -> l.size() - 1)
.sum();
}
I overrode the equals method of Pair to return true if the keys are identical, so now I can use distinct to remove "duplicates" (Pairs with equal keys).
Is it possible, in forEach, to replace the current element with the same List "distincted"? If so, how?
Regards,
Claas M
I'm not sure whether you want the sum of the number of collisions per list or the number of collisions if all lists were merged into a single one first. I assumed the former, but if it's the latter the idea does not change much.
This is how you could do it with Streams:
int collisions = Stream.of(lists)
.flatMap(List::stream)
.mapToInt(l -> l.size() - (int) l.stream().map(p -> p.k).distinct().count())
.sum();
Stream.of(lists) will give you a Stream<List<List<Pair<Integer, Integer>>>> with a single element.
Then you flatMap it, so that you have a Stream<List<Pair<Integer, Integer>>>.
From there, you mapToInt each list by subtracting from its original size the number of distinct keys it contains (l.stream().map(p -> p.k).distinct().count()).
Finally, you call sum to have the overall amount of collisions.
Note that you could use mapToLong to get rid of the cast but then collisions has to be a long (which is maybe more correct if each list has a lot of "collisions").
For example given the input:
List<Pair<Integer, Integer>> l1 = Arrays.asList(new Pair<>(1,2), new Pair<>(1,4), new Pair<>(2,2));
List<Pair<Integer, Integer>> l2 = Arrays.asList(new Pair<>(2,2), new Pair<>(1,4), new Pair<>(2,2));
List<Pair<Integer, Integer>> l3 = Arrays.asList(new Pair<>(3,2), new Pair<>(3,4), new Pair<>(3,2));
List<List<Pair<Integer, Integer>>> lists = Arrays.asList(l1, l2, l3);
It will output 4 as you have 1 collision in the first list, 1 in the second and 2 in the third.
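The arithmetic behind that result can be checked with a short Python sketch, where plain tuples stand in for Pair and the function name collisions is borrowed from the question.

```python
def collisions(lists):
    total = 0
    for pairs in lists:
        distinct_keys = {key for key, _value in pairs}
        total += len(pairs) - len(distinct_keys)   # surplus pairs per list
    return total


l1 = [(1, 2), (1, 4), (2, 2)]
l2 = [(2, 2), (1, 4), (2, 2)]
l3 = [(3, 2), (3, 4), (3, 2)]

print(collisions([l1, l2, l3]))  # 4
```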
Don't use a stream. Dump the list into a SortedSet with a custom comparator and diff the sizes:
List<Pair<K, V>> list; // given this
Set<Pair<K, V>> set = new TreeSet<>((a, b) -> a.getKey().compareTo(b.getKey())); // pairs with equal keys count as duplicates
set.addAll(list);
int collisions = list.size() - set.size();
If the key type isn't comparable, alter the comparator lambda accordingly.

Append integer to global list inside function haskell

I'll use a simple example for what I'm trying to do.
Say I have the list:
nums = []
Now I have the function:
allNums n = nums.append(n)
So if I run the function:
allNums 6
The list nums should have the values
[6]
I know nums.append doesn't work, but what code could replace that.
Simple Answer:
You can't do that. Haskell is a pure, functional language, that means:
A function does not have any side effect.
A function does always return the same result when called with the same parameters.
A function may or may not be called, but you don't have to care about that. If it wasn't called, it wasn't needed, but because the function does not have any side effects, you won't find out.
Complex answer:
You could use the State monad to implement something that behaves a bit like this, but this is probably out of reach for you for now.
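A common way to get the effect without any mutation is to pass the list in and get a new list back. The sketch below shows that style in Python; the two-argument all_nums is a hypothetical reshaping of the function from the question.

```python
def all_nums(nums, n):
    """Return a new list with n appended; the argument is left untouched."""
    return nums + [n]


nums = []
nums2 = all_nums(nums, 6)

print(nums, nums2)  # [] [6]
```

The caller decides what to do with the new value, which is roughly the plumbing that the State monad automates.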
I suggest using an infinite list instead of appending to a global variable.
It's true that Haskell is purely functional. But it is also lazy: no piece of data is computed until it is really needed. This also applies to collections, so you can even define a collection whose elements are based on previous elements of the same collection.
Consider the following code:
isPrime n = all (\p -> (n `mod` p) /= 0 ) $ takeWhile (\p ->p * p <= n) primes
primes = 2 : ( filter isPrime $ iterate (+1) 3 )
main = putStrLn $ show $ take 100 primes
The definition of isPrime is trivial once the primes list is defined. It takes the primes that are less than or equal to the square root of the number being examined:
takeWhile (\p ->p * p <= n) primes
Then it checks that the number has only non-zero remainders when divided by each of these primes:
all (\p -> (n `mod` p) /= 0 )
The $ here is the function application operator.
Next, using this definition, we take all numbers starting from 3:
iterate (+1) 3
And filter the primes from them:
filter isPrime
Then we just prepend the first prime to it:
primes = 2 : ( ... )
So primes becomes an infinite, self-referential list.
You may ask: why do we prepend 2 rather than just start filtering numbers from it, like:
primes = filter isPrime $ iterate (+1) 2
You can check that this leads to an expression that never terminates, because the isPrime function needs at least one known member of primes to apply takeWhile to.
As you can see, primes is well defined and immutable, yet it can have as many elements as your logic needs.
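A loosely similar on-demand sequence can be sketched in Python with a generator. Unlike the Haskell definition, this version keeps a local mutable list of the primes found so far, so it only illustrates the trial-division logic and the laziness, not the purity.

```python
from itertools import count, islice, takewhile


def primes():
    found = []                          # primes produced so far (local state)
    for n in count(2):
        # Only primes p with p * p <= n need to be tried as divisors.
        candidates = takewhile(lambda p: p * p <= n, found)
        if all(n % p != 0 for p in candidates):
            found.append(n)
            yield n


print(list(islice(primes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Because the divisor list starts empty, 2 is accepted without any special case here, which is the role the explicit 2 : plays in the Haskell definition.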