I'm a Haskell beginner trying to learn more about the language by solving some online quizzes/problem sets.
The problem/question is quite lengthy but a part of it requires code that can find the number which divides a given list into two (nearly) equal (by sum) sub-lists.
Given [1..10]
Answer should be 7 since 1+2+..7 = 28 & 8+9+10 = 27
This is the way I implemented it
-- partitions list by y
partishner :: (Floating a) => Int -> [a] -> [[[a]]]
partishner 0 xs = [[xs],[]]
partishner y xs = [take y xs : [drop y xs]] ++ partishner (y - 1) xs
-- finds the equal sum
findTheEquilizer :: (Ord a, Floating a) => [a] -> [[a]]
findTheEquilizer xs = fst $ minimumBy (comparing snd) zipParty
  where party = (tail . init) (partishner (length xs) xs) -- removes [xs,[]] types
        afterParty = (map (\[x, y] -> (x - y) ** 2) . init . map (map sum)) party
        zipParty = zip party afterParty -- zips partitions and squared diff betn their sums
Given (last . head) (findTheEquilizer [1..10])
output : 7
For numbers near 50k it works fine
λ> (last . head) (findTheEquilizer [1..10000])
7071.0
The trouble starts when I put in lists with any more than 70k elements in it. It takes forever to compute.
So what do I have to change in the code to make it run better, or do I have to change my whole approach? I'm guessing it's the latter, but I'm not sure how to go about doing that.
It looks to me that the implementation is quite chaotic. For example, partishner seems to construct a list of lists of lists of a where, if I understood it correctly, the outer list contains lists of two elements each: the list of elements on "the left" and the list of elements on "the right". As a result, constructing these lists takes O(n^2) time.
Using lists instead of 2-tuples is also quite "unsafe", since a list can (although that is probably impossible here) contain no elements, one element, or more than two elements. If you make a mistake in one of the functions, that mistake will be hard to track down.
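To illustrate the point about 2-tuples, here is a minimal sketch that produces every split as a typed pair, so each entry has exactly one left part and one right part by construction. The name allSplits is made up for this example and does not come from the question. Summing both halves of every pair would still be quadratic; this only addresses the type-safety concern.

```haskell
import Data.List (inits, tails)

-- | Every way to cut a list in two, as (left, right) pairs.
-- 'allSplits' is a hypothetical name used only for this illustration.
allSplits :: [a] -> [([a], [a])]
allSplits xs = zip (inits xs) (tails xs)
```

For example, allSplits [1,2,3] evaluates to [([],[1,2,3]),([1],[2,3]),([1,2],[3]),([1,2,3],[])].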
It looks to me that it might be easier to implement a "sweep algorithm": we first calculate the sum of all the elements in the list. This is the value on the "right" if we split before the first element. Next we move from left to right, each time subtracting the element from the sum on the right and adding it to the sum on the left. At each step we can evaluate the difference between the two sums, like:
import Data.List(unfoldr)
sweep :: Num a => [a] -> [(Int, a, [a])]
sweep lst = x0 : unfoldr f x0
  where x0 = (0, sum lst, lst)
        f (_, _, []) = Nothing
        f (i, r, (x: xs)) = Just (l, l)
          where l = (i+1, r-2*x, xs)
For example:
Prelude Data.List> sweep [1,4,2,5]
[(0,12,[1,4,2,5]),(1,10,[4,2,5]),(2,2,[2,5]),(3,-2,[5]),(4,-12,[])]
So if we select to split at the first split point (before the first element), the sum on the right is 12 higher than the sum on the left, if we split after the first element, the sum on the right (11) is 10 higher than the sum on the left (1).
We can then obtain the minimum of these splits with minimumBy :: (a -> a -> Ordering) -> [a] -> a:
import Data.List(minimumBy)
import Data.Ord(comparing)
findTheEquilizer :: (Ord a, Num a) => [a] -> ([a], [a])
findTheEquilizer lst = (take idx lst, tl)
  where (idx, _, tl) = minimumBy (comparing (abs . \(_, x, _) -> x)) (sweep lst)
We then obtain the correct value for [1..10]:
Prelude Data.List Data.Ord Data.List> findTheEquilizer [1..10]
([1,2,3,4,5,6,7],[8,9,10])
or for 70'000:
Prelude Data.List Data.Ord Data.List> head (snd (findTheEquilizer [1..70000]))
49498
The above is not ideal, it can be implemented more elegantly, but I leave this as an exercise.
First, let's analyse why it runs forever (actually not forever, just slowly). Take a look at the partishner function:
partishner y xs = [take y xs : [drop y xs]] ++ partishner (y - 1) xs
Here take y xs and drop y xs each run in linear time, i.e. O(N), so
[take y xs : [drop y xs]]
is O(N) too.
However, this is repeated recursively for every split position of the given list. If the list has length M, the successive calls cost O(1), O(2), ..., O(M) steps, so the whole computation needs:
O(1+2+...+M) = O(M(1+M)/2) ~ O(M^2)
With 70k elements that is on the order of 70k^2 steps, which is why it appears to hang.
Instead of using the partishner function, you can compute the running sums of the list in linear time:
sumList::(Floating a)=>[a]->[a]
sumList xs = sum 0 xs
  where sum _ [] = []
        sum s (y:ys) = let s' = s + y in s' : sum s' ys
findEquilizer then just sums the given list from left to right (leftSum) and from right to left (rightSum) and picks the result just as your original program does, but the whole process takes only linear time.
findEquilizer::(Ord a, Floating a) => [a] -> a
findEquilizer [] = 0
findEquilizer xs =
  let leftSum = reverse $ 0:(sumList $ init xs)
      rightSum = sumList $ reverse $ xs
      afterParty = zipWith (\x y->(x-y) ** 2) leftSum rightSum
  in fst $ minimumBy (comparing snd) (zip (reverse $ init xs) afterParty)
I assume that none of the list elements are negative, and use a "tortoise and hare" approach. The hare steps through the list, adding up elements. The tortoise does the same thing, but it keeps its sum doubled and it carefully ensures that it only takes a step when that step won't put it ahead of the hare.
approxEqualSums
  :: (Num a, Ord a)
  => [a] -> (Maybe a, [a])
approxEqualSums as0 = stepHare 0 Nothing as0 0 as0
  where
    -- ht is the current best guess.
    stepHare _tortoiseSum ht tortoise _hareSum []
      = (ht, tortoise)
    stepHare tortoiseSum ht tortoise hareSum (h:hs)
      = stepTortoise tortoiseSum ht tortoise (hareSum + h) hs
    stepTortoise tortoiseSum ht [] hareSum hare
      = stepHare tortoiseSum ht [] hareSum hare
    stepTortoise tortoiseSum ht tortoise@(t:ts) hareSum hare
      | tortoiseSum' <= hareSum
      = stepTortoise tortoiseSum' (Just t) ts hareSum hare
      | otherwise
      = stepHare tortoiseSum ht tortoise hareSum hare
      where tortoiseSum' = tortoiseSum + 2*t
In use:
> approxEqualSums [1..10]
(Just 6,[7,8,9,10])
6 is the last element before going over half, and 7 is the first one after that.
I asked in the comments, and the OP says that [1..n] does not really define the question. So I guess what's being asked is a list from 1 to n in some arbitrary ascending sequence, such as [1,3,7,19,37,...,1453,...,n].
Yet even going by the given answers, for a list like [1..n] we really don't need to do any list operations at all.
The sum of [1..n] is n*(n+1)/2.
So we need to find the m for which the sum of [1..m] is half of that, n*(n+1)/4.
That is, m*(m+1)/2 = n*(n+1)/4.
So if n == 100 then m^2 + m - 5050 = 0
All we need is the quadratic formula, m = (-b ± sqrt(b^2 - 4ac)) / (2a), where a = 1, b = 1 and c = -5050, yielding the reasonable (positive) root 70.565 ⇒ 71 (rounded). Let's check: 71*72/2 = 2556 and 5050 - 2556 = 2494, so the difference is 2556 - 2494 = 62, which is minimal (< 71). Yes, we must split at 71. So just take result = [[1..71],[72..100]] and that's it.
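The arithmetic above generalises to any n. As a rough sketch (splitPoint is a name invented here, and it assumes n >= 1 and that Double precision is adequate for the square root), one can take the positive root of m^2 + m - n*(n+1)/2 = 0 and compare the two neighbouring integers instead of rounding blindly:

```haskell
-- | Hypothetical helper: the m for which sum [1..m] is as close as possible
-- to sum [m+1..n]. Assumes n >= 1; uses Double for the square root, so it is
-- only a sketch for moderately sized n.
splitPoint :: Integer -> Integer
splitPoint n = snd (minimum [ (imbalance m, m) | m <- candidates ])
  where
    total = n * (n + 1) `div` 2
    -- positive root of m^2 + m - total = 0
    root = (sqrt (1 + 4 * fromIntegral total) - 1) / 2 :: Double
    -- the best integer is one of the two neighbours of the real root
    candidates = [ m | m <- [floor root, ceiling root], m >= 0, m <= n ]
    -- |right - left| = |total - 2 * left|, and 2 * left = m * (m + 1)
    imbalance m = abs (total - m * (m + 1))
```

With this sketch, splitPoint 100 gives 71 (imbalance 62, against 80 for m = 70) and splitPoint 10 gives 7.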
But when the numbers are ascending without being consecutive, that's a different animal. It has to be done by first finding the total sum and then doing something like a binary search: jump halfway into the list and compare the sums to decide whether to jump halfway back or forward. I will implement that one later.
Here's some code which empirically behaves better than linear, and handles 2,000,000 elements in just over 1 second even when interpreted:
g :: (Ord c, Num c) => [c] -> [(Int, c)]
g = head . dropWhile ((> 0) . snd . last) . map (take 2) . tails . zip [1..]
    . (\xs -> zipWith (-) (map (last xs -) xs) xs) . scanl1 (+)
g [1..10] ==> [(6,13),(7,-1)] -- 0.0s
g [1..70000] ==> [(49497,32494),(49498,-66502)] -- 0.09s
g [70000,70000-1..1] ==> [(20502,66502),(20503,-32494)] -- 0.09s
g [1..100000] ==> [(70710,75190),(70711,-66232)] -- 0.11s
g [1..1000000] ==> [(707106,897658),(707107,-516556)] -- 0.62s
g [1..2000000] ==> [(1414213,1176418),(1414214,-1652010)] -- 1.14s n^0.88
g [1..3000000] ==> [(2121320,836280),(2121321,-3406362)] -- 1.65s n^0.91
It works by running the partial sums with scanl1 (+) and taking the total sum as its last, so that for each partial sum, subtracting it from the total gives us the sum of the second part of the split.
The algorithm assumes all the numbers in the input list are strictly positive, so the partial sums list is monotonically increasing. Nothing else is assumed about the numbers.
From the pair that g returns, the entry to choose is the one whose second component has the smaller absolute value.
This is achieved by minimumBy (comparing (abs . snd)) . g.
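Put together, that selection step might look like the sketch below. g is copied from above with the import it needs; bestSplit is a name introduced here only for illustration. As stated, the input is assumed to be non-empty and strictly positive (an empty list makes last fail).

```haskell
import Data.List (minimumBy, tails)
import Data.Ord (comparing)

-- g as given above: the two candidate (index, rightSum - leftSum) pairs
-- around the point where the difference stops being positive.
g :: (Ord c, Num c) => [c] -> [(Int, c)]
g = head . dropWhile ((> 0) . snd . last) . map (take 2) . tails . zip [1..]
    . (\xs -> zipWith (-) (map (last xs -) xs) xs) . scanl1 (+)

-- Hypothetical wrapper: keep the candidate with the smaller |difference|.
bestSplit :: (Ord c, Num c) => [c] -> (Int, c)
bestSplit = minimumBy (comparing (abs . snd)) . g
```

For [1..10] this picks (7,-1) out of [(6,13),(7,-1)], i.e. the left part has 7 elements.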
clarifications: There's some confusion about "complexity" in the comments below, yet the answer says nothing at all about complexity but uses a specific empirical measurement. You can't argue with empirical data (unless you misinterpret its meaning).
The answer does not claim it "is better than linear", it says "it behaves better than linear" [in the tested range of problem sizes], which the empirical data incontrovertibly show.
Finally, an appeal to authority. Robert Sedgewick is an authority on algorithms. Take it up with him.
(and of course the algorithm handles unordered data as well as it does ordered).
As for the reasons for OP's code inefficiency: map sum . inits can't help being quadratic, but the equivalent scanl (+) 0 is linear. The radical improvement comes about from a lot of redundant calculations in the former being avoided in the latter. (Another example of this can be seen here.)
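To make the comparison concrete, here is a small sketch of the two formulations side by side. The names prefixSumsSlow and prefixSumsFast are invented for this illustration. Both return the same list of prefix sums, but the first re-sums every prefix from scratch while the second reuses the previous running total.

```haskell
import Data.List (inits)

-- Quadratic: each prefix produced by 'inits' is summed independently.
prefixSumsSlow :: Num a => [a] -> [a]
prefixSumsSlow = map sum . inits

-- Linear: each running total is the previous total plus one element.
prefixSumsFast :: Num a => [a] -> [a]
prefixSumsFast = scanl (+) 0
```

Both give [0,1,3,6] for the input [1,2,3].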
I'd like to sort a list of tuples by the third or fourth element (say c or d) in the list of type:
myList = [(a,b,c,d,e)]
I know if the tuple is of type (a,b) I can use the following approach:
mySort x = sortBy (compare `on` snd) x
But the type of sortBy will not work on tuples with length greater than two (so obviously there is no point writing an accessor function for thd or fth) :
(a -> a -> Ordering) -> [a] -> [a]
"But the type of sortBy will not work on tuples with length greater than two"
No, it actually works for any list. For instance, if you want to sort on c, just do:
mySort x = sortBy (compare `on` (\(a,b,c,d,e) -> c)) x
or
mySort x = sortBy (comparing thirdOf5) x
  where thirdOf5 (_,_,c,_,_) = c
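The same pattern works for the fourth component the question also mentions. A self-contained sketch, where fourthOf5 and sortByFourth are names made up for this example:

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical accessor for the fourth component of a 5-tuple.
fourthOf5 :: (a, b, c, d, e) -> d
fourthOf5 (_, _, _, d, _) = d

-- Sort a list of 5-tuples by their fourth component only.
sortByFourth :: Ord d => [(a, b, c, d, e)] -> [(a, b, c, d, e)]
sortByFourth = sortBy (comparing fourthOf5)
```

Only the fourth component needs an Ord instance; the other components can be of any type.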
I'm trying to write a function that given a list of numbers, returns a list where every 2nd number is doubled in value, starting from the last element. So if the list elements are 1..n, n-th is going to be left as-is, (n-1)-th is going to be doubled in value, (n-2)-th is going to be left as-is, etc.
So here's how I solved it:
myFunc :: [Integer] -> [Integer]
myFunc xs = reverse (myFuncHelper (reverse xs))
myFuncHelper :: [Integer] -> [Integer]
myFuncHelper [] = []
myFuncHelper (x:[]) = [x]
myFuncHelper (x:y:zs) = [x,y*2] ++ myFuncHelper zs
And it works:
myFunc [1,1,1,1] = [2,1,2,1]
myFunc [1,1,1] = [1,2,1]
However, I can't help but think there has to be a simpler solution than reversing the list, processing it and then reversing it again. Could I simply iterate the list backwards? If yes, how?
The under reversed f xs idiom from the lens library will apply f to xs in reverse order:
under reversed (take 5) [1..100] => [96,97,98,99,100]
When you need to process the list from the end, usually foldr works pretty well. Here is a solution for you without reversing the whole list twice:
doubleOdd :: Num a => [a] -> [a]
doubleOdd = fst . foldr multiplyCond ([], False)
  where multiplyCond x (rest, flag) = ((if flag then (x * 2) else x) : rest, not flag)
The multiplyCond function takes an element and a tuple holding the accumulator list and a flag. The flag toggles on every step to track whether the current element should be doubled. The accumulator list simply gathers the resulting numbers. This solution may not be the most concise, but it avoids extra work and uses nothing but Prelude functions.
myFunc = reverse
       . map (\(b,x) -> if b then x*2 else x)
       . zip (cycle [False,True])
       . reverse
But this isn't much better. Your implementation is sufficiently elegant.
The simplest way to iterate the list backwards is to reverse the list. I don't think you can really do much better than that; I suspect that if you have to traverse the whole list to find the end, and remember how to get back up, you might as well just reverse it. If this is a big deal, maybe you should be using some other data structure instead of lists—Vector or Seq might be good choices.
Another way to write your helper function is to use Traversable:
import Control.Monad.State
import Data.Traversable (Traversable, traverse)
toggle :: (Bool -> a -> b) -> a -> State Bool b
toggle f a =
  do active <- get
     put (not active)
     return (f active a)
doubleEvens :: (Num a, Traversable t) => t a -> t a
doubleEvens xs = evalState (traverse (toggle step) xs) False
  where step True x = 2*x
        step False x = x
-- doubleEvens replaces only the helper, so the list is still reversed on both sides
yourFunc :: Num a => [a] -> [a]
yourFunc = reverse . doubleEvens . reverse
Or if we go a bit crazy with Foldable and Traversable, we can try this:
Use Foldable's foldl to extract a reverse-order list from any of its instances. For some types this will be more efficient than reversing a list.
Then we can use traverse and State to map each element of the original structure to its counterpart in the reversed order.
Here's how to do it:
import Control.Monad.State
import Data.Foldable (Foldable)
import qualified Data.Foldable as F
import Data.Traversable (Traversable, traverse)
import Data.Map (Map)
import qualified Data.Map as Map
toReversedList :: Foldable t => t a -> [a]
toReversedList = F.foldl (flip (:)) []
reverse' :: Traversable t => t a -> t a
reverse' ta = evalState (traverse step ta) (toReversedList ta)
  where step _ = do (h:t) <- get
                    put t
                    return h
-- reverse' is applied on both sides so the elements end up in their original positions
yourFunc' :: (Traversable t, Num a) => t a -> t a
yourFunc' = reverse' . doubleEvens . reverse'
-- >>> yourFunc' $ Map.fromList [(1, 1), (2, 1), (3, 1), (4, 1)]
-- fromList [(1,2),(2,1),(3,2),(4,1)]
-- >>> yourFunc' $ Map.fromList [(1, 1), (2, 1), (3, 1)]
-- fromList [(1,1),(2,2),(3,1)]
There's probably a better way to do this, though...
func xs = zipWith (*) xs $ reverse . (take $ length xs) $ cycle [1,2]
So I've been practicing Haskell, and I was doing just fine until I got stuck on this exercise. Basically I want a function that receives a list like this:
xs = [("a","b"),("a","c"),("b","e")]
returns something like this :
xs = [("a",["b","c"]), ("b",["e"])].
I came up with this code:
list xs = [(a,[b])|(a,b) <- xs]
but the problem is that this doesn't do what I want. I guess it's close, but not right.
Here's what this returns:
xs = [("a",["b"]),("a",["c"]),("b",["e"])]
If you don't care about the order of the tuples in the final list, the most efficient way (that doesn't reinvent the wheel) would be to make use of the Map type from Data.Map in the containers package:
import qualified Data.Map as Map
clump :: Ord a => [(a,b)] -> [(a, [b])]
clump xs = Map.toList $ Map.fromListWith (flip (++)) [(a, [b]) | (a,b) <- xs]
main = do print $ clump [("a","b"),("a","c"),("b","e")]
If you do care about the result order, you'll probably have to do something ugly and O(n^2) like this:
import Data.List (nub)
clump' :: Eq a => [(a,b)] -> [(a, [b])]
clump' xs = [(a, [b | (a', b) <- xs, a' == a]) | a <- nub $ map fst xs]
main = do print $ clump' [("a","b"),("a","c"),("b","e")]
You could use right fold with Data.Map.insertWith:
import Data.Map as M hiding (foldr)
main :: IO ()
main = print . M.toList
     $ foldr (\(k, v) m -> M.insertWith (++) k [v] m)
             M.empty
             [("a","b"),("a","c"),("b","e")]
Output:
./main
[("a",["b","c"]),("b",["e"])]
The basic principle is that you want to group "similar" elements together.
Whenever you want to group elements together, you have the group functions in Data.List. In this case, you want to specify yourself what counts as similar, so you will need to use the groupBy version. Most functions in Data.List have a By-version that lets you specify more in detail what you want.
Step 1
In your case, you want to define "similarity" as "having the same first element". In Haskell, "having the same first element on a pair" means
(==) `on` fst
In other words, equality on the first element of a pair.
So to do the grouping, we supply that requirement to groupBy, like so:
groupBy ((==) `on` fst) xs
This will get us back, in your example, the two groups:
[[("a","b"),("a","c")]
,[("b","e")]]
Step 2
Now what remains is turning those lists into pairs. The basic principle behind that is, if we let
ys = [("a","b"),("a","c")]
as an example, to take the first element of the first pair, and then just smash the second element of all pairs together into a list. Taking the first element of the first pair is easy!
fst (head ys) == "a"
Taking all the second elements is fairly easy as well!
map snd ys == ["b", "c"]
Both of these operations together give us what we want.
(fst (head ys), map snd ys) == ("a", ["b", "c"])
Finished product
So if you want to, you can write your clumping function as
clump xs = map (\ys -> (fst (head ys), map snd ys)) groups
  where groups = groupBy ((==) `on` fst) xs
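One caveat worth keeping in mind: groupBy only merges adjacent elements, so pairs with the same key must sit next to each other. A possible complete version that sorts by key first is sketched below; clumpSorted is a made-up name, and the extra Ord constraint is an assumption this variant introduces.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- | Group the second components by first component.
-- Sorting first guarantees equal keys are adjacent before groupBy runs;
-- sortOn is stable, so values keep their original relative order.
clumpSorted :: Ord a => [(a, b)] -> [(a, [b])]
clumpSorted xs =
  [ (k, map snd ys)
  | ys@((k, _) : _) <- groupBy ((==) `on` fst) (sortOn fst xs) ]
```

For example, clumpSorted [("a","b"),("b","e"),("a","c")] gives [("a",["b","c"]),("b",["e"])], even though the two "a" pairs are not adjacent in the input.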
I'm having a hard time when I try to order a list of lists by the second element, something like this
list = [[_,B,_,_,_],[_,A,_,_,_],[_,C,_,_,_]]
into this:
list = [[_,A,_,_,_],[_,B,_,_,_],[_,C,_,_,_]]
I've tried:
sortBy compare $ [([1,2]!!1),([2,3]!!1)]
But it just picks out the second elements and sorts those, giving [2,3].
What you tried to do is sort the list [([1,2]!!1),([2,3]!!1)], which is equivalent to [2, 3], by compare. What you want to do is use sortBy with a function that first gets the second element and then compares:
sortBySecond = sortBy (\ a b -> compare (a !! 1) (b !! 1))
Then take the list you have and apply this function to it:
sortBySecond [[1, 2], [2, 3]]
You can make this function neater by using on from Data.Function:
import Data.Function
sortBySecond = sortBy (compare `on` (!! 1))
You can also use comparing from Data.Ord:
sortBySecond = sortBy $ comparing (!! 1)
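Note that (!! 1) fails at runtime on any inner list with fewer than two elements. If that can happen in your data, one possible variant is to sort on a Maybe key instead. This is only a sketch, and secondElem and sortBySecondSafe are names invented here:

```haskell
import Data.List (sortOn)

-- Hypothetical total accessor: Nothing when there is no second element.
secondElem :: [a] -> Maybe a
secondElem (_ : y : _) = Just y
secondElem _           = Nothing

-- Lists without a second element sort first, because Nothing < Just y.
sortBySecondSafe :: Ord a => [[a]] -> [[a]]
sortBySecondSafe = sortOn secondElem
```

For example, sortBySecondSafe [[1,2],[5],[0,1]] gives [[5],[0,1],[1,2]].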
Another idea I came up with is to simply start sorting at the second element of each list by using tail. I also tried to write it point-free, just as an exercise for myself.
pfsortBySnd :: (Ord a) => [[a]] -> [[a]]
pfsortBySnd = sortBy second
  where second = comparing tail
sortBySnd :: (Ord a) => [[a]] -> [[a]]
sortBySnd xx = sortBy second xx
  where second x y = compare (tail x) (tail y)