I have the following problem:
You are given an m×n matrix and have to find the maximal positive submatrices (all elements of the submatrix must be > 0) that span from (1,1) to (x,y).
By maximal I mean the following: given the matrix
[[1,2,3,4],[5,6,7,8],[9,10,-11,12],[13,14,15,16]]
the maximal positive submatrices are:
[[[1,2,3,4],[5,6,7,8]],[[1,2],[5,6],[9,10],[13,14]]]
i.e. the first two rows form one solution and the first two columns form the second.
Another example: given the matrix
[[1,2,3,-4],[5,6,7,8],[-9,10,-11,12],[13,14,15,16]]
the solution is:
[[[1,2,3],[5,6,7]]]
This is my Haskell program which solves it:
import Data.List hiding (insert)
import qualified Data.Set as Set
unique :: Ord a => [a] -> [a]
unique = Set.toList . Set.fromList
subList::[[Int]] ->[[[Int]]]
subList matrix = filter (allPositiveMatrix) $ [ (submatrix matrix 1 1 x y) | x<-[1..width(matrix)], y<-[1..height(matrix)]]
maxWidthMat::[[[Int]]] -> Int
maxWidthMat subList =length ((foldl (\largestPreviousX nextMatrix -> if (length (nextMatrix!!0)) >(length (largestPreviousX !!0)) then nextMatrix else largestPreviousX ) [[]] subList)!!0)
maxWidthSubmatrices:: [[[Int]]] -> Int ->[[[Int]]]
maxWidthSubmatrices subList maxWidth = filter (\x -> (length $x!!0)==maxWidth) subList
height matrix = length matrix
width matrix = length (matrix!!0)
maximalPositiveSubmatrices matrix = maxWidthSubmatrices (subList matrix) (maxWidthMat (filter (\x -> (length $x!!0)==( maxWidthMat $ subList matrix )) (subList matrix)))
allPositiveList list = foldl (\x y -> if (y>0)&&(x==True) then True else False) True list
allPositiveMatrix:: [[Int]] -> Bool
allPositiveMatrix matrix = foldl (\ x y -> if (allPositiveList y)&&(x==True) then True else False ) True matrix
submatrix matrix x1 y1 x2 y2 = slice ( map (\x -> slice x x1 x2) matrix) y1 y2
slice list x y = drop (x-1) (take y list)
maximalWidthSubmatrix mm = maximum $ maximalPositiveSubmatrices mm
maximalHeigthSubmatrix mm = transpose $ maximum $ maximalPositiveSubmatrices $ transpose mm
-- solution
solution matrix =unique $ [maximalWidthSubmatrix matrix]++[maximalHeigthSubmatrix matrix]
As you can see, it's extremely lengthy and ugly.
It probably isn't the fastest either.
Could you show me a more elegant, faster and shorter solution (possibly with explanations)?
Proposed algorithm
To solve the problem, I think we had better first perform a dimension reduction:
reduce_dim :: (Num a,Ord a) => [[a]] -> [Int]
reduce_dim = map (length . takeWhile (>0)) -- O(m*n)
For every row, we count how many consecutive items, starting from the left, are positive. So for the given matrix:
1 2 3 4 | 4
5 6 7 8 | 4
9 10 -11 12 | 2
13 14 15 16 | 4
The third row thus maps to 2, since its third element is -11.
Or for your other matrix:
1 2 3 -4 | 3
5 6 7 8 | 4
-9 10 -11 12 | 0
13 14 15 16 | 4
This is because the first row has -4 in column 4, and the third row has -9 in column 1.
Now we take a running minimum over these counts with scanl1 min:
Prelude> scanl1 min [4,4,2,4] -- O(m)
[4,4,2,2]
Prelude> scanl1 min [3,4,0,4] -- O(m)
[3,3,0,0]
Each time the number decreases (and at the end of the list), we know we have found a maximal submatrix ending at the row above, since from the current row on fewer columns are available. Once we reach zero, further evaluation makes no sense, since we would be working with a matrix with 0 columns.
Based on that list, we can simply generate a list of (rows, columns) tuples giving the sizes of the maximal submatrices:
max_sub_dim :: [Int] -> [(Int,Int)]
max_sub_dim = msd 1 -- O(m)
  where msd r [] = []
        msd r (0:_) = []
        msd r [c] = [(r,c)]
        msd r (c1:cs@(c2:_)) | c2 < c1 = (r,c1) : msd (r+1) cs
                             | otherwise = msd (r+1) cs
So for your two matrices, we obtain:
*Main> max_sub_dim $ scanl1 min $ reduce_dim [[1,2,3,4],[5,6,7,8],[9,10,-11,12],[13,14,15,16]]
[(2,4),(4,2)]
*Main> max_sub_dim $ scanl1 min $ reduce_dim [[1,2,3,-4],[5,6,7,8],[-9,10,-11,12],[13,14,15,16]]
[(2,3)]
Now we only need to obtain these submatrices themselves. We can do this by using take and a map over take:
construct_sub :: [[a]] -> [(Int,Int)] -> [[[a]]]
construct_sub mat = map (\(r,c) -> take r (map (take c) mat)) -- O(m^2*n)
Now we only need to link it all together in a solve function:
-- complete program
reduce_dim :: (Num a,Ord a) => [[a]] -> [Int]
reduce_dim = map (length . takeWhile (>0))
max_sub_dim :: [Int] -> [(Int,Int)]
max_sub_dim = msd 1
  where msd r [] = []
        msd r (0:_) = []
        msd r [c] = [(r,c)]
        msd r (c1:cs@(c2:_)) | c2 < c1 = (r,c1) : msd (r+1) cs
                             | otherwise = msd (r+1) cs
construct_sub :: [[a]] -> [(Int,Int)] -> [[[a]]]
construct_sub mat = map (\(r,c) -> take r (map (take c) mat))
solve :: (Num a,Ord a) => [[a]] -> [[[a]]]
solve mat = construct_sub mat $ max_sub_dim $ scanl1 min $ reduce_dim mat
This then generates:
*Main> solve [[1,2,3,4],[5,6,7,8],[9,10,-11,12],[13,14,15,16]]
[[[1,2,3,4],[5,6,7,8]],[[1,2],[5,6],[9,10],[13,14]]]
*Main> solve [[1,2,3,-4],[5,6,7,8],[-9,10,-11,12],[13,14,15,16]]
[[[1,2,3],[5,6,7]]]
Time complexity
Computing the dimensions of the maximal submatrices takes O(m×n), with m the number of rows and n the number of columns. I wrote the time complexity of every function in a comment.
Constructing all submatrices takes O(m²×n), so the algorithm as a whole runs in O(m²×n).
We can transpose the approach and work on columns instead of rows. So if the number of rows differs greatly from the number of columns, we can first determine which of the two is smaller and, if necessary, transpose, so that m is the smaller of the two.
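The transposition idea can be sketched as follows. This is a minimal sketch, assuming a rectangular matrix; the camelCase names (reduceDim, maxSubDim, solveRows, solveAuto) are hypothetical restatements of reduce_dim, max_sub_dim and solve so that the block compiles on its own. When there are more rows than columns, it runs on the transpose and transposes the results back. The column-wise run finds the same submatrices in the opposite order, hence the reverse.

```haskell
import Data.List (transpose)

-- Number of leading positive items in every row.
reduceDim :: (Num a, Ord a) => [[a]] -> [Int]
reduceDim = map (length . takeWhile (> 0))

-- (rows, columns) of each maximal submatrix, from the running minimum.
maxSubDim :: [Int] -> [(Int, Int)]
maxSubDim = msd 1
  where
    msd _ [] = []
    msd _ (0:_) = []
    msd r [c] = [(r, c)]
    msd r (c1:cs@(c2:_))
      | c2 < c1 = (r, c1) : msd (r + 1) cs
      | otherwise = msd (r + 1) cs

-- Row-wise solver, equivalent to `solve`.
solveRows :: (Num a, Ord a) => [[a]] -> [[[a]]]
solveRows mat =
  [take r (map (take c) mat) | (r, c) <- maxSubDim (scanl1 min (reduceDim mat))]

-- Hypothetical wrapper: work on the transpose when there are more rows than
-- columns, then transpose each result back and restore the row-wise order.
solveAuto :: (Num a, Ord a) => [[a]] -> [[[a]]]
solveAuto mat
  | rows > cols = reverse (map transpose (solveRows (transpose mat)))
  | otherwise = solveRows mat
  where
    rows = length mat
    cols = case mat of
      [] -> 0
      (row:_) -> length row
```

For a tall matrix such as [[1,2],[3,4],[5,-6],[7,8]], both solveAuto and solveRows should return [[[1,2],[3,4]],[[1],[3],[5],[7]]].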
Point of potential optimization
We can make the algorithm faster by constructing the submatrices while computing max_sub_dim, which saves some work.
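One possible reading of that optimization, as a sketch: a hypothetical solveFused walks the rows once, keeps the running minimum of the leading-positive counts, and emits a submatrix as soon as the count drops. It fuses reduce_dim, scanl1 min, max_sub_dim and construct_sub into a single traversal of the rows; the total size of the output is unchanged, so the gain is only the avoided intermediate passes.

```haskell
-- Hypothetical single-pass variant of `solve`.
solveFused :: (Num a, Ord a) => [[a]] -> [[[a]]]
solveFused = go maxBound []
  where
    -- c: running minimum of the leading-positive counts so far
    -- acc: rows seen so far, most recent first
    go c acc [] = emit c acc
    go c acc (row:rows)
      | k == 0 = emit c acc                        -- no positive prefix left: stop
      | k < c = emit c acc ++ go k (row:acc) rows  -- count dropped: previous block is maximal
      | otherwise = go k (row:acc) rows
      where k = min c (length (takeWhile (> 0) row))
    -- submatrix made of all accumulated rows, cut to c columns
    emit c acc = [map (take c) (reverse acc) | not (null acc)]
```

On the two example matrices it should give the same results as solve.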
Related
I am learning Haskell and am currently writing a program that finds all common divisors of three different Ints.
I have a working program, but the evaluation time is very long for big numbers. I want advice on how to optimize it.
Example: combineDivisors 234944 246744 144456 == [1,2,4,8]
As I said, I am very new to this, so any help is appreciated.
import Data.List
combineDivisors :: Int -> Int -> Int -> [Int]
combineDivisors n1 n2 n3 =
  mergeSort list
  where list = getTriplets concList
        concList = isDivisor n1 ++ isDivisor n2 ++ isDivisor n3
        isDivisor n = [x | x <- [1..n], mod n x == 0]
getTriplets :: Ord a => [a] -> [a]
getTriplets = map head . filter (\l -> length l > 2) . group . sort
--Merge sort--
split :: [a] -> ([a],[a])
split xs =
  let
    l = length xs `div` 2
  in
    (take l xs, drop l xs)
merge :: [Int] -> [Int] -> [Int]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | y < x = y : merge (x:xs) ys
  | otherwise = x : merge xs (y:ys)
mergeSort :: [Int] -> [Int]
mergeSort [] = []
mergeSort [x] = [x]
mergeSort xs =
  let
    (xs1,xs2) = split xs
  in
    merge (mergeSort xs1) (mergeSort xs2)
If you don't care too much about memory usage, you can just use Data.IntSet and a function to find all factors given a number to do this.
First, let's write a function that returns an IntSet of all factors of a number:
import qualified Data.IntSet as IntSet
factors :: Int -> IntSet.IntSet
factors n = IntSet.fromList . f $ 1 -- Convert the list of factors into a set
  where
    -- Actual function that returns the list of factors
    f :: Int -> [Int]
    f i
      -- Exit when i has surpassed square root of n
      | i * i > n = []
      | otherwise = if n `mod` i == 0
          -- n is divisible by i - add i and n / i to the list
          then i : n `div` i : f (i + 1)
          -- n is not divisible by i - continue to the next
          else f (i + 1)
Now, once you have the IntSet corresponding to each number, you just have to intersect them to get the result:
commonFactors :: Int -> Int -> Int -> [Int]
commonFactors n1 n2 n3 = IntSet.toList $ IntSet.intersection (factors n3) $ IntSet.intersection (factors n1) $ factors n2
That works but is a bit ugly. How about an intersections function that takes multiple IntSets and produces the final intersection?
intersections :: [IntSet.IntSet] -> IntSet.IntSet
intersections [] = IntSet.empty
intersections (t:ts) = foldl IntSet.intersection t ts
This folds IntSet.intersection over the list of IntSets to find the final intersection.
Now you can refactor commonFactors to:
commonFactors :: Int -> Int -> Int -> [Int]
commonFactors n1 n2 n3 = IntSet.toList . intersections $ [factors n1, factors n2, factors n3]
Better? I'd think so. How about one last improvement: a general commonFactors function for any number of Ints:
commonFactors :: [Int] -> [Int]
commonFactors = IntSet.toList . intersections . map factors
Note that this uses an IntSet, so it is naturally limited to Ints. If you want to use Integer instead, just replace IntSet with a regular Set Integer.
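Following that note, here is a sketch of the Set Integer variant. The names factorsI and commonFactorsI are hypothetical; the logic mirrors factors and commonFactors above, and it assumes every input is at least 1.

```haskell
import qualified Data.Set as Set

-- All divisors of n, found by trial division up to sqrt n (assumes n >= 1).
factorsI :: Integer -> Set.Set Integer
factorsI n = Set.fromList (go 1)
  where
    go i
      | i * i > n = []
      | n `mod` i == 0 = i : n `div` i : go (i + 1)
      | otherwise = go (i + 1)

-- Intersect the divisor sets of all inputs.
commonFactorsI :: [Integer] -> [Integer]
commonFactorsI [] = []
commonFactorsI (n:ns) = Set.toList (foldl step (factorsI n) ns)
  where step acc m = Set.intersection acc (factorsI m)
```

As with the IntSet version, commonFactorsI [234944, 246744, 144456] should give [1,2,4,8].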
Output
> commonFactors [234944, 246744, 144456]
[1,2,4,8]
You should use the standard algorithm: compute the GCD of the numbers and prime factorize it:
import Data.List
import qualified Data.Map.Strict as M
-- infinite list of primes
primes :: [Integer]
primes = 2:3:filter
  (\n -> not $ any
    (\p -> n `mod` p == 0)
    (takeWhile (\p -> p * p <= n) primes))
  [5,7..]
-- prime factorizing a number
primeFactorize :: Integer -> [Integer]
primeFactorize n
  | n <= 1 = []
  -- we search up to the square root to find a prime factor
  -- if we find one then add it to the list, divide and recurse
  | Just p <- find
      (\p -> n `mod` p == 0)
      (takeWhile (\p -> p * p <= n) primes) = p:primeFactorize (n `div` p)
  -- if we don't then the number has to be prime so we're done
  | otherwise = [n]
-- count the number of each element in a list
-- e.g.
-- getCounts [1, 2, 2, 3, 4] == fromList [(1, 1), (2, 2), (3, 1), (4, 1)]
getCounts :: (Ord a) => [a] -> M.Map a Int
getCounts [] = M.empty
getCounts (x:xs) = M.insertWith (const (+1)) x 1 m
  where m = getCounts xs
-- get all possible combinations from a map of counts
-- e.g. getCombos (M.fromList [('a', 2), ('b', 1), ('c', 2)])
-- == ["","c","cc","b","bc","bcc","a","ac","acc","ab","abc","abcc","aa","aac","aacc","aab","aabc","aabcc"]
getCombos :: M.Map a Int -> [[a]]
getCombos m = allFactors
  where
    list = M.toList m
    factors = fst <$> list
    counts = snd <$> list
    possible = (\n -> [0..n]) <$> counts
    allCounts = sequence possible
    allFactors = (\count -> concat $ zipWith replicate count factors) <$> allCounts
-- get the common factors of a list of numbers
commonFactorsList :: [Integer] -> [Integer]
commonFactorsList [] = []
commonFactorsList l = sort factors
  where
    -- first take the GCD of all the numbers
    totalGcd = foldl1 gcd l
    -- then get the combinations of its prime factors and take their products to get the factors
    factors = map product . getCombos . getCounts . primeFactorize $ totalGcd
-- helper function for 3 numbers
commonFactors3 :: Integer -> Integer -> Integer -> [Integer]
commonFactors3 a b c = commonFactorsList [a, b, c]
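A simpler variant of the same GCD idea, sketched under the assumption that the GCD is small enough for trial division up to its square root to be fast: instead of prime factorizing the GCD and recombining the factors, list its divisors directly. commonDivisors is a hypothetical name, and this swaps the factorization step for plain trial division, so it is not the method used above.

```haskell
import Data.List (sort)

-- Divisors of the GCD of all inputs, by trial division up to its square root.
-- Assumes at least one input is non-zero (for a GCD of 0 it returns []).
commonDivisors :: [Integer] -> [Integer]
commonDivisors [] = []
commonDivisors ns = sort (concatMap pair (filter divides candidates))
  where
    g = foldr1 gcd ns
    candidates = takeWhile (\d -> d * d <= g) [1 ..]
    divides d = g `mod` d == 0
    -- d and g `div` d are both divisors; list a square root only once
    pair d = if d * d == g then [d] else [d, g `div` d]
```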
I'm interested in writing an efficient Haskell function triangularize :: [a] -> [[a]] that takes a (perhaps infinite) list and "triangularizes" it into a list of lists. For example, triangularize [1..19] should return
[[1, 3, 6, 10, 15]
,[2, 5, 9, 14]
,[4, 8, 13, 19]
,[7, 12, 18]
,[11, 17]
,[16]]
By efficient, I mean that I want it to run in O(n) time where n is the length of the list.
Note that this is quite easy to do in a language like Python, because appending to the end of a list (array) is an (amortized) constant-time operation. A very imperative Python function which accomplishes this is:
def triangularize(elements):
    row_index = 0
    column_index = 0
    diagonal_array = []
    for a in elements:
        if row_index == len(diagonal_array):
            diagonal_array.append([a])
        else:
            diagonal_array[row_index].append(a)
        if row_index == 0:
            (row_index, column_index) = (column_index + 1, 0)
        else:
            row_index -= 1
            column_index += 1
    return diagonal_array
This came up because I have been using Haskell to write some "tabl" sequences in the On-Line Encyclopedia of Integer Sequences (OEIS), and I want to be able to transform an ordinary (1-dimensional) sequence into a (2-dimensional) sequence of sequences in exactly this way.
Perhaps there's some clever (or not-so-clever) way to foldr over the input list, but I haven't been able to sort it out.
Split the list into chunks of increasing size:
import Data.List (transpose)
chunks :: [a] -> [[a]]
chunks = go 0 where
  go _ [] = []
  go n as = b : go (n+1) e where (b,e) = splitAt n as
Then just transpose twice:
diagonalize :: [a] -> [[a]]
diagonalize = transpose . transpose . chunks
Try it in ghci:
> diagonalize [1..19]
[[1,3,6,10,15],[2,5,9,14],[4,8,13,19],[7,12,18],[11,17],[16]]
This appears to be directly related to the set-theory argument proving that the set of pairs of integers is in one-to-one correspondence with the set of integers (i.e., it is denumerable). The argument involves a so-called Cantor pairing function.
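For the enumeration used below, the pairing function has a closed form, which explains where each element ends up: with the input [1..], element k + 1 is matched with the k-th pair (x, y) and lands in sublist x. A small sketch; cantorPairs and cantorIndex are hypothetical names, and cantorPairs is meant to produce the same sequence as the recursive cantorPairList defined next.

```haskell
-- Walk the diagonals d = x + y in order; within a diagonal, y counts up.
cantorPairs :: [(Integer, Integer)]
cantorPairs = [(d - y, y) | d <- [0 ..], y <- [0 .. d]]

-- Position of (x, y) in that enumeration: the earlier diagonals hold
-- 1 + 2 + ... + d = d(d+1)/2 pairs, and y more precede it on its own diagonal.
cantorIndex :: (Integer, Integer) -> Integer
cantorIndex (x, y) = d * (d + 1) `div` 2 + y
  where d = x + y
```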
So, out of curiosity, let's see if we can get a diagonalize function that way.
Define the infinite list of Cantor pairs recursively in Haskell:
auxCantorPairList :: (Integer, Integer) -> [(Integer, Integer)]
auxCantorPairList (x,y) =
  let nextPair = if (x > 0) then (x-1,y+1) else (x+y+1, 0)
  in (x,y) : auxCantorPairList nextPair
cantorPairList :: [(Integer, Integer)]
cantorPairList = auxCantorPairList (0,0)
And try that inside ghci:
λ> take 15 cantorPairList
[(0,0),(1,0),(0,1),(2,0),(1,1),(0,2),(3,0),(2,1),(1,2),(0,3),(4,0),(3,1),(2,2),(1,3),(0,4)]
λ>
We can number the pairs, and for example extract the numbers for those pairs which have a zero x coordinate:
λ>
λ> xs = [1..]
λ> take 5 $ map fst $ filter (\(n,(x,y)) -> (x==0)) $ zip xs cantorPairList
[1,3,6,10,15]
λ>
We recognize this as the top row of the OP's result in the text of the question.
Similarly for the next two rows:
λ>
λ> makeRow xs row = map fst $ filter (\(n,(x,y)) -> (x==row)) $ zip xs cantorPairList
λ> take 5 $ makeRow xs 1
[2,5,9,14,20]
λ>
λ> take 5 $ makeRow xs 2
[4,8,13,19,26]
λ>
From there, we can write our first draft of a diagonalize function:
λ>
λ> printAsLines xs = mapM_ (putStrLn . show) xs
λ> diagonalize xs = takeWhile (not . null) $ map (makeRow xs) [0..]
λ>
λ> printAsLines $ diagonalize [1..19]
[1,3,6,10,15]
[2,5,9,14]
[4,8,13,19]
[7,12,18]
[11,17]
[16]
λ>
EDIT: performance update
For a list of 1 million items, the runtime is 18 seconds, and 145 seconds for 4 million items. As mentioned by Redu, this looks like O(n√n) complexity.
Distributing the pairs among the various target sublists is inefficient, as most filter operations fail.
To improve performance, we can use a Data.Map structure for the target sublists.
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE ScopedTypeVariables #-}
import qualified Data.List as L
import qualified Data.Map as M
type MIL a = M.Map Integer [a]
buildCantorMap :: forall a. [a] -> MIL a
buildCantorMap xs =
  let ts = zip xs cantorPairList -- triplets (a,(x,y))
      m0 = (M.fromList []) :: MIL a
      redOp m (n,(x,y)) =
        let afn Nothing = Just [n]
            afn (Just jas) = Just (n:jas)
        in M.alter afn x m
      m1r = L.foldl' redOp m0 ts
  in
    fmap reverse m1r
diagonalize :: [a] -> [[a]]
diagonalize xs = let cm = buildCantorMap xs
                 in map snd $ M.toAscList cm
With this second version, performance appears to be much better: 568 msec for the 1 million item list and 2669 msec for the 4 million item list. So it is close to the O(n log n) complexity we could have hoped for.
It might be a good idea to create a comb filter.
So what does a comb filter do? It's like splitAt, but instead of splitting at a single index, it sort of zips the given infinite list with the given comb to separate the items corresponding to True and False in the comb, such that:
comb :: [Bool] -- yields [True,False,True,False,False,True,False,False,False,True...]
comb = iterate (False:) [True] >>= id
combWith :: [Bool] -> [a] -> ([a],[a])
combWith _ [] = ([],[])
combWith (c:cs) (x:xs) = let (f,s) = combWith cs xs
                         in if c then (x:f,s) else (f,x:s)
λ> combWith comb [1..19]
([1,3,6,10,15],[2,4,5,7,8,9,11,12,13,14,16,17,18,19])
Now all we need to do is to comb our infinite list and take the fst as the first row and carry on combing the snd with the same comb.
Let's do it:
diags :: [a] -> [[a]]
diags [] = []
diags xs = let (h,t) = combWith comb xs
           in h : diags t
λ> diags [1..19]
[ [1,3,6,10,15]
, [2,5,9,14]
, [4,8,13,19]
, [7,12,18]
, [11,17]
, [16]
]
It also seems to be lazy :)
λ> take 5 . map (take 5) $ diags [1..]
[ [1,3,6,10,15]
, [2,5,9,14,20]
, [4,8,13,19,26]
, [7,12,18,25,33]
, [11,17,24,32,41]
]
I think the complexity could be something like O(n√n), but I cannot be sure. Any ideas?
I'm a Haskell beginner trying to learn more about the language by solving some online quizzes/problem sets.
The problem is quite lengthy, but part of it requires code that finds the index which divides a given list into two sub-lists with (nearly) equal sums.
Given [1..10]
the answer should be 7, since 1+2+...+7 = 28 and 8+9+10 = 27.
This is how I implemented it:
import Data.List (minimumBy)
import Data.Ord (comparing)
-- partitions list by y
partishner :: (Floating a) => Int -> [a] -> [[[a]]]
partishner 0 xs = [[xs],[]]
partishner y xs = [take y xs : [drop y xs]] ++ partishner (y - 1) xs
-- finds the equal sum
findTheEquilizer :: (Ord a, Floating a) => [a] -> [[a]]
findTheEquilizer xs = fst $ minimumBy (comparing snd) zipParty
  where party = (tail . init) (partishner (length xs) xs) -- removes [xs,[]] types
        afterParty = (map (\[x, y] -> (x - y) ** 2) . init . map (map sum)) party
        zipParty = zip party afterParty -- zips partitions and squared diff between their sums
Given (last . head) (findTheEquilizer [1..10])
output : 7
For lists of up to around 50k elements it works fine:
λ> (last . head) (findTheEquilizer [1..10000])
7071.0
The trouble starts when I use lists with more than 70k elements: it takes forever to compute.
So what do I have to change in the code to make it run faster, or do I have to change my whole approach? I'm guessing it's the latter, but I'm not sure how to go about doing that.
The implementation looks quite chaotic to me. For example, partishner seems to construct a list of lists of lists of a where, if I understood it correctly, the outer list contains lists of two elements each: the list of elements on "the left" and the list of elements on "the right". As a result, it takes O(n²) to construct these lists.
By using lists instead of 2-tuples, this is also quite "unsafe", since a list can (although here probably impossible) contain no elements, one element, or more than two elements. If you make a mistake in one of the functions, it will be hard to find.
It might be easier to implement a "sweep algorithm": we first calculate the sum of all the elements in the list. This is the value on the "right" if we split before the first element. Next, we move from left to right, each time subtracting the element from the sum on the right and adding it to the sum on the left, and evaluate the difference at every split point, like:
import Data.List(unfoldr)
sweep :: Num a => [a] -> [(Int, a, [a])]
sweep lst = x0 : unfoldr f x0
  where x0 = (0, sum lst, lst)
        f (_, _, []) = Nothing
        f (i, r, (x: xs)) = Just (l, l)
          where l = (i+1, r-2*x, xs)
For example:
Prelude Data.List> sweep [1,4,2,5]
[(0,12,[1,4,2,5]),(1,10,[4,2,5]),(2,2,[2,5]),(3,-2,[5]),(4,-12,[])]
So if we split at the first split point (before the first element), the sum on the right is 12 higher than the sum on the left; if we split after the first element, the sum on the right (11) is 10 higher than the sum on the left (1).
We can then obtain the minimum of these splits with minimumBy :: (a -> a -> Ordering) -> [a] -> a:
import Data.List(minimumBy)
import Data.Ord(comparing)
findTheEquilizer :: (Ord a, Num a) => [a] -> ([a], [a])
findTheEquilizer lst = (take idx lst, tl)
  where (idx, _, tl) = minimumBy (comparing (abs . \(_, x, _) -> x)) (sweep lst)
We then obtain the correct value for [1..10]:
Prelude Data.List Data.Ord Data.List> findTheEquilizer [1..10]
([1,2,3,4,5,6,7],[8,9,10])
or for 70,000:
Prelude Data.List Data.Ord Data.List> head (snd (findTheEquilizer [1..70000]))
49498
The above is not ideal and can be implemented more elegantly, but I leave that as an exercise.
First, let's analyse why it runs forever (actually not forever, just slowly). Take a look at the partishner function:
partishner y xs = [take y xs : [drop y xs]] ++ partishner (y - 1) xs
Here take y xs and drop y xs run in linear time, i.e. O(N), and so
[take y xs : [drop y xs]]
is O(N) too.
However, it runs again and again, recursively, for each element of the given list. Suppose the length of the list is M; each call of partishner takes O(M) time, so finishing the computation needs:
O(1 + 2 + ... + M) = O(M(M+1)/2) = O(M²)
With 70k elements, that is on the order of 70k² ≈ 4.9 × 10⁹ steps, which is why it hangs.
Instead of using the partishner function, you can compute the running sums of the list in linear time:
import Data.List (minimumBy)
import Data.Ord (comparing)
sumList :: (Floating a) => [a] -> [a]
sumList xs = sum 0 xs
  where sum _ [] = []
        sum s (y:ys) = let s' = s + y in s' : sum s' ys
findEquilizer then sums the given list from left to right (leftSum) and from right to left (rightSum) and picks the result just as your original program does, but the whole process takes linear time.
findEquilizer :: (Ord a, Floating a) => [a] -> a
findEquilizer [] = 0
findEquilizer xs =
  let leftSum = reverse $ 0:(sumList $ init xs)
      rightSum = sumList $ reverse $ xs
      afterParty = zipWith (\x y -> (x-y) ** 2) leftSum rightSum
  in fst $ minimumBy (comparing snd) (zip (reverse $ init xs) afterParty)
I assume that none of the list elements are negative, and use a "tortoise and hare" approach. The hare steps through the list, adding up elements. The tortoise does the same thing, but it keeps its sum doubled and it carefully ensures that it only takes a step when that step won't put it ahead of the hare.
approxEqualSums
  :: (Num a, Ord a)
  => [a] -> (Maybe a, [a])
approxEqualSums as0 = stepHare 0 Nothing as0 0 as0
  where
    -- ht is the current best guess.
    stepHare _tortoiseSum ht tortoise _hareSum []
      = (ht, tortoise)
    stepHare tortoiseSum ht tortoise hareSum (h:hs)
      = stepTortoise tortoiseSum ht tortoise (hareSum + h) hs
    stepTortoise tortoiseSum ht [] hareSum hare
      = stepHare tortoiseSum ht [] hareSum hare
    stepTortoise tortoiseSum ht tortoise@(t:ts) hareSum hare
      | tortoiseSum' <= hareSum
      = stepTortoise tortoiseSum' (Just t) ts hareSum hare
      | otherwise
      = stepHare tortoiseSum ht tortoise hareSum hare
      where tortoiseSum' = tortoiseSum + 2*t
In use:
> approxEqualSums [1..10]
(Just 6,[7,8,9,10])
6 is the last element before going over half, and 7 is the first one after that.
I asked in the comments, and the OP says [1..n] does not really define the question. I guess what's asked is something like an arbitrary ascending sequence from 1 to n, such as [1,3,7,19,37,...,1453,...,n].
Still, for a list like [1..n], as in the given answers, we really don't need any list operations at all.
The sum of [1..n] is n*(n+1)/2.
So we need to find m such that the first m numbers sum to n*(n+1)/4,
which means m(m+1)/2 = n*(n+1)/4.
So if n == 100, then m^2 + m - 5050 = 0.
All we need is the quadratic formula with a = 1, b = 1 and c = -5050, yielding the reasonable root 70.565 ⇒ 71 (rounded). Let's check: 71*72/2 = 2556 and 5050 - 2556 = 2494, which gives 2556 - 2494 = 62 as the minimal difference (< 71). So yes, we must split at 71, and the result is simply [[1..71],[72..100]].
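The calculation above can be sketched in Haskell. splitPoint is a hypothetical helper, assuming n >= 1 and that Double precision is adequate for the square root. Since rounding the root alone does not say which neighbour wins, it tries both the floor and the ceiling of the positive root of m² + m - n(n+1)/2 = 0 and keeps the one with the smaller difference, computed exactly in Integer.

```haskell
-- Split point for [1..n]: the m for which sum [1..m] and sum [m+1..n]
-- differ the least (ties go to the smaller m). Assumes n >= 1.
splitPoint :: Integer -> Integer
splitPoint n = snd (minimum [(diff m, m) | m <- candidates])
  where
    total = n * (n + 1) `div` 2
    -- positive root of m^2 + m - n(n+1)/2 = 0, i.e. (-1 + sqrt (1 + 2n(n+1))) / 2
    root :: Double
    root = (-1 + sqrt (1 + 2 * fromIntegral (n * (n + 1)))) / 2
    candidates = filter (\m -> m >= 0 && m <= n) [floor root, ceiling root]
    -- |right sum - left sum| when the first m numbers go to the left
    diff m = abs (total - 2 * (m * (m + 1) `div` 2))
```

For n = 100 the candidates are 70 and 71 with differences 80 and 62, so it should return 71.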
When the sequence is not consecutive, though, it's a different animal. It has to be done by first finding the sum and then doing something like a binary search: jump halfway into the list and compare the sums to decide whether to jump halfway back or forward. I will implement that one later.
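A sketch of that binary search, assuming all elements are positive so the prefix sums are strictly increasing. prefixArray and bestSplit are hypothetical names, and Data.Array comes from the array package that ships with GHC. Building the prefix-sum array is itself O(n); the O(log n) search only pays off when the prefix sums are already available or reused across queries. The result k means the first k elements go to the left part.

```haskell
import Data.Array (Array, listArray, bounds, (!))

-- p ! i = sum of the first i elements, with p ! 0 = 0.
prefixArray :: [Integer] -> Array Int Integer
prefixArray xs = listArray (0, length xs) (scanl (+) 0 xs)

-- Binary search for the split k that minimises |total - 2 * p ! k|.
bestSplit :: Array Int Integer -> Int
bestSplit p = pick (search lo hi)
  where
    (lo, hi) = bounds p
    total = p ! hi
    -- smallest k with 2 * p ! k >= total (k = hi always qualifies)
    search l h
      | l >= h = l
      | 2 * p ! mid >= total = search l mid
      | otherwise = search (mid + 1) h
      where mid = (l + h) `div` 2
    dist k = abs (total - 2 * p ! k)
    -- the best split is either that k or the one just before it
    pick k
      | k > lo && dist (k - 1) <= dist k = k - 1
      | otherwise = k
```

For example, bestSplit (prefixArray [1,3,7,19,37]) should be 4, putting [1,3,7,19] (sum 30) against [37].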
Here's code which empirically behaves better than linear, and gets to 2,000,000 elements in just over 1 second even when interpreted:
import Data.List (tails)
g :: (Ord c, Num c) => [c] -> [(Int, c)]
g = head . dropWhile ((> 0) . snd . last) . map (take 2) . tails . zip [1..]
      . (\xs -> zipWith (-) (map (last xs -) xs) xs) . scanl1 (+)
g [1..10] ==> [(6,13),(7,-1)] -- 0.0s
g [1..70000] ==> [(49497,32494),(49498,-66502)] -- 0.09s
g [70000,70000-1..1] ==> [(20502,66502),(20503,-32494)] -- 0.09s
g [1..100000] ==> [(70710,75190),(70711,-66232)] -- 0.11s
g [1..1000000] ==> [(707106,897658),(707107,-516556)] -- 0.62s
g [1..2000000] ==> [(1414213,1176418),(1414214,-1652010)] -- 1.14s n^0.88
g [1..3000000] ==> [(2121320,836280),(2121321,-3406362)] -- 1.65s n^0.91
It works by computing the partial sums with scanl1 (+) and taking the total sum as the last of them, so that subtracting each partial sum from the total gives the sum of the second part of the split.
The algorithm assumes all the numbers in the input list are strictly positive, so the partial sums list is monotonically increasing. Nothing else is assumed about the numbers.
The split must then be chosen from the pair (g's result) as whichever entry has the smaller absolute value in its second component.
This is achieved by minimumBy (comparing (abs . snd)) . g.
Clarifications: There's some confusion about "complexity" in the comments below, yet the answer says nothing at all about complexity; it uses a specific empirical measurement. You can't argue with empirical data (unless you misinterpret its meaning).
The answer does not claim it "is better than linear", it says "it behaves better than linear" [in the tested range of problem sizes], which the empirical data incontrovertibly show.
Finally, an appeal to authority. Robert Sedgewick is an authority on algorithms. Take it up with him.
(and of course the algorithm handles unordered data as well as it does ordered).
As for the reasons for OP's code inefficiency: map sum . inits can't help being quadratic, but the equivalent scanl (+) 0 is linear. The radical improvement comes about from a lot of redundant calculations in the former being avoided in the latter. (Another example of this can be seen here.)
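To make the difference concrete, here are the two formulations side by side (prefixSumsSlow and prefixSumsFast are hypothetical names). Both produce the same list, but the first re-sums every prefix from scratch, while the second reuses the previous running sum.

```haskell
import Data.List (inits)

-- Quadratic: the k-th prefix is summed from scratch, O(k) work each.
prefixSumsSlow :: Num a => [a] -> [a]
prefixSumsSlow = map sum . inits

-- Linear: each prefix sum is the previous one plus one element.
prefixSumsFast :: Num a => [a] -> [a]
prefixSumsFast = scanl (+) 0
```

Both give [0,1,3,6] for [1,2,3].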
This is for a class
We're supposed to write three functions:
1: prints a list of Fibonacci numbers
2: prints a list of prime numbers
3: prints a list of Fibonacci numbers whose indexes are prime
E.g., let this be the Fibonacci series:
Then in partC, only certain elements are shown:
1: 1
*2: 1 (shown as index 2 is prime )
*3: 2 (shown as index 3 is prime )
4: 3
*5: 5 (shown as index 5 is prime)
6: 8
*7: 13 (shown as index 7 is prime, and so on)
I'm done with parts 1 and 2, but I'm struggling with part 3. I created a function listNum that builds a sort of mapping [(Integer, Integer)] from the Fibonacci series, where the first Integer is the index and the second is the actual Fibonacci number.
Now my function partC tries to collect the snd elements of the Fibonacci series by filtering on the indexes, but I'm doing something wrong in the filter step.
Any help would be appreciated as I'm a beginner to Haskell.
Thanks!
fib :: [Integer]
fib = 0 : 1 : zipWith (+) fib (tail fib)
listNum :: [(Integer, Integer)]
listNum = zip [1 .. ] fib
primes :: [Integer]
primes = sieve (2 : [3,5 ..])
  where
    sieve (p:xs) = p : sieve [x | x <- xs , x `mod` p > 0]
partC :: [Integer] -- Problem in filter part of this function
partC = map snd listNum $ filter (\x -> x `elem` primes) [1,2 ..]
main = do
  print (take 10 fib) -- Works fine
  print (take 10 primes) --works fine
  print (take 10 listNum) --works fine
  print ( take 10 partC) -- Causes error
Error :
prog0.hs:14:9: error:
• Couldn't match expected type ‘[Integer] -> [Integer]’
with actual type ‘[Integer]’
• The first argument of ($) takes one argument,
but its type ‘[Integer]’ has none
In the expression:
map snd listNum $ filter (\ x -> x `elem` primes) [1, 2 .. ]
In an equation for ‘partC’:
partC
= map snd listNum $ filter (\ x -> x `elem` primes) [1, 2 .. ]
|
14 | partC = map snd listNum $ filter (\x -> x `elem` primes) [1,2 ..]
Here's what I think you intended as the original logic of partC. You got the syntax mostly right, but the logic has a flaw.
partC = snd <$> filter ((`elem` primes) . fst) (zip [1..] fib)
-- note that (<$>) = fmap = map, just infix
-- list comprehension
partC = [fn | (idx, fn) <- zip [1..] fib, idx `elem` primes]
But this cannot work. As @DanRobertson notes, you'll try to check 4 `elem` primes and run into an infinite loop, because primes is infinite and elem tries to be really sure that 4 isn't an element before giving up. We humans know that 4 isn't an element of primes, but elem doesn't.
There are two ways out. We can write a custom version of elem that gives up once it finds an element larger than the one we're looking for:
sortedElem :: Ord a => a -> [a] -> Bool
sortedElem x (h:tl) = case x `compare` h of
  LT -> False
  EQ -> True
  GT -> sortedElem x tl
sortedElem _ [] = False
-- or
sortedElem x = foldr (\h tl -> case x `compare` h of
                 LT -> False
                 EQ -> True
                 GT -> tl
               ) False
Since primes is a sorted list, sortedElem will always give the correct answer now:
partC = snd <$> filter ((`sortedElem` primes) . fst) (zip [1..] fib)
However, there is a performance issue, because every call to sortedElem has to start at the very beginning of primes and walk all the way down until it figures out whether or not the index is right. This leads into the second way:
partC = go primeDiffs fib
  where primeDiffs = zipWith (-) primes (1:primes)
        -- primeDiffs = [1, 1, 2, 2, 4, 2, 4, 2, 4, 6, ...]
        -- The distance from one prime (incl. 1) to the next
        -- (primes is [Integer], so convert each step to Int for drop)
        go (step:steps) xs = x:go steps xs'
          where xs'@(x:_) = drop (fromIntegral step) xs
        go [] _ = [] -- unused here
-- in real code you might pull this out into an atOrderedIndices :: [Int] -> [a] -> [a]
We transform the list of indices (primes) into a list of offsets, each one relative to the previous index, and call it primeDiffs. We then define go to take such a list of offsets and extract elements from another list: it first drops the elements being skipped, then puts the next element into the result before building the rest of the list. Under -O2, on my machine, this version is twice as fast as the other one when computing partC !! 5000.
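Following the comment at the end of that code, here is a sketch of such an atOrderedIndices helper. It is hypothetical (not part of base), assumes strictly increasing 1-based indices, and walks the list only once. The primes here are Int so that they can be passed to drop directly.

```haskell
-- Elements of xs at the given strictly increasing, 1-based indices.
atOrderedIndices :: [Int] -> [a] -> [a]
atOrderedIndices = go 1
  where
    -- pos: 1-based index of the head of the remaining list
    go _ [] _ = []
    go pos (i:is) xs = case drop (i - pos) xs of
      y:rest -> y : go (i + 1) is rest
      [] -> []

fib :: [Integer]
fib = 0 : 1 : zipWith (+) fib (tail fib)

primes :: [Int]
primes = sieve [2 ..]
  where
    sieve [] = []
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0]

-- Fibonacci numbers at prime (1-based) positions, as in listNum.
partC :: [Integer]
partC = atOrderedIndices primes fib
```

With fib starting at 0 as index 1, take 5 partC should be [1,1,3,8,55].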
What's the most direct/efficient way to generate all ways of dividing one (even-length) list into two in Haskell? I toyed with splitting all permutations of the list, but that would add many extras: all the instances where each half contains the same elements, just in a different order. For example,
[1,2,3,4] should produce something like:
[ [1,2], [3,4] ]
[ [1,3], [2,4] ]
[ [1,4], [2,3] ]
Edit: thank you for your comments -- the order of elements and the type of the result is less important to me than the concept - an expression of all two-groups from one group, where element order is unimportant.
Here's an implementation, closely following the definition.
The first element always goes into the left group. After that, we add each next head element into one group or the other. If one of the groups becomes too big, there is no choice anymore and we must add all the rest into the shorter group.
divide :: [a] -> [([a], [a])]
divide [] = [([],[])]
divide (x:xs) = go ([x],[], xs, 1,length xs) []
  where
    go (a,b, [], i,j) zs = (a,b) : zs -- i == length a - length b
    go (a,b, s@(x:xs), i,j) zs -- j == length s
      | i >= j = (a,b++s) : zs
      | (-i) >= j = (a++s,b) : zs
      | otherwise = go (x:a, b, xs, i+1, j-1) $ go (a, x:b, xs, i-1, j-1) zs
This produces
*Main> divide [1,2,3,4]
[([2,1],[3,4]),([3,1],[2,4]),([1,4],[3,2])]
The limitation of having an even length list is unnecessary:
*Main> divide [1,2,3]
[([2,1],[3]),([3,1],[2]),([1],[3,2])]
(the code was re-written in the "difference-list" style for efficiency: go2 A zs == go1 A ++ zs).
Edit: How does this work? Imagine yourself sitting at a pile of stones, dividing it into two. You put the first stone on one side; which one doesn't matter (say, left). Then there's a choice of where to put each next stone, unless one of the two piles becomes too small by comparison, in which case we must put all the remaining stones there at once.
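For comparison with the difference-list note, here is a sketch of the plain version that note calls go1 (divide1 is a hypothetical name). Each step puts the next stone either on the left or on the right, and the two branches are appended with ++ instead of being threaded through an accumulator. It should produce the same list, in the same order, as divide.

```haskell
-- Plain-list version of divide, before the difference-list rewrite.
divide1 :: [a] -> [([a], [a])]
divide1 [] = [([], [])]
divide1 (x:xs) = go ([x], [], xs, 1, length xs)
  where
    -- i == length a - length b, j == length of the remaining input s
    go (a, b, [], _, _) = [(a, b)]
    go (a, b, s@(y:ys), i, j)
      | i >= j = [(a, b ++ s)]      -- left pile too big: rest goes right
      | (-i) >= j = [(a ++ s, b)]   -- right pile too big: rest goes left
      | otherwise = go (y:a, b, ys, i + 1, j - 1) ++ go (a, y:b, ys, i - 1, j - 1)
```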
To find all partitions of a non-empty list (of even length n) into two equal-sized parts, we can, to avoid repetitions, posit that the first element shall be in the first part. Then it remains to find all ways to split the tail of the list into one part of length n/2 - 1 and one of length n/2.
-- not to be exported
splitLen :: Int -> Int -> [a] -> [([a],[a])]
splitLen 0 _ xs = [([],xs)]
splitLen _ _ [] = error "Oops"
splitLen k l ys@(x:xs)
  | k == l = [(ys,[])]
  | otherwise = [(x:us,vs) | (us,vs) <- splitLen (k-1) (l-1) xs]
                ++ [(us,x:vs) | (us,vs) <- splitLen k (l-1) xs]
does that splitting if called appropriately. Then
partitions :: [a] -> [([a],[a])]
partitions [] = [([],[])]
partitions (x:xs)
  | even len = error "Original list with odd length"
  | otherwise = [(x:us,vs) | (us,vs) <- splitLen half len xs]
  where
    len = length xs
    half = len `quot` 2
This generates all the partitions without redundantly computing duplicates.
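A small check of that claim, as a sketch. The helpers canonical and allDistinct are introduced here and are not part of the answer: canonical sorts each part and then the pair of parts, so two splits that differ only in order get the same canonical form.

```haskell
import Data.List (nub, sort)

splitLen :: Int -> Int -> [a] -> [([a], [a])]
splitLen 0 _ xs = [([], xs)]
splitLen _ _ [] = error "Oops"
splitLen k l ys@(x:xs)
  | k == l    = [(ys, [])]
  | otherwise = [(x:us, vs) | (us, vs) <- splitLen (k - 1) (l - 1) xs]
             ++ [(us, x:vs) | (us, vs) <- splitLen k (l - 1) xs]

partitions :: [a] -> [([a], [a])]
partitions [] = [([], [])]
partitions (x:xs)
  | even len  = error "Original list with odd length"
  | otherwise = [(x:us, vs) | (us, vs) <- splitLen half len xs]
  where
    len  = length xs
    half = len `quot` 2

-- Canonical form of a split: sort each part, then sort the pair of parts.
canonical :: Ord a => ([a], [a]) -> [[a]]
canonical (us, vs) = sort [sort us, sort vs]

-- True when no two splits in the list are equal up to order.
allDistinct :: Ord a => [([a], [a])] -> Bool
allDistinct ps = length (nub (map canonical ps)) == length ps
```

For [1..6] one would expect C(5,2) = 10 splits (the five-element tail split into parts of two and three), each with 1 in the first part and no two equal under canonical. For a smaller example, partitions [1,2,3,4] gives [([1,2],[3,4]),([1,3],[2,4]),([1,4],[2,3])].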
luqui raises a good point: I hadn't taken into account the possibility that you'd want to split lists with repeated elements. With those it gets a little more complicated, but not much. First, we group equal elements together (done here with an Ord constraint; with only Eq, that could still be done in O(length²)). The idea is then similar: to avoid repetitions, we posit that the first half contains more elements of the first group than the second half does (or, if the first group has an even number of elements, possibly equally many, in which case the same restriction applies to the next group, and so on).
import Data.List (group, sort)
repartitions :: Ord a => [a] -> [([a],[a])]
repartitions = map flatten2 . halves . prepare
  where
    flatten2 (u,v) = (flatten u, flatten v)
-- group equal elements and count them
prepare :: Ord a => [a] -> [(a,Int)]
prepare = map (\xs -> (head xs, length xs)) . group . sort
halves :: [(a,Int)] -> [([(a,Int)],[(a,Int)])]
halves [] = [([],[])]
halves ((a,k):more)
  | odd total = error "Odd number of elements"
  | even k    = [((a,low):us,(a,low):vs) | (us,vs) <- halves more]
                ++ [normalise ((a,c):us,(a,k-c):vs) | c <- [low + 1 .. min half k],
                                                      (us,vs) <- choose (half-c) remaining more]
  | otherwise = [normalise ((a,c):us,(a,k-c):vs) | c <- [low + 1 .. min half k],
                                                   (us,vs) <- choose (half-c) remaining more]
  where
    remaining = sum $ map snd more
    total     = k + remaining
    half      = total `quot` 2
    low       = k `quot` 2
    normalise (u,v) = (nz u, nz v)
    nz        = filter ((/= 0) . snd)
-- choose need have xs: all ways to take need of the have elements counted in xs
choose :: Int -> Int -> [(a,Int)] -> [([(a,Int)],[(a,Int)])]
choose 0 _ xs = [([],xs)]
choose _ _ [] = error "Oops"
choose need have ((a,k):more) =
    [((a,c):us,(a,k-c):vs) | c <- [least .. most], (us,vs) <- choose (need-c) (have-k) more]
  where
    least = max 0 (need + k - have)
    most  = min need k
flatten :: [(a,Int)] -> [a]
flatten xs = xs >>= uncurry (flip replicate)
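To see the effect on repeated elements, here is a self-contained version of the repartitions code, with the imports it needs (group and sort from Data.List), that can be loaded into GHCi as is. The two identical comprehensions in halves are shared here as one local definition, unequal; that name is introduced for this sketch.

```haskell
import Data.List (group, sort)

repartitions :: Ord a => [a] -> [([a], [a])]
repartitions = map flatten2 . halves . prepare
  where
    flatten2 (u, v) = (flatten u, flatten v)

-- Count equal elements: prepare [2,1,2] == [(1,1),(2,2)]
prepare :: Ord a => [a] -> [(a, Int)]
prepare = map (\xs -> (head xs, length xs)) . group . sort

halves :: [(a, Int)] -> [([(a, Int)], [(a, Int)])]
halves [] = [([], [])]
halves ((a, k):more)
  | odd total = error "Odd number of elements"
  | even k    = [((a, low):us, (a, low):vs) | (us, vs) <- halves more] ++ unequal
  | otherwise = unequal
  where
    -- splits in which the first half gets more than half of the copies of a
    unequal   = [ normalise ((a, c):us, (a, k - c):vs)
                | c <- [low + 1 .. min half k]
                , (us, vs) <- choose (half - c) remaining more ]
    remaining = sum (map snd more)
    total     = k + remaining
    half      = total `quot` 2
    low       = k `quot` 2
    normalise (u, v) = (nz u, nz v)
    nz        = filter ((/= 0) . snd)

-- All ways to take `need` of the `have` elements counted in the list.
choose :: Int -> Int -> [(a, Int)] -> [([(a, Int)], [(a, Int)])]
choose 0 _ xs = [([], xs)]
choose _ _ [] = error "Oops"
choose need have ((a, k):more) =
    [ ((a, c):us, (a, k - c):vs)
    | c <- [least .. most]
    , (us, vs) <- choose (need - c) (have - k) more ]
  where
    least = max 0 (need + k - have)
    most  = min need k

flatten :: [(a, Int)] -> [a]
flatten xs = xs >>= uncurry (flip replicate)
```

With this, repartitions [0,0,0,0] should give the single split [([0,0],[0,0])], and repartitions [1,1,2,2] gives [([1,2],[1,2]),([1,1],[2,2])], one entry per distinct way of splitting the multiset. On distinct elements, repartitions [1,2,3,4] still gives three splits.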
Daniel Fischer's answer is a good way to solve the problem. I offer a worse (less efficient) way, but one which more obviously (to me) corresponds to the problem description. I will generate all partitions of the list into two equal-length sublists, then filter out equivalent ones according to your definition of equivalence. This is how I usually solve problems: start with a solution that is as obvious as possible, then gradually transform it into a more efficient one (if necessary).
import Data.List (sort, nubBy, permutations)
type Partition a = ([a],[a])
-- Your notion of equivalence (sort to ignore the order)
equiv :: (Ord a) => Partition a -> Partition a -> Bool
equiv p q = canon p == canon q
  where
    canon (xs,ys) = sort [sort xs, sort ys]
-- All ordered partitions
partitions :: [a] -> [Partition a]
partitions xs = map (splitAt l) (permutations xs)
  where
    l = length xs `div` 2
-- All partitions, keeping one representative per equivalence class
equivPartitions :: (Ord a) => [a] -> [Partition a]
equivPartitions = nubBy equiv . partitions
Testing
>>> equivPartitions [1,2,3,4]
[([1,2],[3,4]),([3,2],[1,4]),([3,1],[2,4])]
Note
After using QuickCheck to test the equivalence of this implementation with Daniel's, I found an important difference. Mine requires an (Ord a) constraint and his does not, which hints at what the difference is: if you give his [0,0,0,0], you get a list with three copies of ([0,0],[0,0]), whereas mine gives only one copy. Which of these is correct was not specified. Daniel's is natural when the two output lists are considered ordered sequences (which is how that type is usually read); mine is natural when they are considered sets or bags (which is how this question seemed to treat them).
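The [0,0,0,0] difference can be reproduced with a short sketch that puts both answers in one module. To avoid a name clash, Daniel's partitions is renamed fischerPartitions here, the permute-and-filter version is inlined into equivPartitions, and sameSplits is a helper introduced only for this comparison. The answer mentions QuickCheck; this sketch just checks fixed inputs, so it needs nothing outside base.

```haskell
import Data.List (nubBy, permutations, sort)

type Partition a = ([a], [a])

-- Daniel Fischer's version (called partitions in his answer).
splitLen :: Int -> Int -> [a] -> [Partition a]
splitLen 0 _ xs = [([], xs)]
splitLen _ _ [] = error "Oops"
splitLen k l ys@(x:xs)
  | k == l    = [(ys, [])]
  | otherwise = [(x:us, vs) | (us, vs) <- splitLen (k - 1) (l - 1) xs]
             ++ [(us, x:vs) | (us, vs) <- splitLen k (l - 1) xs]

fischerPartitions :: [a] -> [Partition a]
fischerPartitions [] = [([], [])]
fischerPartitions (x:xs)
  | even len  = error "Original list with odd length"
  | otherwise = [(x:us, vs) | (us, vs) <- splitLen half len xs]
  where
    len  = length xs
    half = len `quot` 2

-- The permute-and-filter version from this answer.
canon :: Ord a => Partition a -> [[a]]
canon (xs, ys) = sort [sort xs, sort ys]

equiv :: Ord a => Partition a -> Partition a -> Bool
equiv p q = canon p == canon q

equivPartitions :: Ord a => [a] -> [Partition a]
equivPartitions xs = nubBy equiv (map (splitAt l) (permutations xs))
  where
    l = length xs `div` 2

-- Do both versions describe the same unordered splits?
sameSplits :: Ord a => [a] -> Bool
sameSplits xs = sort (map canon (fischerPartitions xs)) == sort (map canon (equivPartitions xs))
```

Here length (fischerPartitions [0,0,0,0]) is 3 while length (equivPartitions [0,0,0,0]) is 1, and sameSplits [1..6] is True: on distinct elements the two agree up to the order inside and between the parts.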
Splitting The Difference
It is possible to get from an implementation that requires Ord to one that doesn't, by operating on the positions rather than the values in a list. I came up with this transformation -- an idea which I believe originates with Benjamin Pierce in his work on bidirectional programming.
{-# LANGUAGE RankNTypes #-}
import Data.Traversable
import Control.Monad.Trans.State
data Labelled a = Labelled { label :: Integer, value :: a }
-- compare by label only, so the payload needs no Eq/Ord instance
instance Eq (Labelled a) where
  a == b = compare a b == EQ
instance Ord (Labelled a) where
  compare a b = compare (label a) (label b)
labels :: (Traversable t) => t a -> t (Labelled a)
labels t = evalState (traverse trav t) 0
  where
    trav x = state (\i -> i `seq` (Labelled i x, i + 1))
onIndices :: (Traversable t, Functor u)
          => (forall a. Ord a => t a -> u a)
          -> forall b. t b -> u b
onIndices f = fmap value . f . labels
Using onIndices on equivPartitions wouldn't speed it up at all, but it would allow it to have the same semantics as Daniel's (up to equiv of the results) without the constraint, and with my more naive and obvious way of expressing it -- and I just thought it was an interesting way to get rid of the constraint.
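Applying onIndices to equivPartitions takes one extra step, because [Partition a] is not literally of the form u a for some Functor u. A sketch under that observation: wrap the result in a newtype (Partitions and positionalPartitions are names introduced here) whose fmap maps over both halves of every split. Two further liberties are taken: labels is written with mapAccumL from Data.Traversable instead of the State monad (it numbers the elements the same way, left to right), and the forall b in onIndices's type is left implicit.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.List (nubBy, permutations, sort)
import Data.Traversable (mapAccumL)

data Labelled a = Labelled { label :: Integer, value :: a }

-- Compare by label only, so the payload needs no Eq/Ord instance.
instance Eq (Labelled a) where
  a == b = compare a b == EQ
instance Ord (Labelled a) where
  compare a b = compare (label a) (label b)

-- Number the elements left to right (mapAccumL in place of State).
labels :: Traversable t => t a -> t (Labelled a)
labels = snd . mapAccumL (\i x -> (i + 1, Labelled i x)) 0

onIndices :: (Traversable t, Functor u)
          => (forall a. Ord a => t a -> u a) -> t b -> u b
onIndices f = fmap value . f . labels

type Partition a = ([a], [a])

equiv :: Ord a => Partition a -> Partition a -> Bool
equiv p q = canon p == canon q
  where canon (xs, ys) = sort [sort xs, sort ys]

equivPartitions :: Ord a => [a] -> [Partition a]
equivPartitions xs = nubBy equiv (map (splitAt l) (permutations xs))
  where l = length xs `div` 2

-- Hypothetical wrapper: gives [Partition a] the shape `u a` that onIndices needs.
newtype Partitions a = Partitions { getPartitions :: [Partition a] }

instance Functor Partitions where
  fmap f (Partitions ps) = Partitions [(map f xs, map f ys) | (xs, ys) <- ps]

-- equivPartitions run on positions: no Ord constraint on the element type.
positionalPartitions :: [b] -> [Partition b]
positionalPartitions = getPartitions . onIndices (Partitions . equivPartitions)
```

positionalPartitions [0,0,0,0] then gives three copies of ([0,0],[0,0]), like Daniel's version, and it also accepts element types without an Ord instance, such as a list of four Int -> Int functions.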
My own generalized version, added much later, inspired by Will's answer:
import Data.Map (adjust, fromList, toList)
import Data.List (groupBy, sortOn)
divide xs n evenly = divide' xs (zip [0..] (replicate n [])) where
  evenPSize = div (length xs) n
  divide' [] result = [result]
  divide' (x:xs) result = do
      index <- indexes
      divide' xs (toList $ adjust (x :) index (fromList result))
    where
      notEmptyBins = filter (not . null . snd) $ result
      partlyFullBins | evenly == "evenly" = map fst . filter ((<evenPSize) . length . snd) $ notEmptyBins
                     | otherwise = map fst notEmptyBins
      -- when every bin is full, offer only the currently smallest bins
      indexes = partlyFullBins
        ++ if any (null . snd) result
           then map fst . take 1 . filter (null . snd) $ result
           else if null partlyFullBins
                then map fst . head . groupBy (\a b -> length (snd a) == length (snd b)) . sortOn (length . snd) $ result
                else []
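A self-contained sketch for trying the generalized divide, with two assumptions worth flagging: the type signature (in particular Int bin indices) is added here, and in the branch where every bin is already full it picks the currently smallest bins by sorting on bin size with sortOn (length . snd). The helper balanced is introduced only for checking, and Data.Map is imported qualified to keep its names apart from the Prelude.

```haskell
import Data.List (groupBy, sortOn)
import qualified Data.Map as Map

-- Distribute xs over n bins numbered 0 .. n-1. With evenly == "evenly", a bin
-- only receives elements while it holds fewer than (length xs `div` n);
-- leftovers then go to the smallest bins.
divide :: [a] -> Int -> String -> [[(Int, [a])]]
divide xs n evenly = divide' xs (zip [0 ..] (replicate n []))
  where
    evenPSize = length xs `div` n
    divide' [] result = [result]
    divide' (y:ys) result = do
        index <- indexes
        divide' ys (Map.toList (Map.adjust (y :) index (Map.fromList result)))
      where
        notEmptyBins = filter (not . null . snd) result
        partlyFullBins
          | evenly == "evenly" = map fst (filter ((< evenPSize) . length . snd) notEmptyBins)
          | otherwise          = map fst notEmptyBins
        smallestBins =
          map fst . head . groupBy (\a b -> length (snd a) == length (snd b))
                  . sortOn (length . snd) $ result
        -- Only the first empty bin is offered, so results that differ only
        -- by renumbering empty bins are not generated.
        indexes = partlyFullBins
          ++ if any (null . snd) result
               then map fst . take 1 . filter (null . snd) $ result
               else if null partlyFullBins then smallestBins else []

-- Hypothetical check: True when bin sizes differ by at most one.
balanced :: [(Int, [a])] -> Bool
balanced bins = maximum sizes - minimum sizes <= 1
  where sizes = map (length . snd) bins
```

divide [1,2,3,4] 2 "evenly" should give [[(0,[2,1]),(1,[4,3])],[(0,[3,1]),(1,[4,2])],[(0,[4,1]),(1,[3,2])]], the three ways of splitting four elements into two bins of two. With eight elements and three bins, every result of divide [1..8] 3 "evenly" should satisfy balanced, the bins ending up with sizes 3, 3 and 2 in some order.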