I'm interested in writing an efficient Haskell function triangularize :: [a] -> [[a]] that takes a (perhaps infinite) list and "triangularizes" it into a list of lists. For example, triangularize [1..19] should return
[[1, 3, 6, 10, 15]
,[2, 5, 9, 14]
,[4, 8, 13, 19]
,[7, 12, 18]
,[11, 17]
,[16]]
By efficient, I mean that I want it to run in O(n) time where n is the length of the list.
Note that this is quite easy to do in a language like Python, because appending to the end of a list (array) is an amortized constant-time operation. A very imperative Python function which accomplishes this is:
def triangularize(elements):
    row_index = 0
    column_index = 0
    diagonal_array = []
    for a in elements:
        if row_index == len(diagonal_array):
            diagonal_array.append([a])
        else:
            diagonal_array[row_index].append(a)
        if row_index == 0:
            (row_index, column_index) = (column_index + 1, 0)
        else:
            row_index -= 1
            column_index += 1
    return diagonal_array
This came up because I have been using Haskell to write some "tabl" sequences in the On-Line Encyclopedia of Integer Sequences (OEIS), and I want to be able to transform an ordinary (1-dimensional) sequence into a (2-dimensional) sequence of sequences in exactly this way.
Perhaps there's some clever (or not-so-clever) way to foldr over the input list, but I haven't been able to sort it out.
Make increasing size chunks:
import Data.List (transpose)  -- transpose is needed for diagonalize below
chunks :: [a] -> [[a]]
chunks = go 0 where
  go _ [] = []
  go n as = b : go (n+1) e where (b,e) = splitAt n as
Then just transpose twice:
diagonalize :: [a] -> [[a]]
diagonalize = transpose . transpose . chunks
Try it in ghci:
> diagonalize [1..19]
[[1,3,6,10,15],[2,5,9,14],[4,8,13,19],[7,12,18],[11,17],[16]]
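To see why two transposes do the job, it may help to look at the intermediate lists. The sketch below repeats chunks so that it is self-contained; the names antiDiagonals, columns and rows are only chosen for illustration and are not part of the answer.

```haskell
import Data.List (transpose)

-- Same idea as chunks above: pieces of length 0, 1, 2, 3, ...
chunks :: [a] -> [[a]]
chunks = go 0
  where
    go _ [] = []
    go n as = b : go (n + 1) e
      where (b, e) = splitAt n as

-- Each chunk is one anti-diagonal of the triangle.
antiDiagonals :: [[Int]]
antiDiagonals = chunks [1 .. 10]
-- [[],[1],[2,3],[4,5,6],[7,8,9,10]]

-- The first transpose collects the k-th element of every anti-diagonal
-- (the empty chunk simply disappears).
columns :: [[Int]]
columns = transpose antiDiagonals
-- [[1,2,4,7],[3,5,8],[6,9],[10]]

-- The second transpose turns those columns into the wanted rows.
rows :: [[Int]]
rows = transpose columns
-- [[1,3,6,10],[2,5,9],[4,8],[7]]
```

Because transpose only looks at as much of each inner list as it needs, the same pipeline should also work on an infinite input as long as only a finite prefix of the result is demanded.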
This appears to be directly related to the set-theory argument proving that the set of integer pairs is in one-to-one correspondence with the set of integers (that is, it is denumerable). The argument involves a so-called Cantor pairing function.
So, out of curiosity, let's see if we can get a diagonalize function that way.
Define the infinite list of Cantor pairs recursively in Haskell:
auxCantorPairList :: (Integer, Integer) -> [(Integer, Integer)]
auxCantorPairList (x,y) =
    let nextPair = if (x > 0) then (x-1,y+1) else (x+y+1, 0)
    in (x,y) : auxCantorPairList nextPair
cantorPairList :: [(Integer, Integer)]
cantorPairList = auxCantorPairList (0,0)
And try that inside ghci:
λ> take 15 cantorPairList
[(0,0),(1,0),(0,1),(2,0),(1,1),(0,2),(3,0),(2,1),(1,2),(0,3),(4,0),(3,1),(2,2),(1,3),(0,4)]
λ>
We can number the pairs, and for example extract the numbers for those pairs which have a zero x coordinate:
λ>
λ> xs = [1..]
λ> take 5 $ map fst $ filter (\(n,(x,y)) -> (x==0)) $ zip xs cantorPairList
[1,3,6,10,15]
λ>
We recognize this is the top row from the OP's result in the text of the question.
Similarly for the next two rows:
λ>
λ> makeRow xs row = map fst $ filter (\(n,(x,y)) -> (x==row)) $ zip xs cantorPairList
λ> take 5 $ makeRow xs 1
[2,5,9,14,20]
λ>
λ> take 5 $ makeRow xs 2
[4,8,13,19,26]
λ>
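The numbering used in these experiments also has a closed form, which may make the row contents easier to predict. The following is only a sketch: cantorPairs is an alternative, comprehension-based way to write the same enumeration, and pairIndex is a hypothetical helper name giving the 1-based position of a pair in that enumeration.

```haskell
-- The same enumeration as cantorPairList, written diagonal by diagonal:
-- on diagonal d we have x + y == d, and y runs from 0 up to d.
cantorPairs :: [(Integer, Integer)]
cantorPairs = [(d - y, y) | d <- [0 ..], y <- [0 .. d]]

-- 1-based position of (x, y) in cantorPairs:
-- d * (d + 1) / 2 pairs lie on earlier diagonals, then y more on this one.
pairIndex :: (Integer, Integer) -> Integer
pairIndex (x, y) = d * (d + 1) `div` 2 + y + 1
  where d = x + y
```

For x == 0 the formula reduces to (y + 1) * (y + 2) / 2, the triangular numbers 1, 3, 6, 10, 15 seen in the top row.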
From there, we can write our first draft of a diagonalize function:
λ>
λ> printAsLines xs = mapM_ (putStrLn . show) xs
λ> diagonalize xs = takeWhile (not . null) $ map (makeRow xs) [0..]
λ>
λ> printAsLines $ diagonalize [1..19]
[1,3,6,10,15]
[2,5,9,14]
[4,8,13,19]
[7,12,18]
[11,17]
[16]
λ>
EDIT: performance update
For a list of 1 million items, the runtime is 18 seconds, and 145 seconds for 4 million items. As mentioned by Redu, this looks like O(n√n) complexity.
Distributing the pairs among the various target sublists is inefficient, as most filter operations fail.
To improve performance, we can use a Data.Map structure for the target sublists.
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE ScopedTypeVariables #-}
import qualified Data.List as L
import qualified Data.Map as M
type MIL a = M.Map Integer [a]
buildCantorMap :: forall a. [a] -> MIL a
buildCantorMap xs =
    let ts = zip xs cantorPairList -- pairs (a,(x,y))
        m0 = (M.fromList [])::MIL a
        redOp m (n,(x,y)) = let afn as = case as of
                                    Nothing  -> Just [n]
                                    Just jas -> Just (n:jas)
                            in M.alter afn x m
        m1r = L.foldl' redOp m0 ts
    in
        fmap reverse m1r
diagonalize :: [a] -> [[a]]
diagonalize xs = let cm = buildCantorMap xs
                 in map snd $ M.toAscList cm
With that second version, performance appears to be much better: 568 msec for the 1 million item list, 2669 msec for the 4 million item list. So it is close to the O(n log n) complexity we could have hoped for.
It might be a good idea to create a comb filter.
So what does a comb filter do? It's like splitAt, but instead of splitting at a single index it zips the given (possibly infinite) list with the given comb and separates the items corresponding to True from those corresponding to False in the comb, such that:
comb :: [Bool] -- yields [True,False,True,False,False,True,False,False,False,True...]
comb = iterate (False:) [True] >>= id
combWith :: [Bool] -> [a] -> ([a],[a])
combWith _ [] = ([],[])
combWith (c:cs) (x:xs) = let (f,s) = combWith cs xs
                         in if c then (x:f,s) else (f,x:s)
λ> combWith comb [1..19]
([1,3,6,10,15],[2,4,5,7,8,9,11,12,13,14,16,17,18,19])
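The definition of comb is compact, so here is a small sketch of how it unfolds. combPieces and truePositions are names made up for this illustration; concat is used in place of (>>= id), which does the same thing for lists.

```haskell
-- iterate keeps prepending False, so the pieces grow by one element each:
combPieces :: [[Bool]]
combPieces = take 4 (iterate (False :) [True])
-- [[True],[False,True],[False,False,True],[False,False,False,True]]

-- Joining all pieces gives the comb itself.
comb :: [Bool]
comb = concat (iterate (False :) [True])

-- 1-based positions at which the comb is True.
truePositions :: [Int]
truePositions = [i | (i, True) <- zip [1 ..] comb]
```

The first few entries of truePositions are 1, 3, 6, 10, 15, which is why the first component of combWith comb picks out exactly the first row.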
Now all we need to do is comb our infinite list, take the fst as the first row, and carry on combing the snd with the same comb.
Let's do it:
diags :: [a] -> [[a]]
diags [] = []
diags xs = let (h,t) = combWith comb xs
           in h : diags t
λ> diags [1..19]
[ [1,3,6,10,15]
, [2,5,9,14]
, [4,8,13,19]
, [7,12,18]
, [11,17]
, [16]
]
It also seems to be lazy :)
λ> take 5 . map (take 5) $ diags [1..]
[ [1,3,6,10,15]
, [2,5,9,14,20]
, [4,8,13,19,26]
, [7,12,18,25,33]
, [11,17,24,32,41]
]
I think the complexity could be something like O(n√n), but I can't be sure. Any ideas?
Here is the expected input/output:
repeated "Mississippi" == "ips"
repeated [1,2,3,4,2,5,6,7,1] == [1,2]
repeated " " == " "
And here is my code so far:
repeated :: String -> String
repeated "" = ""
repeated x = group $ sort x
I know that the last part of the code doesn't work. My idea was to sort the list, then group it, and then filter the resulting list of lists to keep those with length greater than 1, or something like that.
Your code already does half of the job
> group $ sort "Mississippi"
["M","iiii","pp","ssss"]
You said you want to filter out the non-duplicates. Let's define a predicate which identifies the lists having at least two elements:
atLeastTwo :: [a] -> Bool
atLeastTwo (_:_:_) = True
atLeastTwo _ = False
Using this:
> filter atLeastTwo . group $ sort "Mississippi"
["iiii","pp","ssss"]
Good. Now, we need to take only the first element from such lists. Since the lists are non-empty, we can use head safely:
> map head . filter atLeastTwo . group $ sort "Mississippi"
"ips"
Alternatively, we could replace the filter with filter (\xs -> length xs >= 2), but this would be less efficient, since length traverses each whole group while the pattern match inspects at most two elements.
Yet another option is to use a list comprehension
> [ x | (x:_y:_) <- group $ sort "Mississippi" ]
"ips"
This pattern matches on the lists starting with x and having at least another element _y, combining the filter with taking the head.
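Putting the pieces of this answer together, a complete definition might look like the sketch below. The Ord a constraint is an assumption made so that the function also covers the numeric example from the question, not only strings.

```haskell
import Data.List (group, sort)

-- Keep one representative of every value that occurs at least twice.
-- The pattern (x : _ : _) only matches groups with two or more elements.
repeated :: Ord a => [a] -> [a]
repeated xs = [x | (x : _ : _) <- group (sort xs)]
```

With this definition, repeated "Mississippi" evaluates to "ips" and repeated [1,2,3,4,2,5,6,7,1] to [1,2], matching the expected results in the question.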
Okay, good start. One immediate problem is that the specification requires the function to work on lists of numbers, but you define it for strings. The list must be sorted, so its element type must be an instance of the Ord typeclass. Therefore, let's fix the type signature:
repeated :: Ord a => [a] -> [a]
After calling sort and group, you will have a list of lists, [[a]]. Let’s take your idea of using filter. That works. Your predicate should, as you said, check the length of each list in the list, then compare that length to 1.
Filtering a list of lists gives you a subset, which is another list of lists, of type [[a]]. You need to flatten this list. What you want to do is map each entry in the list of lists to one of its elements. For example, the first. There’s a function in the Prelude to do that.
So, you might fill in the following skeleton:
module Repeated (repeated) where
import Data.List (group, sort)
repeated :: Ord a => [a] -> [a]
repeated = map _
. filter (\x -> _)
. group
. sort
I’ve written this in point-free style with the filtering predicate as a lambda expression, but many other ways to write this are equally good. Find one that you like! (For example, you could also write the filter predicate in point-free style, as a composition of two functions: a comparison on the result of length.)
When you try to compile this, the compiler will tell you that there are two typed holes, the _ entries to the right of the equal signs. It will also tell you the type of the holes. The first hole needs a function that takes a list and gives you back a single element. The second hole needs a Boolean expression using x. Fill these in correctly, and your program will work.
Here are some other approaches, to evaluate @chepner's comment on the solution using group $ sort. (Those solutions look simpler, because some of the complexity is hidden in the library routines.)
"While it's true that sorting is O(n lg n), ..."
It's not just the sorting but especially the group: that uses span, and both of them build and destroy temporary lists. I.e. they do this:
"a linear traversal of an unsorted list will require some other data structure to keep track of all possible duplicates, and lookups in each will add to the space complexity at the very least. While carefully chosen data structures could be used to maintain an overall O(n) running time, the constant would probably make the algorithm slower in practice than the O(n lg n) solution, ..."
group/span adds considerably to that complexity, so O(n lg n) is not a correct measure.
"while greatly complicating the implementation."
The following all traverse the input list just once. Yes they build auxiliary lists. (Probably a Set would give better performance/quicker lookup.) They maybe look more complex, but to compare apples with apples look also at the code for group/span.
repeated2, repeated3, repeated4 :: Ord a => [a] -> [a]
repeated2/inserter2 builds an auxiliary list of pairs [(a, Bool)], in which the Bool is True if the a appears more than once, False if only once so far.
repeated2 xs = sort $ map fst $ filter snd $ foldr inserter2 [] xs
inserter2 :: Ord a => a -> [(a, Bool)] -> [(a, Bool)]
inserter2 x [] = [(x, False)]
inserter2 x (xb@(x', _): xs)
    | x == x' = (x', True): xs
    | otherwise = xb: inserter2 x xs
repeated3/inserter3 builds an auxiliary list of pairs [(a, Int)], in which the Int counts how many of the a appear. The aux list is sorted anyway, just for the heck of it.
repeated3 xs = map fst $ filter ((> 1).snd) $ foldr inserter3 [] xs
inserter3 :: Ord a => a -> [(a, Int)] -> [(a, Int)]
inserter3 x [] = [(x, 1)]
inserter3 x xss@(xc@(x', c): xs) = case x `compare` x' of
    { LT -> ((x, 1): xss)
    ; EQ -> ((x', c+1): xs)
    ; GT -> (xc: inserter3 x xs)
    }
repeated4/go4 builds an output list of elements known to repeat. It maintains an intermediate list of elements met once (so far) as it traverses the input list. If it meets a repeat: it adds that element to the output list; deletes it from the intermediate list; filters that element out of the tail of the input list.
repeated4 xs = sort $ go4 [] [] xs
go4 :: Ord a => [a] -> [a] -> [a] -> [a]
go4 repeats _ [] = repeats
go4 repeats onces (x: xs) = case findUpd x onces of
    { (True, oncesU) -> go4 (x: repeats) oncesU (filter (/= x) xs)
    ; (False, oncesU) -> go4 repeats oncesU xs
    }
findUpd :: Ord a => a -> [a] -> (Bool, [a])
findUpd x [] = (False, [x])
findUpd x (x': os) | x == x' = (True, os) -- i.e. x' removed
                   | otherwise =
                       let (b, os') = findUpd x os in (b, x': os')
(That last bit of list-fiddling in findUpd is very similar to span.)
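The remark above that a Set (or similar structure) would probably give quicker lookups can be made concrete. The following is a minimal sketch, assuming the containers package is available; it uses a Map of counts rather than a Set, and repeatedMap is a hypothetical name, not one of the functions defined in this answer.

```haskell
import qualified Data.Map.Strict as M

-- Count every element in a single pass over the input, then keep the
-- keys seen more than once. M.keys returns them in ascending order,
-- so no separate sort is needed.
repeatedMap :: Ord a => [a] -> [a]
repeatedMap xs = M.keys (M.filter (> 1) counts)
  where
    counts = M.fromListWith (+) [(x, 1 :: Int) | x <- xs]
```

Each insertion costs O(log k) for k distinct keys, so this is still O(n log n) in the worst case; whether it beats group . sort in practice would have to be measured.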
The next lines show how it has to work:
[14,2,344,41,5,666] becomes [(14,2),(2,1),(344,3),(41,2),(5,1),(666,3)]
["Zoo","School","Net"] becomes [("Zoo",3),("School",6),("Net",3)]
That's my code so far:
zipWithLength :: [a] -> [(a, Int)]
zipWithLength (x:xs) = zipWith (\acc x -> (x, length x):acc) [] xs
I want to figure out what the problem in the second line is.
If you transform the numbers into strings (using show), you can apply length on them:
Prelude> let zipWithLength = map (\x -> (x, length (show x)))
Prelude> zipWithLength [14,2,344,41,5,666]
[(14,2),(2,1),(344,3),(41,2),(5,1),(666,3)]
However, you cannot use the same function on a list of strings:
Prelude> zipWithLength ["Zoo","School","Net"]
[("Zoo",5),("School",8),("Net",5)]
The numbers are not the lengths of the strings, but of their representations:
Prelude> show "Zoo"
"\"Zoo\""
Prelude> length (show "Zoo")
5
As noted in the comments, similar problems may happen with other types of elements:
Prelude> zipWithLength [(1.0,3),(2.5,3)]
[((1.0,3),7),((2.5,3),7)]
Prelude> show (1.0,3)
"(1.0,3)"
Prelude> length (show (1.0,3))
7
If you want to apply a function on every element of a list, that is a map :: (a -> b) -> [a] -> [b]. The map thus takes a function f and a list xs, and generates a list ys, such that the i-th element of ys, is f applied to the i-th element of xs.
So now the only question is what mapping function we want. We want to take an element x, and return a 2-tuple (x, length x), we can express this with a lambda expression:
mapwithlength = map (\x -> (x, length x))
Or we can use ap :: Monad m => m (a -> b) -> m a -> m b for that:
import Control.Monad(ap)
mapwithlength = map (ap (,) length)
A problem is that this does not work for Ints, since these have no length. We can use show here, but there is an extra problem with that: if we perform show on a String, we get a string literal (this means that we get a string that has quotation marks, and where some characters are escaped). Based on the question, we do not want that.
We can define a parameterized function for that like:
mapwithlength f = map (ap (,) (length . f))
We can basically leave it to the user. In case they want to work with integers, they have to call it with:
forintegers = mapwithlength show
and for Strings:
forstrings = mapwithlength id
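For readers who have not met ap on functions before, the sketch below spells out what it does here. pairWith is a name invented for this illustration, and the type signatures are narrower than what the definitions above would infer; they are only there to make the example easy to read.

```haskell
import Control.Monad (ap)

-- For functions, ap f g x == f x (g x), so ap (,) g x == (x, g x).
pairWith :: (a -> b) -> a -> (a, b)
pairWith g = ap (,) g

-- The caller decides how an element is turned into a String to measure.
mapwithlength :: (a -> String) -> [a] -> [(a, Int)]
mapwithlength f = map (pairWith (length . f))

forintegers :: [Integer] -> [(Integer, Int)]
forintegers = mapwithlength show

forstrings :: [String] -> [(String, Int)]
forstrings = mapwithlength id
```

With these, forintegers [14,2,344] gives [(14,2),(2,1),(344,3)] and forstrings ["Zoo","School","Net"] gives [("Zoo",3),("School",6),("Net",3)]. Note that show on a negative number includes the minus sign in the count.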
After installing the number-length package, you can do:
module Test where
import Data.NumberLength
-- use e.g for list of String
withLength :: [[a]] -> [([a], Int)]
withLength = map (\x -> (x, length x))
-- use e.g for list of Int
withLength' :: NumberLength a => [a] -> [(a, Int)]
withLength' = map (\x -> (x, numberLength x))
Examples:
>>> withLength ["Zoo", "bear"]
[("Zoo",3),("bear",4)]
>>> withLength' [14, 344]
[(14,2),(344,3)]
As bli points out, calculating the length of a number using length (show n) does not transfer to calculating the length of a string, since show "foo" becomes "\"foo\"". Since it is not obvious what the length of something is, you could parameterise the zip function with a length function:
zipWithLength :: (a -> Int) -> [a] -> [(a, Int)]
zipWithLength len = map (\x -> (x, len x))
Examples of use:
> zipWithLength (length . show) [7,13,666]
[(7,1),(13,2),(666,3)]
> zipWithLength length ["Zoo", "School", "Bear"]
[("Zoo",3),("School",6),("Bear",4)]
> zipWithLength (length . concat) [[[1,2],[3],[4,5,6,7]], [[],[],[6],[6,6]]]
[([[1,2],[3],[4,5,6,7]],7),([[],[],[6],[6,6]],3)]
I'm new to Haskell and I'm trying to print the elements of a list on the same line. For example:
[1,2,3,4] = 1234
If the elements are Strings I can print them with mapM_ putStr ["1","2","3","\n"]
but they aren't. Does anyone know how to write a function that prints this?
I also tried dignum xs = [ mapM_ putStr x | x <- xs ], but it doesn't work.
You can use show :: Show a => a -> String to convert an element (here an integer), to its textual representation as a String.
Furthermore we can use concat :: [[a]] -> [a] to convert a list of lists of elements to a list of elements (by concatenating these lists together). In the context of a String, we can thus use concat :: [String] -> String to join the numbers together.
So we can then use:
printConcat :: Show a => [a] -> IO ()
printConcat = putStrLn . concat . map show
This then generates:
Prelude> printConcat [1,2,3,4]
1234
Note that the printConcat function is not limited to numbers (integers): it accepts a list of any type that is an instance of the Show class.
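One way to make this easier to test is to separate the pure rendering from the printing. This is only a sketch; renderConcat is a name introduced here, and concatMap show is equivalent to concat . map show.

```haskell
-- Pure part: turn every element into its textual form and join them.
renderConcat :: Show a => [a] -> String
renderConcat = concatMap show

-- Impure part: print the rendered line.
printConcat :: Show a => [a] -> IO ()
printConcat = putStrLn . renderConcat
```

Keep in mind that show on a String adds quotation marks, so renderConcat ["1","2"] yields "\"1\"\"2\"" rather than "12"; for a list of Strings, plain concat is the better fit.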
This may be a silly question, but I'm very new to Haskell. (I just started using it a couple of hours ago actually.)
So my problem is that I have a list of 4 elements and I need to print two on one line and two on a new line.
Here's the list:
let list1 = ["#", "#", "#", "#"]
I need the output to look like this:
##
##
I know that i could use the following to print every element on a new line:
mapM_ putStrLn list1
but I'm not sure how to adapt this for only printing part of the list on a new line.
You want something like Data.Text.chunksOf for arbitrary lists, which I've never seen anywhere so I always reimplement it.
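import Data.List (unfoldr)
-- This version ensures that the output consists of lists
-- of equal length. To do so, it trims the input.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n = unfoldr (test . splitAt n) where
  -- Stop as soon as a chunk comes up short (this covers the empty list too),
  -- so a final full chunk is kept and only an incomplete one is dropped.
  test (chunk, rest)
    | length chunk < n = Nothing
    | otherwise        = Just (chunk, rest)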
Then we can take your [String] and turn it into [[String]], a list of lists each corresponding to String components of a line. We map concat over that list to merge up each line from its components, then use unlines to glue them all together.
grid :: Int -> [String] -> String
grid n = unlines . map concat . chunksOf n
Then we can print that string if desired
main :: IO ()
main = putStrLn $ grid 2 list1
Edit: apparently there is a chunksOf in the Data.List.Split module of the fairly popular split package. Their version is implemented a little differently and keeps a shorter final chunk instead of trimming it. Both of ours ought to satisfy
chunksOf n xs ++ chunksOf n ys == chunksOf n (xs ++ ys)
whenever length xs `mod` n == 0.
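The law above can be checked on small inputs. The sketch below uses a hypothetical stand-in, chunksKeep, which keeps a shorter final chunk; it is not the library implementation, and propConcat is likewise a name made up for this check. It assumes n is positive.

```haskell
import Data.List (unfoldr)

-- Stand-in chunking function: pieces of length n, last piece may be shorter.
chunksKeep :: Int -> [a] -> [[a]]
chunksKeep n = unfoldr step
  where
    step [] = Nothing
    step xs = Just (splitAt n xs)

-- The law: chunking two lists separately and appending the results
-- agrees with chunking their concatenation.
propConcat :: Int -> [Int] -> [Int] -> Bool
propConcat n xs ys =
  chunksKeep n xs ++ chunksKeep n ys == chunksKeep n (xs ++ ys)
```

For instance, propConcat 2 [1,2,3,4] [5,6,7] holds, while propConcat 2 [1,2,3] [4,5] does not, which shows why the side condition on length xs is needed.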
You can do:
mapM_ (putStrLn . concat) [take 2 list1, drop 2 list1]
where take and drop return lists with the expected number of elements. take 2 takes two elements and drop 2 drops the first two elements.
Following tel's link to Data.List.Split, another solution can be built on chop.
It is defined in the library as follows:
chop :: ([a] -> (b, [a])) -> [a] -> [b]
chop _ [] = []
chop f as = b : chop f as'
  where (b, as') = f as
Then, following simeon's advice, we end up with this one-liner:
let fun n = mapM_ (putStrLn . concat) . chop (splitAt n)
chop appears to be a nice function, enough to be mentioned here to illustrate an alternative solution. (unfoldr is great too).
Beginner attempt:
myOut :: [String] -> IO ()
myOut []       = return ()
myOut [x]      = putStrLn x
myOut (x:y:xs) =
  do putStrLn (x ++ y)   -- print two elements, then end the line
     myOut xs
ghci>myOut ["#", "#", "#", "#"]
##
##
ghci>