Concurrent Prime Generator - concurrency

I'm going through the problems on projecteuler.net to learn how to program in Erlang, and I am having the hardest time creating a prime generator that can create all of the primes below 2 million, in less than a minute. Using the sequential style, I have already written three types of generators, including the Sieve of Eratosthenes, and none of them perform well enough.
I figured a concurrent Sieve would work great, but I'm getting bad_arity messages, and I'm not sure why. Any suggestions on why I have the problem, or how to code it properly?
Here's my code, the commented out sections are where I tried to make things concurrent:
-module(primeserver).
-compile(export_all).
start() ->
register(primes, spawn(fun() -> loop() end)).
is_prime(N) -> rpc({is_prime,N}).
rpc(Request) ->
primes ! {self(), Request},
receive
{primes, Response} ->
Response
end.
loop() ->
receive
{From, {is_prime, N}} ->
if
N From ! {primes, false};
N =:= 2 -> From ! {primes, true};
N rem 2 =:= 0 -> From ! {primes, false};
true ->
Values = is_not_prime(N),
Val = not(lists:member(true, Values)),
From ! {primes, Val}
end,
loop()
end.
for(N,N,_,F) -> [F(N)];
for(I,N,S,F) when I + S [F(I)|for(I+S, N, S, F)];
for(I,N,S,F) when I + S =:= N -> [F(I)|for(I+S, N, S, F)];
for(I,N,S,F) when I + S > N -> [F(I)].
get_list(I, Limit) ->
if
I
[I*A || A
[]
end.
is_not_prime(N) ->
for(3, N, 2,
fun(I) ->
List = get_list(I,trunc(N/I)),
lists:member(N,lists:flatten(List))
end
).
%%L = for(1,N, fun() -> spawn(fun(I) -> wait(I,N) end) end),
%%SeedList = [A || A
%% lists:foreach(fun(X) ->
%% Pid ! {in_list, X}
%% end, SeedList)
%% end, L).
%%wait(I,N) ->
%% List = [I*A || A lists:member(X,List)
%% end.

I wrote an Eratosthenesque concurrent prime sieve using the Go and channels.
Here is the code: http://github.com/aht/gosieve
I blogged about it here: http://blog.onideas.ws/eratosthenes.go
The program can sieve out the first million primes (all primes upto 15,485,863) in about 10 seconds. The sieve is concurrent, but the algorithm is mainly synchronous: there are far too many synchronization points required between goroutines ("actors" -- if you like) and thus they can not roam freely in parallel.

The 'badarity' error means that you're trying to call a 'fun' with the wrong number of arguments. In this case...
%%L = for(1,N, fun() -> spawn(fun(I) -> wait(I,N) end) end),
The for/3 function expects a fun of arity 1, and the spawn/1 function expects a fun of arity 0. Try this instead:
L = for(1, N, fun(I) -> spawn(fun() -> wait(I, N) end) end),
The fun passed to spawn inherits needed parts of its environment (namely I), so there's no need to pass it explicitly.
While calculating primes is always good fun, please keep in mind that this is not the kind of problem Erlang was designed to solve. Erlang was designed for massive actor-style concurrency. It will most likely perform rather badly on all examples of data-parallel computation. In many cases, a sequential solution in, say, ML will be so fast that any number of cores will not suffice for Erlang to catch up, and e.g. F# and the .NET Task Parallel Library would certainly be a much better vehicle for these kinds of operations.

Primes parallel algorithm : http://www.cs.cmu.edu/~scandal/cacm/node8.html

Another alternative to consider is to use probabalistic prime generation. There is an example of this in Joe's book (the "prime server") which uses Miller-Rabin I think...

You can find four different Erlang implementations for finding prime numbers (two of which are based on the Sieve of Eratosthenes) here. This link also contains graphs comparing the performance of the 4 solutions.

The Sieve of Eratosthenes is fairly easy to implement but -- as you have discovered -- not the most efficient. Have you tried the Sieve of Atkin?
Sieve of Atkin # Wikipedia

Two quick single-process erlang prime generators; sprimes generates all primes under 2m in ~2.7 seconds, fprimes ~3 seconds on my computer (Macbook with a 2.4 GHz Core 2 Duo). Both are based on the Sieve of Eratosthenes, but since Erlang works best with lists, rather than arrays, both keep a list of non-eliminated primes, checking for divisibility by the current head and keeping an accumulator of verified primes. Both also implement a prime wheel to do initial reduction of the list.
-module(primes).
-export([sprimes/1, wheel/3, fprimes/1, filter/2]).
sieve([H|T], M) when H=< M -> [H|sieve([X || X<- T, X rem H /= 0], M)];
sieve(L, _) -> L.
sprimes(N) -> [2,3,5,7|sieve(wheel(11, [2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8,6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10], N), math:sqrt(N))].
wheel([X|Xs], _Js, M) when X > M ->
lists:reverse(Xs);
wheel([X|Xs], [J|Js], M) ->
wheel([X+J,X|Xs], lazy:next(Js), M);
wheel(S, Js, M) ->
wheel([S], lazy:lazy(Js), M).
fprimes(N) ->
fprimes(wheel(11, [2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8,6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10], N), [7,5,3,2], N).
fprimes([H|T], A, Max) when H*H =< Max ->
fprimes(filter(H, T), [H|A], Max);
fprimes(L, A, _Max) -> lists:append(lists:reverse(A), L).
filter(N, L) ->
filter(N, N*N, L, []).
filter(N, N2, [X|Xs], A) when X < N2 ->
filter(N, N2, Xs, [X|A]);
filter(N, _N2, L, A) ->
filter(N, L, A).
filter(N, [X|Xs], A) when X rem N /= 0 ->
filter(N, Xs, [X|A]);
filter(N, [_X|Xs], A) ->
filter(N, Xs, A);
filter(_N, [], A) ->
lists:reverse(A).
lazy:lazy/1 and lazy:next/1 refer to a simple implementation of pseudo-lazy infinite lists:
lazy(L) ->
repeat(L).
repeat(L) -> L++[fun() -> L end].
next([F]) -> F()++[F];
next(L) -> L.
Prime generation by sieves is not a great place for concurrency (but it could use parallelism in checking for divisibility, although the operation is not sufficiently complex to justify the additional overhead of all parallel filters I have written thus far).
`

Project Euler problems (I'd say most of the first 50 if not more) are mostly about brute force with a splash of ingenuity in choosing your bounds.
Remember to test any if N is prime (by brute force), you only need to see if its divisible by any prime up to floor(sqrt(N)) + 1, not N/2.
Good luck

I love Project Euler.
On the subject of prime generators, I am a big fan of the Sieve of Eratosthenes.
For the purposes of the numbers under 2,000,000 you might try a simple isPrime check implementation. I don't know how you'd do it in erlang, but the logic is simple.
For Each NUMBER in LIST_OF_PRIMES
If TEST_VALUE % NUMBER == 0
Then FALSE
END
TRUE
if isPrime == TRUE add TEST_VALUE to your LIST_OF_PRIMES
iterate starting at 14 or so with a preset list of your beginning primes.
c# ran a list like this for 2,000,000 in well under the 1 minute mark
Edit: On a side note, the sieve of Eratosthenes can be implemented easily and runs quickly, but gets unwieldy when you start getting into huge lists. The simplest implementation, using a boolean array and int values runs extremely quickly. The trouble is that you begin running into limits for the size of your value as well as the length of your array. -- Switching to a string or bitarray implementation helps, but you still have the challenge of iterating through your list at large values.

here is a vb version
'Sieve of Eratosthenes
'http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
'1. Create a contiguous list of numbers from two to some highest number n.
'2. Strike out from the list all multiples of two (4, 6, 8 etc.).
'3. The list's next number that has not been struck out is a prime number.
'4. Strike out from the list all multiples of the number you identified in the previous step.
'5. Repeat steps 3 and 4 until you reach a number that is greater than the square root of n (the highest number in the list).
'6. All the remaining numbers in the list are prime.
Private Function Sieve_of_Eratosthenes(ByVal MaxNum As Integer) As List(Of Integer)
'tested to MaxNum = 10,000,000 - on 1.8Ghz Laptop it took 1.4 seconds
Dim thePrimes As New List(Of Integer)
Dim toNum As Integer = MaxNum, stpw As New Stopwatch
If toNum > 1 Then 'the first prime is 2
stpw.Start()
thePrimes.Capacity = toNum 'size the list
Dim idx As Integer
Dim stopAT As Integer = CInt(Math.Sqrt(toNum) + 1)
'1. Create a contiguous list of numbers from two to some highest number n.
'2. Strike out from the list all multiples of 2, 3, 5.
For idx = 0 To toNum
If idx > 5 Then
If idx Mod 2 <> 0 _
AndAlso idx Mod 3 <> 0 _
AndAlso idx Mod 5 <> 0 Then thePrimes.Add(idx) Else thePrimes.Add(-1)
Else
thePrimes.Add(idx)
End If
Next
'mark 0,1 and 4 as non-prime
thePrimes(0) = -1
thePrimes(1) = -1
thePrimes(4) = -1
Dim aPrime, startAT As Integer
idx = 7 'starting at 7 check for primes and multiples
Do
'3. The list's next number that has not been struck out is a prime number.
'4. Strike out from the list all multiples of the number you identified in the previous step.
'5. Repeat steps 3 and 4 until you reach a number that is greater than the square root of n (the highest number in the list).
If thePrimes(idx) <> -1 Then ' if equal to -1 the number is not a prime
'not equal to -1 the number is a prime
aPrime = thePrimes(idx)
'get rid of multiples
startAT = aPrime * aPrime
For mltpl As Integer = startAT To thePrimes.Count - 1 Step aPrime
If thePrimes(mltpl) <> -1 Then thePrimes(mltpl) = -1
Next
End If
idx += 2 'increment index
Loop While idx < stopAT
'6. All the remaining numbers in the list are prime.
thePrimes = thePrimes.FindAll(Function(i As Integer) i <> -1)
stpw.Stop()
Debug.WriteLine(stpw.ElapsedMilliseconds)
End If
Return thePrimes
End Function

Related

Generating an infinite list of the hamming sequence [duplicate]

(this is exciting!) I know, the subject matter is well known. The state of the art (in Haskell as well as other languages) for efficient generation of unbounded increasing sequence of Hamming numbers, without duplicates and without omissions, has long been the following (AFAIK - and by the way it is equivalent to the original Edsger Dijkstra's solution, too):
hamm :: [Integer]
hamm = 1 : map (2*) hamm `union` map (3*) hamm `union` map (5*) hamm
where
union a#(x:xs) b#(y:ys) = case compare x y of
LT -> x : union xs b
EQ -> x : union xs ys
GT -> y : union a ys
The question I'm asking is, can you find the way to make it more efficient in any significant measure? Is it still the state of the art or is it in fact possible to improve this to run twice faster?
If your answer is yes, please show the code and discuss its speed and empirical orders of growth in comparison to the above (it runs at about ~ n1.05…1.10 for the first few hundreds of thousands of numbers produced). Also, if it exists, can this efficient algorithm be extended to producing a sequence of smooth numbers with any given set of primes?
(clarification: I'm not asking about the much faster direct generation of an nth Hamming number, but rather generating all first n numbers in the sequence.)
If a constant factor(1) speedup counts as significant, then I can offer a significantly more efficient version:
hamm :: [Integer]
hamm = mrg1 hamm3 (map (2*) hamm)
where
hamm5 = iterate (5*) 1
hamm3 = mrg1 hamm5 (map (3*) hamm3)
merge a#(x:xs) b#(y:ys)
| x < y = x : merge xs b
| otherwise = y : merge a ys
mrg1 (x:xs) ys = x : merge xs ys
You can easily generalise it to smooth numbers for a given set of primes:
hamm :: [Integer] -> [Integer]
hamm [] = [1]
hamm [p] = iterate (p*) 1
hamm ps = foldl' next (iterate (q*) 1) qs
where
(q:qs) = sortBy (flip compare) ps
next prev m = let res = mrg1 prev (map (m*) res) in res
merge a#(x:xs) b#(y:ys)
| x < y = x : merge xs b
| otherwise = y : merge a ys
mrg1 (x:xs) ys = x : merge xs ys
It's more efficient because that algorithm doesn't produce any duplicates and it uses less memory. In your version, when a Hamming number near h is produced, the part of the list between h/5 and h has to be in memory. In my version, only the part between h/2 and h of the full list, and the part between h/3 and h of the 3-5-list needs to be in memory. Since the 3-5-list is much sparser, and the density of k-smooth numbers decreases, those two list parts need much less memory that the larger part of the full list.
Some timings for the two algorithms to produce the kth Hamming number, with empirical complexity of each target relative to the previous, excluding and including GC time:
k Yours (MUT/GC) Mine (MUT/GC)
10^5 0.03/0.01 0.01/0.01 -- too short to say much, really
2*10^5 0.07/0.02 0.02/0.01
5*10^5 0.17/0.06 0.968 1.024 0.06/0.04 1.199 1.314
10^6 0.36/0.13 1.082 1.091 0.11/0.10 0.874 1.070
2*10^6 0.77/0.27 1.097 1.086 0.21/0.21 0.933 1.000
5*10^6 1.96/0.71 1.020 1.029 0.55/0.59 1.051 1.090
10^7 4.05/1.45 1.047 1.043 1.14/1.25 1.052 1.068
2*10^7 8.73/2.99 1.108 1.091 2.31/2.65 1.019 1.053
5*10^7 21.53/7.83 0.985 1.002 6.01/7.05 1.044 1.057
10^8 45.83/16.79 1.090 1.093 12.42/15.26 1.047 1.084
As you can see, the factor between the MUT times is about 3.5, but the GC time is not much different.
(1) Well, it looks constant, and I think both variants have the same computational complexity, but I haven't pulled out pencil and paper to prove it, nor do I intend to.
So basically, now that Daniel Fischer gave his answer, I can say that I came across this recently, and I think this is an exciting development, since the classical code was known for ages, since Dijkstra.
Daniel correctly identified the redundancy of the duplicates generation which must then be removed, in the classical version.
The credit for the original discovery (AFAIK) goes to Rosettacode.org's contributor Ledrug, as of 2012-08-26. And of course the independent discovery by Daniel Fischer, here (2012-09-18).
Re-written slightly, that code is:
import Data.Function (fix)
hamm = 1 : foldr (\n s -> fix (merge s . (n:) . map (n*))) [] [2,3,5]
with the usual implementation of merge,
merge a#(x:xs) b#(y:ys) | x < y = x : merge xs b
| otherwise = y : merge a ys
merge [] b = b
merge a [] = a
It gives about 2.0x - 2.5x a speedup vs. the classical version.
Well this was easier than I thought. This will do 1000 Hammings in 0.05 seconds on my slow PC at home. This afternoon at work and a faster PC times of less than 600 were coming out as zero seconds.
This take Hammings from Hammings. It's based on doing it fastest in Excel.
I was getting wrong numbers after 250000, with Int. The numbers grow very big very fast, so Integer must be used to be sure, because Int is bounded.
mkHamm :: [Integer] -> [Integer] -> [Integer] -> [Integer]
-> Int -> (Integer, [Int])
mkHamm ml (x:xs) (y:ys) (z:zs) n =
if n <= 1
then (last ml, map length [(x:xs), (y:ys), (z:zs)])
else mkHamm (ml++[m]) as bs cs (n-1)
where
m = minimum [x,y,z]
as = if x == m then xs ++ [m*2] else (x:xs) ++ [m*2]
bs = if y == m then ys ++ [m*3] else (y:ys) ++ [m*3]
cs = if z == m then zs ++ [m*5] else (z:zs) ++ [m*5]
Testing,
> mkHamm [1] [2] [3] [5] 5000
(50837316566580,[306,479,692]) -- (0.41 secs)
> mkHamm [1] [2] [3] [5] 10000
(288325195312500000,[488,767,1109]) -- (1.79 secs)
> logBase 2 (1.79/0.41) -- log of times ratio =
2.1262637726461726 -- empirical order of growth
> map (logBase 2) [488/306, 767/479, 1109/692] :: [Float]
[0.6733495, 0.6792009, 0.68041545] -- leftovers sizes ratios
This means that this code's run time's empirical order of growth is above quadratic (~n^2.13 as measured, interpreted, at GHCi prompt).
Also, the sizes of the three dangling overproduced segments of the sequence are each ~n^0.67 i.e. ~n^(2/3).
Additionally, this code is non-lazy: the resulting sequence's first element can only be accessed only after the very last one is calculated.
The state of the art code in the question is linear, overproduces exactly 0 elements past the point of interest, and is properly lazy: it starts producing its numbers immediately.
So, though an immense improvement over the previous answers by this poster, it is still significantly worse than the original, let alone its improvement as appearing in the top two answers.
12.31.2018
Only the very best people educate. #Will Ness also has authored or co-authored 19 chapters in GoalKicker.com “Haskell for Professionals”. The free book is a treasure.
I had carried around the idea of a function that would do this, like this. I was apprehensive because I thought it would be convoluted and involved logic like in some modern languages. I decided to start writing and was amazed how easy Haskell makes the realization of even bad ideas.
I've not had difficulty generating unique lists. My problem is the lists I generate do not end well. Even when I use diagonalization they leave residual values making their use unreliable at best.
Here is a reworked 3's and 5's list with nothing residual at the end. The denationalization is to reduce residual values not to eliminate duplicates which are never included anyway.
g3s5s n=[t*b|(a,b)<-[ (((d+n)-(d*2)), 5^d) | d <- [0..n]],
t <-[ 3^e | e <- [0..a+8]],
(t*b)<-(3^(n+6))+a]
ham2 n = take n $ ham2' (drop 1.sort.g3s5s $ 48) [1]
ham2' o#(f:fo) e#(h:hx) = if h == min h f
then h:ham2' o (hx ++ [h*2])
else f:ham2' fo ( e ++ [f*2])
The twos list can be generated with all 2^es multiplied by each of the 3s5s but when identity 2^0 is included, then, in total, it is the Hammings.
3/25/2019
Well, finally. I knew this some time ago but could not implement it without excess values at the end. The problem was how to not generate the excess that is the result of a Cartesian Product. I use Excel a lot and could not see the pattern of values to exclude from the Cartesian Product worksheet. Then, eureka! The functions generate lists of each lead factor. The value to limit the values in each list is the end point of the first list. When this is done, all Hammings are produced with no excess.
Two functions for Hammings. The first is a new 3's & 5's list which is then used to create multiples with the 2's. The multiples are Hammings.
h35r x = h3s5s x (5^x)
h3s5s x c = [t| n<-[3^e|e<-[0..x]],
m<-[5^e|e<-[0..x]],
t<-[n*m],
t <= c ]
a2r n = sort $ a2s n (2^n)
a2s n c = [h| b<-h35r n,
a<-[2^e| e<-[0..n]],
h<-[a*b],
h <= c ]
last $ a2r 50
1125899906842624
(0.16 secs, 321,326,648 bytes)
2^50
1125899906842624
(0.00 secs, 95,424 bytes
This is an alternate, cleaner & faster with less memory usage implementation.
gnf n f = scanl (*) 1 $ replicate f n
mk35 n = (\c-> [m| t<- gnf 3 n, f<- gnf 5 n, m<- [t*f], m<= c]) (2^(n+1))
mkHams n = (\c-> sort [m| t<- mk35 n, f<- gnf 2 (n+1), m<- [t*f], m<= c]) (2^(n+1))
last $ mkHams 50
2251799813685248
(0.03 secs, 12,869,000 bytes)
2^51
2251799813685248
5/6/2019
Well, I tried limiting differently but always come back to what is simplest. I am opting for the least memory usage as also seeming to be the fastest.
I also opted to use map with an implicit parameter.
I also found that mergeAll from Data.List.Ordered is faster that sort or sort and concat.
I also like when sublists are created so I can analyze the data much easier.
Then, because of #Will Ness switched to iterate instead of scanl making much cleaner code. Also because of #Will Ness I stopped using the last of of 2s list and switched to one value determining all lengths.
I do think recursively defined lists are more efficient, the previous number multiplied by a factor.
Just separating the function into two doesn't make a difference so the 3 and 5 multiples would be
m35 lim = mergeAll $
map (takeWhile (<=lim).iterate (*3)) $
takeWhile (<=lim).iterate (*5) $ 1
And the 2s each multiplied by the product of 3s and 5s
ham n = mergeAll $
map (takeWhile (<=lim).iterate (*2)) $ m35 lim
where lim= 2^n
After editing the function I ran it
last $ ham 50
1125899906842624
(0.00 secs, 7,029,728 bytes)
then
last $ ham 100
1267650600228229401496703205376
(0.03 secs, 64,395,928 bytes)
It is probably better to use 10^n but for comparison I again used 2^n
5/11/2019
Because I so prefer infinite and recursive lists I became a bit obsessed with making these infinite.
I was so impressed and inspired with #Daniel Wagner and his Data.Universe.Helpers I started using +*+ and +++ but then added my own infinite list. I had to mergeAll my list to work but then realized the infinite 3 and 5 multiples were exactly what they should be. So, I added the 2s and mergeAlld everything and they came out. Before, I stupidly thought mergeAll would not handle infinite list but it does most marvelously.
When a list is infinite in Haskell, Haskell calculates just what is needed, that is, is lazy. The adjunct is that it does calculate from, the start.
Now, since Haskell multiples until the limit of what is wanted, no limit is needed in the function, that is, no more takeWhile. The speed up is incredible and the memory lowered too,
The following is on my slow home PC with 3GB of RAM.
tia = mergeAll.map (iterate (*2)) $
mergeAll.map (iterate (*3)) $ iterate (*5) 1
last $ take 10000 tia
288325195312500000
(0.02 secs, 5,861,656 bytes)
6.5.2019
I learned how to ghc -02 So the following is for 50000 Hammings to 2.38E+30. And this is further proof my code is garbage.
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.000s ( 0.916s elapsed)
GC time 0.047s ( 0.041s elapsed)
EXIT time 0.000s ( 0.005s elapsed)
Total time 0.047s ( 0.962s elapsed)
Alloc rate 0 bytes per MUT second
Productivity 0.0% of total user, 95.8% of total elapsed
6.13.2019
#Will Ness rawks. He provided a clean and elegant revision of tia above and it proved to be five times as fast in GHCi. When I ghc -O2 +RTS -s his against mine, mine was several times as fast. There had to be a compromise.
So, I started reading about fusion that I had encountered in R. Bird's Thinking Functionally with Haskell and almost immediately tried this.
mai n = mergeAll.map (iterate (*n))
mai 2 $ mai 3 $ iterate (*5) 1
It matched Will's at 0.08 for 100K Hammings in GHCi but what really surprised me is (also for 100K Hammings.) this and especially the elapsed times. 100K is up to 2.9e+38.
TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.000s ( 0.002s elapsed)
GC time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.000s ( 0.002s elapsed)
Alloc rate 0 bytes per MUT second
Productivity 100.0% of total user, 90.2% of total elapsed

Check if numbers from multiple arrays sum up to given numbers

I was trying to do this question:
Given three integers: GA, GB, and GC(which represent apples, oranges,
and bananas respectively) & N lines each consisting of three integers:
A, B, and C, which represent the amount of apples, oranges, and
bananas in that food, respectively.
Check if it's possible to use only certain cartons such that the total
apples, oranges, and bananas sum up to GA, Gb and GC respectively. For
each available carton, we can only choose to buy it or not to buy it.
He can't buy a certain carton more than once, and he can't buy a
fractional amount of a carton.
Sample Test Case
IN
100 100 100
3
10 10 40
10 30 10
10 60 50
OUT
no
IN
100 100 100
5
40 70 30
30 10 40
20 20 50
10 50 90
40 10 20
OUT
yes
For this problem, I have written some code but have been getting segmentation faults only and a number of errors. Plus, my algorithm is quite bad. What I do is find all subsets of the apples array such that their sum is GA, then I check to see if any of those sets have oranges and bananas to add to GB and GC. But this idea is quite slow and very difficult to code...
I believe this is somewhat a variation of the knapsack problem and can be solved in a better complexity (atleast better than O(2^N)[My current complexity] ;P ). SO, what would be a better algorithm to solve this question, and also, see my current code at PasteBin(I havent put the code on stackoverflow because it is flawed, and moreover, I believe I'll have to start from scratch with it...)
The segmentation faults are entirely your problem.
Knapsack is NP-complete, and so is this (assume input where A, B, C are always the same, and Ga = half the sum of the A's). I don't think anyone is asking you to solve NP-complete problems here.
Obviously you don't check all sets, but only those with sum A <= 100, sum B <= 100, sum C <=100.
Same situation as with this question.
This question is 2nd problem from Facebook Hackercup qualification round which is currently in progress (it will end 12th of January 12AM UTC).
It's not really fair to ask here solutions for the problems of active programming competitions.
This is a variant of the 0-1 knapsack problem. This problem is NP-hard, so there is not much hope to find a solution in polynomial time, but there exists a solution in pseudo-polynomial time which makes this problem rather easy (in the world of complex problems).
The algorithm works as follows:
Start with a collection (for instance a set) containing the tuple <0,0,0>.
For each carton <a',b',c'>: iterate over the all tuples <a,b,c> in the collection and add <a+a',b+b',c+c'> to the collection, ensure that duplicates are not added. Don't add tuples where one or more elements have exceeded the corresponding target value.
If the given collection contains the target values after the algorithm, print "yes", otherwise "no".
Optionally but strongly advisable lower-bound elimination: you can also perform lookaheads and for instance eliminate all values that will never reach the given target anymore (say you can at most add 20 apples, then all values less than 80 apples can be eleminated).
Concept 1 (Lowerbound): Since you add values of tuples together, you now that if there are tuples <a0,a1,a2> and <b0,b1,b2> left, adding these will at most increase a tuple with <a0+b0,a1+b1,a2+b2>. Now say the target is <t0,t1,t2> then you can safely eliminate a tuple <q0,q1,q2> if q0+a0+b0 < t0 (generalize to other tuple elements), since even if you can add the last tuples, it will never reach the required values. The lower bound is thus <t0-a0-b0,t1-a1-b1,t2-a2-b2>. You can generalize this for n tuples.
So first you add up all provided tuples together (for the second instance, that's <140,160,230>) and than subtract that from the target (the result is thus: <-40,-60,-130>). Each iteration, the lower bound is increased with that carton, so after the first iteration, the result for the second example is (<-40+40,-60+70,-130+30> or <0,10,-100>).
The time complexity is however O(ta^3 tb^3 tc^3) with ta, tb and tc the target values.
Example 1 (high level on the two given testcases):
INPUT
100 100 100
3
10 10 40
10 30 10
10 60 50
The set starts with {<0,0,0>}, after each iteration we get:
{<0,0,0>};
{<0,0,0>,<10,10,40>};
{<0,0,0>,<10,10,40>,<10,30,10>,<20,40,50>}; and
{<0,0,0>,<10,10,40>,<10,30,10>,<20,40,50>,<10,60,50>,<10,60,50>,<20,70,90>,<30,100,100>}, thus fail.
With underbound-elimination:
{<0,0,0>}, lowerbound <100-30,100-100,100-100>=<70,0,0> thus eliminate <0,0,0>.
{} thus print "no".
Example 2
INPUT
100 100 100
5
40 70 30
30 10 40
20 20 50
10 50 90
40 10 20
With lower-bound elimination:
{<0,0,0>} lower bound: <-40,-60,-130> thus ok.
{<0,0,0>,<40,70,30>} lower bound: <0,10,-100> (eliminate <0,0,0> because second conflicts).
{<40,70,30>,<70,80,70>} lower bound: <30,20,-60> (no elimination).
{<40,70,30>,<70,80,70>,<60,90,80>,<90,100,120>} lower bound: <50,40,-10> (eliminate <40,70,30>) upper eliminate <90,100,120>.
{<70,80,70>,<60,90,80>,<80,130,160>,<70,140,170>} lower bound: <60,90,80> (eliminate <70,80,70>) upper eliminate <80,130,160> and <70,140,170>.
{<60,90,80>,<100,100,100>} lower bound: <100,100,100> (eliminate <60,90,80>).
{<100,100,100>} thus "yes".
Haskell program
I've implemented a (not that efficient, but proof of concept) Haskell program that does the trick for an arbitrary tuple-length:
import qualified Data.Set as Set
tupleSize :: Int
tupleSize = 3
group :: Int -> [a] -> [[a]]
group _ [] = []
group n l = take n l : group n (drop n l)
empty :: Int -> Set.Set [Int]
empty n = Set.fromList [replicate n 0]
solve :: [Int] -> [[Int]] -> Bool
solve t qs = Set.member t $ mix t (lowerBound t qs) qs $ empty $ length t
lowerBound :: [Int] -> [[Int]] -> [Int]
lowerBound = foldl (zipWith (-))
lowerCheck :: [Int] -> [Int] -> Bool
lowerCheck l x = and $ zipWith (<=) l x
targetCheck :: [Int] -> [Int] -> Bool
targetCheck t x = and $ zipWith (>=) t x
takeout :: Int -> [a] -> [a]
takeout _ [] = []
takeout i (h:hs) | i == 0 = hs
| otherwise = h : takeout (i-1) hs
mix :: [Int] -> [Int] -> [[Int]] -> Set.Set [Int] -> Set.Set [Int]
mix _ _ [] s = s
mix t l (q:qs) s = mix t (zipWith(+) l q) qs $ Set.filter (lowerCheck l) $ Set.union s $ Set.filter (targetCheck t) $ Set.map (zipWith (+) q) s
reply :: Bool -> String
reply True = "yes"
reply False = "no"
main = interact $ \x -> let tuples = group tupleSize $ takeout tupleSize $ map read (words x) in reply $ solve (head tuples) (tail tuples)
You can compile an run it using:
ghc file.hs
./file < input
Conclusion: Although the worst-case behavior can be hard, the second example shows that the problem can be solve efficiently for some cases.

Building a list from left-to-right in Haskell without ++

Is there a way to build lists from left-to-right in Haskell without using ++?
cons is a constant time operation and I want to keep the code efficient. I feel like there's a general way to take advantage of Haskell's laziness to do something like this, but I can't think of it.
Right now I'm writing a function that creates a Collatz Sequence, but it's building the list in the wrong direction:
module CollatzSequence where
collatz :: (Integral a) => a -> [a] -> [a];
collatz n l
| n <= 0 = error "Enter a starting number > 0"
collatz n [] = collatz n [n]
collatz n l#(x:_)
| x == 1 = l
| even x = collatz n ((div x 2):l)
| otherwise = collatz n ((x*3 + 1):l)
In GHCi:
*CollatzSequence> collatz 13 []
[1,2,4,8,16,5,10,20,40,13]
There is indeed a way to take advantage of laziness. In Haskell you can safely do recursive calls inside lazy data constructors, and there will be no risk of stack overflow or divergence. Placing the recursive call inside a constructor eliminates the need for an accumulator, and the order of elements in the list will also correspond to the order in which they are computed:
collatz :: Integer -> [Integer]
collatz n | n <= 1 = []
collatz n = n : collatz next where
next = if even n then div n 2 else n * 3 + 1
For example, the expression head $ collatz 10 evaluates to head (10 : <thunk>) which evaluates to 10, and the thunk in the tail will stay unevaluated. Another advantage is that list nodes can be garbage collected while iterating over the list. foldl' (+) 0 (collatz n) runs in constant space, since the visited nodes are no longer referenced by the rest of the program and can be freed. This is not the case in your original function, since - being tail recursive - it cannot provide any partial result until the whole list is computed.
Is this what you are looking for?
collatz :: (Integral a) => a -> [a]
collatz n
| n <= 0 = error "Enter a starting number > 0"
| n == 1 = [1]
| even n = n : collatz (div n 2)
| otherwise = n : collatz (n*3 + 1)

Haskell - Finding Divisors of an Integer

According to the book this is how its done, but I am not able to get this to work. It gives me an error Not in scope: 'ld'. I'm guessing I should be importing some package but not sure which one. Also the book uses GS module at the prompt but I'm using WinGHCi that has Prelude. What am I missing here?
factors :: Int -> [Int]
factors n | n < 1 = error "not positive"
| n == 1 = []
| otherwise = p : factors (div n p)
where p = ld n
I guess this can also be done using map and filter functions? How?
I suppose the aim of the assignment is to teach you about list comprehensions, filter and similar constructs, and not to have you write functions that test for primality or create the list of divisors in any sensible way. Therefore what you need is a predicate divides,
divides :: Int -> Int -> Bool
a `divides` b = ???
Then you use that predicate for the argument to filter or in a list comprehension to find the list of divisors, and use the divisors function for your isPrime test.
You want to inspect all numbers from 1 to n, and keep them only if they divide n. The filter function can help you:
divisors n = filter ??? [1..n]
So what condition you need to put in place of ??? ?
For the isPrime function you could reuse the divisors function, you already mentioned how.
Break it down into simpler steps.
Write a function, divides :: Int -> Int -> Bool such that
x `divides` n
is true when x is a divisor of n. So, first, think about what it means for x to be a divisor of n.
Now that you have a way to check if a single number x is a divisor of n, you need to check a certain range of numbers less than n to see which ones are divisors.
Hint: In Haskell, you can generate a list of numbers from 1 to n like so: [1..n]
This is where that filter function you mention would be useful. Check its type:
filter :: (a -> Bool) -> [a] -> [a]
Just replace the a above with Int.
As far as the isPrime function, just think about what it means for a number to be prime... if you've calculated your divisors correctly, you can check the list to make sure that it matches with that property.
If this is a homework related question, you should definitely tag it with homework, then people don't feel as timid about helping out :)

haskell, counting how many prime numbers are there in a list

i m a newbie to haskell, currently i need a function 'f' which, given two integers, returns the number of prime numbers in between them (i.e., greater than the first integer but smaller than the second).
Main> f 2 4
1
Main> f 2 10
3
here is my code so far, but it dosent work. any suggestions? thanks..
f :: Int -> Int -> Int
f x y
| x < y = length [ n | n <- [x..y], y 'mod' n == 0]
| otherwise = 0
Judging from your example, you want the number of primes in the open interval (x,y), which in Haskell is denoted [x+1 .. y-1].
Your primality testing is flawed; you're testing for factors of y.
To use a function name as an infix operator, use backticks (`), not single quotes (').
Try this instead:
-- note: no need for the otherwise, since [x..y] == [] if x>y
nPrimes a b = length $ filter isPrime [a+1 .. b-1]
Exercise for the reader: implement isPrime. Note that it only takes one argument.
Look at what your list comprehension does.
n <- [x..y]
Draw n from a list ranging from x to y.
y `mod` n == 0
Only select those n which evenly divide y.
length (...)
Find how many such n there are.
What your code currently does is find out how many of the numbers between x and y (inclusive) are factors of y. So if you do f 2 4, the list will be [2, 4] (the numbers that evenly divide 4), and the length of that is 2. If you do f 2 10, the list will be `[2, 5, 10] (the numbers that evenly divide 10), and the length of that is 3.
It is important to try to understand for yourself why your code doesn't work. In this case, it's simply the wrong algorithm. For algorithms that find whether a number is prime, among many other sources, you can check the wikipedia article: Primality test.
I you want to work with large intervals, then it might be a better idea to compute a list of primes once (instead of doing a isPrime test for every number):
primes = -- A list with all prime numbers
candidates = [a+1 .. b-1]
myprimes = intersectSortedLists candidates primes
nPrimes = length $ myprimes