Hybrid MPI+OpenMP Fortran - fortran
I am retrofitting OpenMP to an MPI Fortran computational fluid dynamics code.
I am using the thread funneled approach.
So far, every time I run tests the MPI+OpenMP code runs slower, even though I am using more processors for the MPI+OpenMP version. For example, I compared 2 CPUs with 2 MPI processes against 6 CPUs with 2 MPI processes and 3 OpenMP threads each.
I have been using gprof and I noticed that functions that live in the serial part of the code take thousands of times longer when I enable OpenMP. Does anyone know why this would happen?
--- edit on 15th September 2016 --
Thank you everyone for the input.
Meanwhile I profiled my code with TAU and understood that gprof was attributing the idle time of threads to a random function. In TAU this same idle time is attributed to ".TAU application". My current problem is that I am seeing only a marginal speedup (less than 5%) when I add threads.
@Zulan: Meanwhile I profiled my code with TAU while requesting more threads and processes (2 MPI x 7 threads) than I have CPUs available (a dual-threaded quad-core). By doing this I can see the OpenMP barrier taking a very large share of the execution time, so I do not believe this is the case.
@VladimirF: I understand your point, and sorry for not making it clear, but I am comparing 2 MPI processes with 1 thread each vs. 2 MPI processes with 2 threads each, so I am actually doubling the resources and seeing no speedup.
@tim18: I have not checked using a debugger, but I have checked my output and the results are correct to machine precision.
Here is a specific loop that the code executes.
!$OMP PARALLEL DO & ! Has masque, so should be dynamic
!$OMP& private (k,fa3, j, jp, fa2) &
!$OMP& shared (z3r,z4r,z2r,cur,cvr,cwr) &
!$OMP& schedule (runtime)
DO i=1, nL1 ! loop 1
   DO k=1,nG3 ! loop 2
      fa3=fac3(k)
      DO j=1,nG2,2 ! loop 3a
         IF(masque(j,k))THEN
            jp=j+1
            fa2=fac2(j)
            !
            uu(j,k,i) = fa3*cvr(jp,k,i)-fa2*cwr(jp,k,i)
            uu(jp,k,i)=-fa3*cvr(j,k,i) +fa2*cwr(j,k,i)
            vv(j,k,i) =-fa3*cur(jp,k,i)-vv(j,k,i)
            vv(jp,k,i)= fa3*cur(j,k,i) -vv(jp,k,i)
            ww(j,k,i) =ww(j,k,i) +fa2*cur(jp,k,i)
            ww(jp,k,i)=ww(jp,k,i) -fa2*cur(j,k,i)
         ENDIF
      ENDDO ! loop 3a
      DO j=1,nG2 ! loop 3b
         IF(masque(j,k))THEN
            z3r(j,k,i) = cur(j,k,i)
            z4r(j,k,i) = cvr(j,k,i)
            z2r(j,k,i) = cwr(j,k,i)
         ENDIF
      ENDDO ! loop 3b
   ENDDO ! loop 2
ENDDO ! loop 1
!$OMP END PARALLEL DO
The results with TAU are the following:
%Time Exclusive Inclusive #Call #Subrs Inclusive Name
msec total msec usec/call
2 MPI + 2 threads ---
MEAN
0.9 0.265 8,943 39 39 229330 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 0.106 17,887 78 78 229324 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 17,805 17,887 78 78 229323 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
0.0 82 82 78 0 1052 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
0.0 328 328 312 0 1052 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
TOTAL
0.9 1 35,775 156 156 229330 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 0.423 1:11.549 312 312 229324 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 1:11.220 1:11.548 312 312 229323 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
Node 0 Thread 0
1.8 0.547 17,919 78 78 229737 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 0.128 17,918 78 78 229730 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 17,863 17,918 78 78 229729 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
0.0 55 55 78 0 714 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
Node 0 Thread 1
1.8 0.116 17,919 78 78 229732 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 17,788 17,918 78 78 229730 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
0.0 130 130 78 0 1667 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
Node 1 Thread 0
1.8 0.511 17,856 78 78 228923 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 0.087 17,855 78 78 228917 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 17,788 17,918 78 78 229730 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
0.0 40 40 78 0 523 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
Node 1 Thread 1
1.8 0.092 17,855 78 78 228917 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
1.8 17,753 17,855 78 78 228916 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
0.0 101 101 78 0 1302 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/runxxx/m_solvvel.f90 <517, 551>]
2 MPI + 1 thread ---
Node 0 Thread 0
2.0 0.273 20,345 78 78 260834 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
2.0 0.101 20,344 78 78 260831 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
2.0 20,344 20,344 78 78 260829 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
0.0 0.184 0.184 78 0 2 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
Node 0 Thread 1
1.9 0.261 20,113 78 78 257860 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
1.9 0.072 20,112 78 78 257856 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
1.9 20,112 20,112 78 78 257855 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
0.0 0.176 0.176 78 0 2 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
MEAN
1.9 0.267 20,229 78 78 259347 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
1.9 0.0865 20,228 78 78 259344 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
1.9 20,228 20,228 78 78 259342 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
0.0 0.18 0.18 78 0 2 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
TOTAL
1.9 0.534 40,458 156 156 259347 paralleldo (parallel fork/join) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
1.9 0.173 40,457 156 156 259344 paralleldo (parallel begin/end) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
1.9 40,457 40,457 156 156 259342 paralleldo (loop body) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
0.0 0.36 0.36 156 0 2 paralleldo (barrier enter/exit) [OpenMP location: file:/home/pedroea/Desktop/2MPI1OpenMP_omplib/m_solvvel.f90 <517, 551>]
There is a marginal speedup, but it does not come even close to being 50%, and the problem is not the overheads.
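To put numbers on this, the speedup, the parallel efficiency and the share of time spent in the barrier can be worked out from the MEAN rows of the two TAU profiles above. The following is only a small Python sketch of that arithmetic: the function and variable names are illustrative, and the times are the inclusive msec values of the "loop body" and "barrier enter/exit" rows.

```python
def speedup_report(t_base_ms, t_par_ms, thread_ratio, barrier_ms):
    """Return (speedup, efficiency, barrier_share) for one parallel loop."""
    speedup = t_base_ms / t_par_ms          # how many times faster the loop got
    efficiency = speedup / thread_ratio     # 1.0 would be ideal scaling
    barrier_share = barrier_ms / t_par_ms   # fraction of loop time spent waiting
    return speedup, efficiency, barrier_share


# MEAN rows of the TAU profiles above, inclusive msec.
T_ONE_THREAD = 20228.0   # "loop body", 2 MPI + 1 thread
T_TWO_THREADS = 17887.0  # "loop body", 2 MPI + 2 threads
BARRIER_TWO = 82.0       # "barrier enter/exit", 2 MPI + 2 threads

speedup, efficiency, barrier_share = speedup_report(
    T_ONE_THREAD, T_TWO_THREADS, thread_ratio=2, barrier_ms=BARRIER_TWO
)
print(f"speedup:       {speedup:.2f}x (ideal 2.00x)")
print(f"efficiency:    {efficiency:.0%}")
print(f"barrier share: {barrier_share:.2%}")
```

With these inputs the loop is roughly 1.13 times faster on twice as many threads, and the barrier accounts for well under 1% of the loop time, which is consistent with the overheads not being the problem.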
Related
Swift: crash at non-existing line
Hey StackOverflow detectives. I've been pulling my hair out for a few months trying to figure out crashes that happen at a line that doesn't exist. In the crash log below, CustomClass.swift does exist but line 25 doesn't. What would be the true reason for this crash?
Hint: we run heavy processing and allocate a lot of memory (about 700MB) in the background, in code written in C++ that lives in OUR_CUSTOM_FRAMEWORK below.
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Triggered by Thread: 0
Thread 0 name:
Thread 0 Crashed:
0 libsystem_kernel.dylib 0x00000001934c6df0 __pthread_kill + 8
1 libsystem_pthread.dylib 0x00000001933e6930 pthread_kill + 228 (pthread.c:1458)
2 libsystem_c.dylib 0x0000000193374ba4 abort + 104 (abort.c:110)
3 libsystem_malloc.dylib 0x00000001933d7fdc malloc_vreport + 564 (malloc_printf.c:183)
4 libsystem_malloc.dylib 0x00000001933d81a4 malloc_report + 64 (malloc_printf.c:192)
5 libsystem_malloc.dylib 0x00000001933cbd1c free + 436 (malloc.c:1733)
6 [OUR_CUSTOM_FRAMEWORK] 0x000000010324ccf8 0x1031a4000 + 691448
7 libsystem_c.dylib 0x0000000193355164 __cxa_finalize_ranges + 416 (atexit.c:284)
8 libsystem_c.dylib 0x00000001933554a0 exit + 28 (exit.c:81)
9 UIKitCore 0x000000019782eb88 -[UIApplication _terminateWithStatus:] + 508 (UIApplication.m:6735)
10 UIKitCore 0x0000000196f97718 -[_UISceneLifecycleMultiplexer _evalTransitionToSettings:fromSettings:forceExit:withTransitionStore:] + 128 (_UISceneLifecycleMultiplexer.m:765)
11 UIKitCore 0x0000000196f9737c -[_UISceneLifecycleMultiplexer forceExitWithTransitionContext:scene:] + 220 (_UISceneLifecycleMultiplexer.m:418)
12 UIKitCore 0x0000000197824ac4 -[UIApplication workspaceShouldExit:withTransitionContext:] + 216 (UIApplication.m:3725)
13 FrontBoardServices 0x000000019893bcf8 -[FBSUIApplicationWorkspaceShim workspaceShouldExit:withTransitionContext:] + 88 (FBSUIApplicationWorkspace.m:144)
14 FrontBoardServices 0x0000000198968d68 __83-[FBSWorkspaceScenesClient willTerminateWithTransitionContext:withAcknowledgement:]_block_invoke_2 + 80 (FBSWorkspaceScenesClient.m:281)
15 FrontBoardServices 0x000000019894debc -[FBSWorkspace _calloutQueue_executeCalloutFromSource:withBlock:] + 240 (FBSWorkspace.m:357)
16 FrontBoardServices 0x0000000198968cf4 __83-[FBSWorkspaceScenesClient willTerminateWithTransitionContext:withAcknowledgement:]_block_invoke + 140 (FBSWorkspaceScenesClient.m:278)
17 libdispatch.dylib 0x000000019338033c _dispatch_client_callout + 20 (object.m:495)
18 libdispatch.dylib 0x00000001933830d4 _dispatch_block_invoke_direct + 264 (queue.c:466)
19 FrontBoardServices 0x000000019898f6c4 __FBSSERIALQUEUE_IS_CALLING_OUT_TO_A_BLOCK__ + 48 (FBSSerialQueue.m:173)
20 FrontBoardServices 0x000000019898f370 -[FBSSerialQueue _queue_performNextIfPossible] + 432 (FBSSerialQueue.m:216)
21 FrontBoardServices 0x000000019898f8dc -[FBSSerialQueue _performNextFromRunLoopSource] + 32 (FBSSerialQueue.m:247)
22 CoreFoundation 0x000000019365baf4 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 28 (CFRunLoop.c:1922)
23 CoreFoundation 0x000000019365ba48 __CFRunLoopDoSource0 + 84 (CFRunLoop.c:1956)
24 CoreFoundation 0x000000019365b198 __CFRunLoopDoSources0 + 196 (CFRunLoop.c:1992)
25 CoreFoundation 0x0000000193655f38 __CFRunLoopRun + 796 (CFRunLoop.c:2882)
26 CoreFoundation 0x00000001936558f4 CFRunLoopRunSpecific + 480 (CFRunLoop.c:3192)
27 GraphicsServices 0x000000019da6c604 GSEventRunModal + 164 (GSEvent.c:2246)
28 UIKitCore 0x0000000197829358 UIApplicationMain + 1944 (UIApplication.m:4823)
29 [APPNAME] 0x0000000100d45128 main + 68 (CustomClass.swift:25)
30 libdyld.dylib 0x00000001934d12dc start + 4
df.ix gets NAN as subset
I have a dataframe like the one below [72 rows x 25 columns]:
     Pin   CPULabel      Freq(MHz)  DCycle  Skew(1-3)min  Skew(1-3)mean
0    Dif0  BP100_Fast    99.9843    0.492   0             0
1    Dif0  BP100_Slow    100.011    0.493   0             0
2    Dif0  100HiBW_Fast  100.006    0.503   0             0
3    Dif0  100HiBW_Slow  100.007    0.504   0             0
4    Dif0  100LoBW_Fast  100.005    0.503   0             0
5    Dif0  100LoBW_Slow  99.9951    0.504   0             0
8    Dif1  BP100_Fast    99.9928    0.492   7             10
9    Dif1  BP100_Slow    99.9962    0.492   11            12
10   Dif1  100HiBW_Fast  100.014    0.502   10            11
11   Dif1  100HiBW_Slow  100.006    0.503   6             13
12   Dif1  100LoBW_Fast  99.9965    0.502   5             10
13   Dif1  100LoBW_Slow  99.9946    0.503   12            14
16   Dif2  BP100_Fast    99.9929    0.493   2             6
17   Dif2  BP100_Slow    99.997     0.493   8             13
18   Dif2  100HiBW_Fast  100.002    0.504   4             9
19   Dif2  100HiBW_Slow  99.9964    0.504   13            17
20   Dif2  100LoBW_Fast  100.021    0.504   8             9
I am only interested in the rows whose CPULabel is BP100_Fast, 100LoBW_Fast or 100HiBW_Fast. So I used the commands below:
excel = pd.read_excel('25C_3.3V.xlsx', skiprows=1)
excel.fillna(value=0, inplace=True)
general = excel[excel['Pin'] != 'Clkin']
general.drop_duplicates(keep=False, inplace=True)
slew = general[(general['CPULabel']=='BP100_Fast') | (general['CPULabel']=='100LoBW_Fast') | (general['CPULabel']=='100HiBW_Fast')]
I am able to get what I want [36 rows x 25 columns]:
     Pin   CPULabel      Freq(MHz)  DCycle  Skew(1-3)min  Skew(1-3)mean
0    Dif0  BP100_Fast    99.9843    0.492   0             0
2    Dif0  100HiBW_Fast  100.006    0.503   0             0
4    Dif0  100LoBW_Fast  100.005    0.503   0             0
8    Dif1  BP100_Fast    99.9928    0.492   7             10
10   Dif1  100HiBW_Fast  100.014    0.502   10            11
12   Dif1  100LoBW_Fast  99.9965    0.502   5             10
16   Dif2  BP100_Fast    99.9929    0.493   2             6
18   Dif2  100HiBW_Fast  100.002    0.504   4             9
20   Dif2  100LoBW_Fast  100.021    0.504   8             9
However, if I change the last command to:
slew = general.ix[['BP100_Fast', '100LoBW_Fast', '100HiBW_Fast'], :]
I get NaN as my result [3 rows x 25 columns]:
              Pin  CPULabel  Freq(MHz)  DCycle  Skew(1-3)min  Skew(1-3)mean
BP100_Fast    NaN  NaN       NaN        NaN     NaN           NaN
100LoBW_Fast  NaN  NaN       NaN        NaN     NaN           NaN
100HiBW_Fast  NaN  NaN       NaN        NaN     NaN           NaN
Is there any way to do this with df.ix? Thank you very much.
Per Docs:
The .ix indexer is deprecated, in favor of the more strict .iloc and .loc indexers. .ix offers a lot of magic on the inference of what the user wants to do. To wit, .ix can decide to index positionally OR via labels, depending on the data type of the index. This has caused quite a bit of user confusion over the years. The full indexing documentation is here. (GH14218)
Note that the list of strings passed to .ix is looked up in the row index, not in the CPULabel column. The row index here holds integers, so none of the strings match and every requested row comes back as NaN.
Option 1: isin
general[general.CPULabel.isin(['BP100_Fast', '100LoBW_Fast', '100HiBW_Fast'])]
     Pin   CPULabel      Freq(MHz)  DCycle  Skew(1-3)min  Skew(1-3)mean
0    Dif0  BP100_Fast    99.9843    0.492   0             0
2    Dif0  100HiBW_Fast  100.0060   0.503   0             0
4    Dif0  100LoBW_Fast  100.0050   0.503   0             0
8    Dif1  BP100_Fast    99.9928    0.492   7             10
10   Dif1  100HiBW_Fast  100.0140   0.502   10            11
12   Dif1  100LoBW_Fast  99.9965    0.502   5             10
16   Dif2  BP100_Fast    99.9929    0.493   2             6
18   Dif2  100HiBW_Fast  100.0020   0.504   4             9
20   Dif2  100LoBW_Fast  100.0210   0.504   8             9
Option 2: query
general.query('CPULabel in ["BP100_Fast", "100LoBW_Fast", "100HiBW_Fast"]')
     Pin   CPULabel      Freq(MHz)  DCycle  Skew(1-3)min  Skew(1-3)mean
0    Dif0  BP100_Fast    99.9843    0.492   0             0
2    Dif0  100HiBW_Fast  100.0060   0.503   0             0
4    Dif0  100LoBW_Fast  100.0050   0.503   0             0
8    Dif1  BP100_Fast    99.9928    0.492   7             10
10   Dif1  100HiBW_Fast  100.0140   0.502   10            11
12   Dif1  100LoBW_Fast  99.9965    0.502   5             10
16   Dif2  BP100_Fast    99.9929    0.493   2             6
18   Dif2  100HiBW_Fast  100.0020   0.504   4             9
20   Dif2  100LoBW_Fast  100.0210   0.504   8             9
Option 3: pd.Series.str.endswith
general[general.CPULabel.str.endswith('Fast')]
     Pin   CPULabel      Freq(MHz)  DCycle  Skew(1-3)min  Skew(1-3)mean
0    Dif0  BP100_Fast    99.9843    0.492   0             0
2    Dif0  100HiBW_Fast  100.0060   0.503   0             0
4    Dif0  100LoBW_Fast  100.0050   0.503   0             0
8    Dif1  BP100_Fast    99.9928    0.492   7             10
10   Dif1  100HiBW_Fast  100.0140   0.502   10            11
12   Dif1  100LoBW_Fast  99.9965    0.502   5             10
16   Dif2  BP100_Fast    99.9929    0.493   2             6
18   Dif2  100HiBW_Fast  100.0020   0.504   4             9
20   Dif2  100LoBW_Fast  100.0210   0.504   8             9
Try this approach:
labels = ['BP100_Fast', '100LoBW_Fast', '100HiBW_Fast']
slew = \
    pd.read_excel('25C_3.3V.xlsx', skiprows=1) \
    .fillna(value=0) \
    .query("Pin != 'Clkin' and CPULabel in @labels") \
    .drop_duplicates(keep=False)
Alternatively, you can change:
slew = general.ix[['BP100_Fast', '100LoBW_Fast', '100HiBW_Fast'], :]
to:
slew = general.loc[general['CPULabel'].isin(['BP100_Fast','100LoBW_Fast','100HiBW_Fast'])]
Clojure: decide if argument is a prime
I started learning Clojure a few days ago and wrote a simple function that decides whether its given argument is a prime or not. Here is my code:
(defn is-prime [n]
  (nil? (some #(= (mod n %) 0) (range 2 (java.lang.Math/sqrt n)))))
My problem is that this function returns true when it is called with '4'.
(is-prime 4) => true
I wrote another function for debugging purposes; it lists all the primes that are less than 250:
(defn primes []
  (filter #(is-prime %) (range 1 250)))
I have looked up the Wikipedia page for the list of prime numbers and found that except for the number '4', the rest of the output is correct.
(primes) => (1 2 3 4 5 7 9 11 13 17 19 23 25 29 31 37 41 43 47 49 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 121 127 131 137 139 149 151 157 163 167 169 173 179 181 191 193 197 199 211 223 227 229 233 239 241)
I have been thinking about it, and maybe it is just some beginner's mistake on my part, but I'm unable to find the solution. I would really appreciate your help, thanks in advance.
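(range m n) doesn't include n. So (range 2 (sqrt 4)) = (range 2 2) = (); it doesn't try any divisors.
Note your "primes" list also has 9 in it: (range 2 (sqrt 9)) = (range 2 3) = (2), so it never tries dividing by 3. Same issue for 25, 49, 121, 169; basically for all squares of primes.
The simplest fix is to make the upper bound inclusive: (range 2 (inc (int (Math/sqrt n)))). Truncate the square root before incrementing; with (inc (Math/sqrt n)) the range for n = 2 would be (2), so 2 would be tested against itself and rejected.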
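The same off-by-one can be reproduced outside Clojure, because Python's range also excludes its upper bound. The sketch below is a Python transliteration rather than the asker's code: the function names are made up for illustration, math.ceil is used only to mimic how (range 2 x) treats a floating-point x, and the n < 2 guard is an extra that the original function does not have.

```python
import math


def is_prime_half_open(n):
    """Mirror of the question's logic: tries divisors 2 <= d < sqrt(n)."""
    # ceil() reproduces how a half-open range with a float bound behaves:
    # every integer strictly below sqrt(n) is tried, sqrt(n) itself is not.
    upper = math.ceil(math.sqrt(n))
    return not any(n % d == 0 for d in range(2, upper))


def is_prime_inclusive(n):
    """Tries divisors 2 <= d <= floor(sqrt(n)); rejects numbers below 2."""
    if n < 2:  # extra guard, not part of the question's code
        return False
    return not any(n % d == 0 for d in range(2, math.isqrt(n) + 1))


print([n for n in range(1, 30) if is_prime_half_open(n)])
print([n for n in range(1, 30) if is_prime_inclusive(n)])
```

The first list still contains 1, 4, 9 and 25, matching the start of the output in the question; the second contains only primes.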
Clojure Docs: Understanding This Example of the Reduce Function
Reading through the Clojure docs and I'm confused by one example of the reduce function. I understand what reduce does, but there's a lot going on in this example and I'm not sure how it's all working together.
(reduce
  (fn [primes number]
    (if (some zero? (map (partial mod number) primes))
      primes
      (conj primes number)))
  [2]
  (take 1000 (iterate inc 3)))
=> [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997]
From what I understand, reduce takes a function, in this case an anonymous function. That function takes two arguments, a collection and a number. Then we have a conditional statement that checks if the number zero appears in the collection.
(map (partial mod number) primes) confuses me. Doesn't mod take two arguments and return the remainder of dividing the first by the second?
It appears that if this conditional returns true, it returns the collection of primes. If not, it adds the number to the collection of primes. Is that correct? And the final line is a collection of 1,000 numbers starting at 3.
Would someone be able to walk through this function?
You probably already know this, but the first thing to note is the following property: a number is prime if it is not divisible by any prime number smaller than it.
So the anonymous function starts with a vector of previous primes, and then checks if number is divisible by any of the previous primes. If it is divisible by any of the previous primes, it will just pass on the vector of previous primes; otherwise it will add the current number to the vector of primes and then return the new vector.
(partial mod number) is equivalent to (fn [x] (mod number x)) in this case.
To step through a few cases:
; Give a name to the anonymous function
(defn prime-checker [primes number]
  (if (some zero? (map (partial mod number) primes))
    primes
    (conj primes number)))
; This is how reduce will call the anonymous function
(prime-checker [2] 3)
; -> (map (partial mod number) primes) = [1]
; -> will return [2 3]
(prime-checker [2 3] 4)
; -> (map (partial mod number) primes) = [0 1]
; -> some zero? finds a zero value here so the function will return [2 3]
(prime-checker [2 3] 5)
; -> (map (partial mod number) primes) = [1 2]
; -> will return [2 3 5]
Hopefully you can see from this how reduce with this function returns a list of primes.
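For readers more at home in Python, the same fold can be written with functools.reduce. This is a transliteration offered as a sketch, not the code from the Clojure docs: prime_checker copies the name used in the walkthrough, and the comments mark which Clojure form each line stands in for.

```python
from functools import reduce


def prime_checker(primes, number):
    """Return primes unchanged if one of them divides number, else extend it."""
    remainders = [number % p for p in primes]  # (map (partial mod number) primes)
    if any(r == 0 for r in remainders):        # (some zero? ...)
        return primes
    return primes + [number]                   # (conj primes number)


# (take 1000 (iterate inc 3)) is the integers 3..1002; [2] is the initial value.
primes = reduce(prime_checker, range(3, 1003), [2])
print(primes[:10], "...", primes[-1])
```

reduce calls prime_checker once per number, each time passing in the list returned by the previous call, which is how the list of primes grows from [2].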
Why is this prime sieve implementation slower?
I was just experimenting a bit with (for me) a new programming language: Clojure. I wrote a quite naive 'sieve' implementation, which I then tried to optimise a bit. Strangely enough though (for me at least), the new implementation wasn't faster, but much slower...
Can anybody provide some insight into why this is so much slower? I'm also interested in other tips on how to improve this algorithm...
Best regards,
Arnaud Gouder
; naive sieve.
(defn sieve
  ([max] (sieve max (range 2 max) 2))
  ([max candidates n]
    (if (> (* n n) max)
      candidates
      (recur max (filter #(or (= % n) (not (= (mod % n) 0))) candidates) (inc n)))))
; Instead of just passing the 'candidates' list, from which I sieve-out the non-primes,
; I also pass a 'primes' list, with the already found primes
; I hoped that this would increase the speed, because:
; - Instead of sieving-out multiples of 'all' numbers, I now only sieve-out the multiples of primes.
; - The filter predicate now becomes simpler.
; However, this code seems to be approx 20x as slow.
; Note: the primes in 'primes' end up reversed, but I don't care (much). Adding a 'reverse' call makes it even slower :-(
(defn sieve2
  ([max] (sieve2 max () (range 2 max)))
  ([max primes candidates]
    (let [n (first candidates)]
      (if (> (* n n) max)
        (concat primes candidates)
        (recur max (conj primes n) (filter #(not (= (mod % n) 0)) (rest candidates)))))))
; Another attempt to speed things up. Instead of sieving-out multiples of all numbers in the range,
; I want to sieve-out only multiples of primes.. I don't like the '(first (filter ' construct very much...
; It doesn't seem to be faster than 'sieve'.
(defn sieve3
  ([max] (sieve max (range 2 max) 2))
  ([max candidates n]
    (if (> (* n n) max)
      candidates
      (let [new_candidates (filter #(or (= % n) (not (= (mod % n) 0))) candidates)]
        (recur max new_candidates (first (filter #(> % n) new_candidates)))))))
(time (sieve 10000000))
(time (sieve 10000000))
(time (sieve2 10000000))
(time (sieve2 10000000))
(time (sieve2 10000000))
(time (sieve 10000000)) ; Strange, speeds are very different now... Must be some memory allocation thing caused by running sieve2
(time (sieve 10000000))
(time (sieve3 10000000))
(time (sieve3 10000000))
(time (sieve 10000000))
I have good news and bad news. The good news is that your intuitions are correct.
(time (sieve 10000))
; "Elapsed time: 0.265311 msecs"
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 ...)
(time (sieve2 10000))
; "Elapsed time: 1.028353 msecs"
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 ...)
The bad news is that both are much slower than you think:
(time (count (sieve 10000)))
; "Elapsed time: 231.183055 msecs"
1229
(time (count (sieve2 10000)))
; "Elapsed time: 87.822796 msecs"
1229
What's happening is that because filter is lazy, the filtering isn't getting done until the answers need to be printed. All the first expression is counting is the time to wrap the sequence in a load of filters. Putting the count in means that the sequence actually has to be calculated within the timing expression, and then you see how long it really takes.
I think in the case without the count, sieve2 is taking longer because it is doing a bit of the work whilst constructing the filtered sequence. When you put the count in, sieve2 is faster because it's the better algorithm.
P.S. When I try (time (sieve 10000000)), my machine crashes with a stack overflow, presumably because of the vast stack of nested filter calls it's building up. How come it ran for you?
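The effect of laziness on the timing can be shown without a clock by counting how often the filter predicate actually runs. The sketch below is a Python analogue, not the Clojure code: Python's built-in filter is lazy as well, lazy_sieve and not_multiple_of are hypothetical names, and the counter exists only for the demonstration.

```python
counter = {"calls": 0}


def not_multiple_of(n):
    """Build a predicate that records every time it is evaluated."""
    def predicate(x):
        counter["calls"] += 1
        return x == n or x % n != 0
    return predicate


def lazy_sieve(limit):
    """Stack one lazy filter per candidate divisor, like the naive sieve."""
    candidates = iter(range(2, limit))
    n = 2
    while n * n <= limit:
        candidates = filter(not_multiple_of(n), candidates)
        n += 1
    return candidates


pending = lazy_sieve(10000)
calls_after_build = counter["calls"]    # the filters are only stacked so far
primes = list(pending)                  # consuming the iterator forces the work
calls_after_consume = counter["calls"]
print(calls_after_build, calls_after_consume, len(primes))
```

Building the chain of filters evaluates the predicate zero times; all of the work happens when the result is consumed, which is the role count plays in the timings above.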
Some optimization tips for this kind of primitive-number-heavy math:
Use Clojure 1.3. Clojure 1.3 allows unboxed checked arithmetic, so you won't be casting everything to Integer.
Type hint the function arguments. Otherwise you will end up casting all the ints/longs to Integer for each function call. (You're not calling any hint-able functions, so I'm just listing it here as general advice.)
Don't call any higher-order functions. Currently (1.3) lambda functions #( ...) can't be compiled as ^static, so they only take Object as arguments, and the calls to filter will require boxing of all the numbers.
You're likely losing enough time in boxing/unboxing Integers/ints that it will be hard to really judge the different optimizations. If you type hint (and use Clojure 1.3) then you will likely get better numbers to judge your optimizations.