Clojure中的快速素数生成

我一直在努力解决Clojure中的项目欧拉问题,使其变得更好,而且我已经遇到了几次素数生成。 我的问题是,这只是太长时间了。 我希望有人能够帮助我find一个有效的方式来以Clojure-y的方式做到这一点。

当我拳头这样做的时候,我蛮横地逼着它。 这很容易做到。 但是计算10001个素数在Xeon 2.33GHz上这样花了2分钟,规则太长了,总的来说太长了。 这是algorithm:

(defn next-prime-slow "Find the next prime number, checking against our already existing list" ([sofar guess] (if (not-any? #(zero? (mod guess %)) sofar) guess ; Then we have a prime (recur sofar (+ guess 2))))) ; Try again (defn find-primes-slow "Finds prime numbers, slowly" ([] (find-primes-slow 10001 [2 3])) ; How many we need, initial prime seeds ([needed sofar] (if (<= needed (count sofar)) sofar ; Found enough, we're done (recur needed (concat sofar [(next-prime-slow sofar (last sofar))]))))) 

通过用一个更新的例程代替next-prime-slow(例如6n +/- 1属性),我能够将速度加快到大约70秒。

接下来,我尝试在纯粹的Clojure中制作Eratosthenes的筛选器。 我不认为我把所有的错误都弄清楚了,但是我放弃了,因为它太慢了(甚至比上面更糟糕,我认为)。

 (defn clean-sieve "Clean the sieve of what we know isn't prime based" [seeds-left sieve] (if (zero? (count seeds-left)) sieve ; Nothing left to filter the list against (recur (rest seeds-left) ; The numbers we haven't checked against (filter #(> (mod % (first seeds-left)) 0) sieve)))) ; Filter out multiples (defn self-clean-sieve ; This seems to be REALLY slow "Remove the stuff in the sieve that isn't prime based on it's self" ([sieve] (self-clean-sieve (rest sieve) (take 1 sieve))) ([sieve clean] (if (zero? (count sieve)) clean (let [cleaned (filter #(> (mod % (last clean)) 0) sieve)] (recur (rest cleaned) (into clean [(first cleaned)])))))) (defn find-primes "Finds prime numbers, hopefully faster" ([] (find-primes 10001 [2])) ([needed seeds] (if (>= (count seeds) needed) seeds ; We have enough (recur ; Recalculate needed (into seeds ; Stuff we've already found (let [start (last seeds) end-range (+ start 150000)] ; NOTE HERE (reverse (self-clean-sieve (clean-sieve seeds (range (inc start) end-range)))))))))) 

这不好。 如果数字150000较小,也会导致堆栈溢出。 尽pipe我正在使用重复发生。 这可能是我的错。

接下来,我尝试了一个筛选,在Java ArrayList上使用Java方法。 这花了不less时间和记忆。

我最近的尝试是使用Clojure散列映射的筛子,在筛子中插入所有数字,然后解开不是素数的数字。 最后,它把关键列表,这是它发现的素数。 大约需要10-12秒才能find10000个素数。 我不确定它是否已经完全debugging。 它也是recursion的(使用循环和循环),因为我试图成为Lispy。

所以有了这样的问题,问题10(总计200万以下的素数)正在杀死我。 我最快的代码提出了正确的答案,但它花了105秒来做,并需要相当多的内存(我给了它512 MB,所以我不会大惊小怪)。 我的其他algorithm花了这么长时间,我总是先停下来。

我可以用一个筛子来计算Java或C中的许多素数,而且不用太多的内存。 我知道我必须在Clojure / Lisp风格中丢失一些导致问题的东西。

有什么我做的真的错了吗? Clojure有点慢,大序列? 读一些欧拉的讨论,人们已经计算出其他Lisp中的10000个素数在100毫秒以内。 我意识到JVM可能会减慢速度,而Clojure相对比较年轻,但我不希望有100倍的差异。

有人能让我快速计算Clojure中的素数吗?

这是庆祝Clojure's Java interop的另一种方法。 这在2.4 Ghz Core 2 Duo(运行单线程)上需要374ms。 我让Java的BigInteger#isProbablePrime有效的Miller-Rabin实现处理素数检查。

 (def certainty 5) (defn prime? [n] (.isProbablePrime (BigInteger/valueOf n) certainty)) (concat [2] (take 10001 (filter prime? (take-nth 2 (range 1 Integer/MAX_VALUE))))) 

5的Miller-Rabin确定性对于比这个大得多的数字可能不是很好。 确定性等于96.875%确定它是素数( 1 - .5^certainty

我意识到这是一个非常古老的问题,但是最近我最终find了同样的东西,这里的链接并不是我要找的东西(尽可能限制在functiontypes上,懒惰地生成〜每个我想要的东西) 。

我偶然发现了一个不错的F#实现 ,所以所有的功劳都是他的。 我只是把它移植到Clojure上:

 (defn gen-primes "Generates an infinite, lazy sequence of prime numbers" [] (let [reinsert (fn [table x prime] (update-in table [(+ prime x)] conj prime))] (defn primes-step [table d] (if-let [factors (get table d)] (recur (reduce #(reinsert %1 d %2) (dissoc table d) factors) (inc d)) (lazy-seq (cons d (primes-step (assoc table (* dd) (list d)) (inc d)))))) (primes-step {} 2))) 

用法很简单

 (->> (gen-primes) (filter #(< % 2000000) ; do what you need with the stuff ) 

看到最后一个例子: http : //clojuredocs.org/clojure_core/clojure.core/lazy-seq

 ;; An example combining lazy sequences with higher order functions ;; Generate prime numbers using Eratosthenes Sieve ;; See http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes ;; Note that the starting set of sieved numbers should be ;; the set of integers starting with 2 ie, (iterate inc 2) (defn sieve [s] (cons (first s) (lazy-seq (sieve (filter #(not= 0 (mod % (first s))) (rest s)))))) user=> (take 20 (sieve (iterate inc 2))) (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71) 

晚会非常晚,但我会举一个例子,使用Java BitSets:

 (defn sieve [n] "Returns a BitSet with bits set for each prime up to n" (let [bs (new java.util.BitSet n)] (.flip bs 2 n) (doseq [i (range 4 n 2)] (.clear bs i)) (doseq [p (range 3 (Math/sqrt n))] (if (.get bs p) (doseq [q (range (* pp) n (* 2 p))] (.clear bs q)))) bs)) 

在2014年的Macbook Pro(2.3GHz的酷睿i7)上运行这个,我得到:

 user=> (time (do (sieve 1e6) nil)) "Elapsed time: 64.936 msecs" 

这是一个很好而简单的实现:

http://clj-me.blogspot.com/2008/06/primes.html

…但它是为Clojure的1.0版之前的版本编写的。 请参阅Clojure Contrib中的lazy_seqs ,以便与当前版本的语言一起使用。

 (defn sieve [[p & rst]] ;; make sure the stack size is sufficiently large! (lazy-seq (cons p (sieve (remove #(= 0 (mod % p)) rst))))) (def primes (sieve (iterate inc 2))) 

以10M堆栈大小,我在2.1Gz的macbook上得到了第331秒的第1001个素数。

所以我刚刚开始使用Clojure,是的,这个问题在欧拉计划上出现了很多,不是吗? 我写了一个非常快速的试用版本优化algorithm,但是在每个部门运行变得过于缓慢之前,它并没有真正地缩放太多。

所以我又开始了,这次使用筛选方法:

 (defn clense "Walks through the sieve and nils out multiples of step" [primes step i] (if (<= i (count primes)) (recur (assoc! primes i nil) step (+ i step)) primes)) (defn sieve-step "Only works if i is >= 3" [primes i] (if (< i (count primes)) (recur (if (nil? (primes i)) primes (clense primes (* 2 i) (* ii))) (+ 2 i)) primes)) (defn prime-sieve "Returns a lazy list of all primes smaller than x" [x] (drop 2 (filter (complement nil?) (persistent! (sieve-step (clense (transient (vec (range x))) 2 4) 3))))) 

用法和速度:

 user=> (time (do (prime-sieve 1E6) nil)) "Elapsed time: 930.881 msecs 

我对这个速度非常满意:它正在运行一个2009 MBP的REPL。 它大部分是快速的,因为我完全避开了惯用的Clojure,而是像猴子一样循环。 它的速度也快4倍,因为我使用瞬态vector在筛上工作而不是保持完全不变。

编辑:在Will Ness的一些build议/错误修复之后,它现在运行得更快了。

Scheme中有一个简单的筛子:

http://telegraphics.com.au/svn/puzzles/trunk/programming-in-scheme/primes-up-to.scm

这是一个高达10000的素数运行:

 #;1> (include "primes-up-to.scm") ; including primes-up-to.scm ... #;2> ,t (primes-up-to 10000) 0.238s CPU time, 0.062s GC time (major), 180013 mutations, 130/4758 GCs (major/minor) (2 3 5 7 11 13... 

基于威尔的评论,这里是我postponed-primes

 (defn postponed-primes-recursive ([] (concat (list 2 3 5 7) (lazy-seq (postponed-primes-recursive {} 3 9 (rest (rest (postponed-primes-recursive))) 9)))) ([D pq ps c] (letfn [(add-composites [D xs] (loop [ax] (if (contains? D a) (recur (+ as)) (persistent! (assoc! (transient D) as)))))] (loop [DD pp qq ps ps cc] (if (not (contains? D c)) (if (< cq) (cons c (lazy-seq (postponed-primes-recursive D pq ps (+ 2 c)))) (recur (add-composites D (+ c (* 2 p)) (* 2 p)) (first ps) (* (first ps) (first ps)) (rest ps) (+ c 2))) (let [s (get D c)] (recur (add-composites (persistent! (dissoc! (transient D) c)) (+ cs) s) p q ps (+ c 2)))))))) 

首次提交比较:

这里是我尝试将这个素数生成器从Python移植到Clojure。 下面返回一个无限延迟序列。

 (defn primes [] (letfn [(prime-help [foo bar] (loop [D foo q bar] (if (nil? (get D q)) (cons q (lazy-seq (prime-help (persistent! (assoc! (transient D) (* qq) (list q))) (inc q)))) (let [factors-of-q (get D q) key-val (interleave (map #(+ % q) factors-of-q) (map #(cons % (get D (+ % q) (list))) factors-of-q))] (recur (persistent! (dissoc! (apply assoc! (transient D) key-val) q)) (inc q))))))] (prime-help {} 2))) 

用法:

 user=> (first (primes)) 2 user=> (second (primes)) 3 user=> (nth (primes) 100) 547 user=> (take 5 (primes)) (2 3 5 7 11) user=> (time (nth (primes) 10000)) "Elapsed time: 409.052221 msecs" 104743 

编辑:

在性能比较中, postponed-primes使用目前为止看到的素数队列,而不是recursion调用postponed-primes

 user=> (def counts (list 200000 400000 600000 800000)) #'user/counts user=> (map #(time (nth (postponed-primes) %)) counts) ("Elapsed time: 1822.882 msecs" "Elapsed time: 3985.299 msecs" "Elapsed time: 6916.98 msecs" "Elapsed time: 8710.791 msecs" 2750161 5800139 8960467 12195263) user=> (map #(time (nth (postponed-primes-recursive) %)) counts) ("Elapsed time: 1776.843 msecs" "Elapsed time: 3874.125 msecs" "Elapsed time: 6092.79 msecs" "Elapsed time: 8453.017 msecs" 2750161 5800139 8960467 12195263) 

来自: http : //steloflute.tistory.com/entry/Clojure-%ED%94%84%EB%A1%9C%EA%B7%B8%EB%9E%A8-%EC%B5%9C%EC% A0%81%ED%99%94

使用Java数组

 (defmacro loopwhile [init-symbol init whilep step & body] `(loop [~init-symbol ~init] (when ~whilep ~@body (recur (+ ~init-symbol ~step))))) (defn primesUnderb [limit] (let [p (boolean-array limit true)] (loopwhile i 2 (< i (Math/sqrt limit)) 1 (when (aget pi) (loopwhile j (* i 2) (< j limit) i (aset pj false)))) (filter #(aget p %) (range 2 limit)))) 

用法和速度:

 user=> (time (def p (primesUnderb 1e6))) "Elapsed time: 104.065891 msecs" 

我刚开始使用Clojure,所以我不知道它是否好,但这里是我的解决scheme:

 (defn divides? [xi] (zero? (mod xi))) (defn factors [x] (flatten (map #(list % (/ x %)) (filter #(divides? x %) (range 1 (inc (Math/floor (Math/sqrt x)))))))) (defn prime? [x] (empty? (filter #(and divides? (not= x %) (not= 1 %)) (factors x)))) (def primes (filter prime? (range 2 java.lang.Integer/MAX_VALUE))) (defn sum-of-primes-below [n] (reduce + (take-while #(< % n) primes))) 

在来到这个线索并寻找更快的替代scheme之后,我感到惊讶的是没有人与Christophe Grand的以下文章联系起来:

 (defn primes3 [max] (let [enqueue (fn [sieve n factor] (let [m (+ n (+ factor factor))] (if (sieve m) (recur sieve m factor) (assoc sieve m factor)))) next-sieve (fn [sieve candidate] (if-let [factor (sieve candidate)] (-> sieve (dissoc candidate) (enqueue candidate factor)) (enqueue sieve candidate candidate)))] (cons 2 (vals (reduce next-sieve {} (range 3 max 2)))))) 

以及一个懒惰的版本:

 (defn lazy-primes3 [] (letfn [(enqueue [sieve n step] (let [m (+ n step)] (if (sieve m) (recur sieve m step) (assoc sieve m step)))) (next-sieve [sieve candidate] (if-let [step (sieve candidate)] (-> sieve (dissoc candidate) (enqueue candidate step)) (enqueue sieve candidate (+ candidate candidate)))) (next-primes [sieve candidate] (if (sieve candidate) (recur (next-sieve sieve candidate) (+ candidate 2)) (cons candidate (lazy-seq (next-primes (next-sieve sieve candidate) (+ candidate 2))))))] (cons 2 (lazy-seq (next-primes {} 3)))))