US-ASCII Malformed Encoding XSS Filter - Attack Detected - false alarm while using SUMMERNOTE with Polish characters - mod-security

I use Summernote to put some formatted text into a database.
When I submit text without Polish characters, or with just a few, it works fine, but when I use more Polish characters I get an error from the PHP script [HTTP/1.1 403 Forbidden 24ms].
I reviewed the /var/log/httpd/modsec_audit.log file and found these warnings/errors:
Message: Warning. Pattern match
"\xbc[^\xbe>][\xbe>]|<[^\xbe]\xbe" at ARGS:opis. [file
"/etc/httpd/modsecurity.d/activated_rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf"]
[line "546"] [id "941310"] [msg "US-ASCII Malformed Encoding XSS
Filter - Attack Detected"] [data "Matched Data: \xbcno\xc5\x9bci od
tego jak doprawiony jest bulion. zup\xc4\x99 ju\xc5\xbc na talerzu
mieszamy z oliwa. found within ARGS:opis: warzywa zalewamy
bulionem warzywnym. gotujemy do mi\xc4\x99kko\xc5\x9bci, miksujemy,
solimy w zale\xc5\xbcno\xc5\x9bci od tego jak doprawiony jest bulion.
zup\xc4\x99 ju\xc5\xbc na talerzu mieszamy z oliwa. "] [severity
"CRITICAL"] [ver "OWASP_CRS/3.3.0"] [tag "application-multi"] [tag
"language-multi"] [tag "platform-tomcat"] [tag "attack-xss"] [tag
"paranoia-level/1"] [tag "OWASP_CRS"] [tag "capec/1000/152/242"]
Message: Access denied with code 403 (phase 2). Operator GE matched 5
at TX:anomaly_score. [file
"/etc/httpd/modsecurity.d/activated_rules/REQUEST-949-BLOCKING-EVALUATION.conf"]
[line "93"] [id "949110"] [msg "Inbound Anomaly Score Exceeded (Total
Score: 5)"] [severity "CRITICAL"] [ver "OWASP_CRS/3.3.0"] [tag
"application-multi"] [tag "language-multi"] [tag "platform-multi"]
[tag "attack-generic"] Message: Warning. Operator GE matched 5 at
TX:inbound_anomaly_score. [file
"/etc/httpd/modsecurity.d/activated_rules/RESPONSE-980-CORRELATION.conf"]
[line "91"] [id "980130"] [msg "Inbound Anomaly Score Exceeded (Total
Inbound Score: 5 -
SQLI=0,XSS=5,RFI=0,LFI=0,RCE=0,PHPI=0,HTTP=0,SESS=0): individual
paranoia level scores: 5, 0, 0, 0"] [ver "OWASP_CRS/3.3.0"] [tag
"event-correlation"] Apache-Error: [file "apache2_util.c"] [line 273]
[level 3] [client 192.168.101.12] ModSecurity: Warning. Pattern match
"\\\\xbc[^\\\\xbe>][\\\\xbe>]|<[^\\\\xbe]\\\\xbe"
at ARGS:opis. [file
"/etc/httpd/modsecurity.d/activated_rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf"]
[line "546"] [id "941310"] [msg "US-ASCII Malformed Encoding XSS
Filter - Attack Detected"] [data "Matched Data:
\\xbcno\\xc5\\x9bci od tego jak doprawiony jest bulion.
zup\\xc4\\x99 ju\\xc5\\xbc na talerzu mieszamy z oliwa.
found within ARGS:opis: warzywa zalewamy bulionem warzywnym. gotujemy
do mi\\xc4\\x99kko\\xc5\\x9bci, miksujemy, solimy w
zale\\xc5\\xbcno\\xc5\\x9bci od tego jak doprawiony jest
bulion. zup\\xc4\\x99 ju\\xc5\\xbc na talerzu mieszamy z
oliwa. "] [severity "CRITICAL"] [ver "OWASP_CRS/3.3.0"] [tag
"application-multi"] [tag "language-multi"] [tag "platform-tomcat"]
[tag "attack-xss"] [tag "paranoia-level/1"] [tag "OWASP_CRS"] [tag
"capec/1000/152/242"] [hostname "somehostname.pl"] [uri
"/somescript.php"] [unique_id
"Y0peIxMWSyQpE#SPNKJmngAAAAs"]
There are more messages...
As I understand it, this is some kind of false alarm from the Apache mod_security module.
Can you please advise how to solve this issue while keeping mod_security ON at the same time (if possible)?
Thanks
UPDATE:
I have identified one Polish character, 'ż' (UTF-8 \xc5\xbc), that causes the issue.
Other characters work OK.
The log message looks like this:
Message: Warning. Pattern match
"\xbc[^\xbe>][\xbe>]|<[^\xbe]\xbe" at ARGS:opis. [file
"/etc/httpd/modsecurity.d/activated_rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf"]
[line "546"] [id "941310"] [msg "US-ASCII Malformed Encoding XSS
Filter - Attack Detected"] [data "Matched Data: \xbc found within
ARGS:opis: \xc5\xbc"] [severity "CRITICAL"] [ver
"OWASP_CRS/3.3.0"] [tag "application-multi"] [tag "language-multi"]
[tag "platform-tomcat"] [tag "attack-xss"] [tag "paranoia-level/1"]
[tag "OWASP_CRS"] [tag "capec/1000/152/242"] Message: Access denied
with code 403 (phase 2). Operator GE matched 5 at TX:anomaly_score.
[file
"/etc/httpd/modsecurity.d/activated_rules/REQUEST-949-BLOCKING-EVALUATION.conf"]
[line "93"] [id "949110"] [msg "Inbound Anomaly Score Exceeded (Total
Score: 5)"] [severity "CRITICAL"] [ver "OWASP_CRS/3.3.0"] [tag
"application-multi"] [tag "language-multi"] [tag "platform-multi"]
[tag "attack-generic"] Message: Warning. Operator GE matched 5 at
TX:inbound_anomaly_score. [file
"/etc/httpd/modsecurity.d/activated_rules/RESPONSE-980-CORRELATION.conf"]
[line "91"] [id "980130"] [msg "Inbound Anomaly Score Exceeded (Total
Inbound Score: 5 -
SQLI=0,XSS=5,RFI=0,LFI=0,RCE=0,PHPI=0,HTTP=0,SESS=0): individual
paranoia level scores: 5, 0, 0, 0"] [ver "OWASP_CRS/3.3.0"] [tag
"event-correlation"] Apache-Error: [file "apache2_util.c"] [line 273]
[level 3] [client 192.168.101.12] ModSecurity: Warning. Pattern match
"\\\\xbc[^\\\\xbe>][\\\\xbe>]|<[^\\\\xbe]\\\\xbe"
at ARGS:opis. [file
"/etc/httpd/modsecurity.d/activated_rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf"]
[line "546"] [id "941310"] [msg "US-ASCII Malformed Encoding XSS
Filter - Attack Detected"] [data "Matched Data: \\xbc found
within ARGS:opis: \\xc5\\xbc"] [severity "CRITICAL"] [ver
"OWASP_CRS/3.3.0"] [tag "application-multi"] [tag "language-multi"]
[tag "platform-tomcat"] [tag "attack-xss"] [tag "paranoia-level/1"]
[tag "OWASP_CRS"] [tag "capec/1000/152/242"] [hostname "somehost.pl"]
[uri "/somescript.php"] [unique_id
"Y0qHleWp5HLoLWSFDV218QAAAEU"]
UPDATE2
I have reported the issue and found a similar one. As far as I can tell, a fix will be available in the 3.4/dev branch, and for current systems a fix can be downloaded from here.
I do not have much experience with installing/updating the CRS, so could anybody please advise how to install the currently available fix on top of CRS 3.3.0?
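In the meantime, a per-parameter rule exclusion keeps mod_security on while skipping only the single misfiring check. A minimal sketch, assuming the parameter is opis (as in the logs above) and a stock CRS 3.3 layout:

# Place in RESPONSE-999-EXCLUSION-RULES-AFTER-CRS.conf,
# which is loaded after the CRS rules themselves:
SecRuleUpdateTargetById 941310 "!ARGS:opis"

Rule 941310 stays active for every other parameter, and the rest of the CRS is untouched.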

Related

Stop CRS rule from triggering for a given argument

I am an absolute newcomer to OWASP ModSecurity, so please excuse me if this is a simple question! Currently, for an image upload function, I get a bunch of "warnings" like the following:
ModSecurity: Warning. Matched "Operator Rx' with parameter (?i)\s\S\b' against variable ARGS:json.ImageBytes' (Value: data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEASABIAAD/4gv4SUNDX1BST0ZJTEUAAQEAAAvoAAAAAAIAAABtbnRyUkdCI (474171 characters omitted)' ) [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf"] [line "139"] [id "941130"] [rev "2"] [msg "XSS Filter - Category 3: Attribute Vector"] [data "Matched Data: ;base64 found within ARGS:json.ImageBytes: data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEASABIAAD/4gv4SUNDX1BST0ZJTEUAAQEAAAvoAAAAAAIAAABtbnRyUkdCIFhZWiAH2QADABsAFQAkAB9hY3NwAAA (474141 characters omitted)"] [severity "2"] [ver "OWASP_CRS/3.0.0"] [maturity "1"] [accuracy "8"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-xss"] [tag "OWASP_CRS/WEB_ATTACK/XSS"] [tag "WASCTC/WASC-8"] [tag "WASCTC/WASC-22"] [tag "OWASP_TOP_10/A3"] [tag "OWASP_AppSensor/IE1"] [tag "CAPEC-242"] [hostname "XX.XXX.X.XX"] [uri "/emps/api/emps/UpdateImage"] [unique_id "160217346360.547876"] [ref "o15,7v29,474271t:utf8toUnicode,t:urlDecodeUni,t:htmlEntityDecode,t:jsDecode,t:cssDecode,t:removeNulls"]
I need to prevent rule 941130 from triggering in the case that the argument (ARGS) is "json.ImageBytes".
I don't want to exclude the rule completely, but I am trying to get ModSecurity to ignore it for the specified parameter.
Maybe also: is there a way to do this by URI ("/emps/api/emps/UpdateImage")?
So far I have tried:
SecRuleUpdateTargetById 941130 !ARGS:json.ImageBytes
but to no avail.
I would be very thankful for any help!
Based on the given information, you can create an exclusion rule - just put it in REQUEST-900-EXCLUSION-RULES-BEFORE-CRS.conf.
The rule looks like this:
SecRule REQUEST_URI "@beginsWith /emps/api/emps/UpdateImage" \
    "id:9000901,\
    phase:1,\
    t:none,\
    nolog,\
    pass,\
    ctl:ruleRemoveTargetById=941130;ARGS:json.ImageBytes"
or something similar.
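The SecRuleUpdateTargetById attempt from the question should also work, but only if the directive is placed after the CRS rules are loaded (for example in RESPONSE-999-EXCLUSION-RULES-AFTER-CRS.conf), since it modifies a rule that must already be defined:

SecRuleUpdateTargetById 941130 "!ARGS:json.ImageBytes"

Unlike the ctl: approach above, this removes the target unconditionally rather than only for the /emps/api/emps/UpdateImage URI.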

clojure core.tools.cli: How to override boolean option?

I want a command that takes arguments which look like this:
--enable-boolean-flag --disable-boolean-flag --enable-boolean-flag
In the :options key returned by clojure.tools.cli/parse-opts, I want the :boolean-flag option set to true if --enable-boolean-flag came last on the command line and false if --disable-boolean-flag came last, if that makes any sense.
Any ideas?
EDIT: I'm using 0.3.6 of the core.tools.cli library.
You can achieve this by taking advantage of the :id, :default, and :assoc-fn properties that tools.cli lets you specify for each command-line option.
Use :id to set the same id for the "--enable" and "--disable" options.
Use :default on one of the options to specify what should happen if neither "--enable" nor "--disable" is given.
Use :assoc-fn to specify what effect the option has on the options map. You want the value set to false every time "--disable" appears and to true every time "--enable" appears.
Putting it all together:
(ns clis.core
  (:require [clojure.tools.cli :refer [parse-opts]])
  (:gen-class))

(def cli-options
  [["-e" "--enable" "Enable"
    :default true
    :id :boolean-flag
    :assoc-fn (fn [m k _] (assoc m k true))]
   ["-d" "--disable" "Disable"
    :id :boolean-flag
    :assoc-fn (fn [m k _] (assoc m k false))]])

(defn -main [& args]
  (parse-opts args cli-options))
Testing at the REPL:
(-main)
;; {:options {:boolean-flag true}, :arguments [], :summary " -e, --enable Enable\n -d, --disable Disable", :errors nil}
(-main "-e" "-d" "-e")
;; {:options {:boolean-flag true}, :arguments [], :summary " -e, --enable Enable\n -d, --disable Disable", :errors nil}
(-main "-e" "-d" "-e" "-d")
;; {:options {:boolean-flag false}, :arguments [], :summary " -e, --enable Enable\n -d, --disable Disable", :errors nil}
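The same pattern works with the literal long flags from the question; a sketch (flag names taken from the question, default assumed):

(def cli-options
  [[nil "--enable-boolean-flag" "Enable the flag"
    :id :boolean-flag
    :default false
    :assoc-fn (fn [m k _] (assoc m k true))]
   [nil "--disable-boolean-flag" "Disable the flag"
    :id :boolean-flag
    :assoc-fn (fn [m k _] (assoc m k false))]])

Because both specs share the same :id, whichever flag appears last on the command line wins.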

Logging to two files in Timbre

I'm trying to log to two different files from the same namespace with Timbre. Or, if that's not possible, at least to different files from two different namespaces.
Inspecting timbre/*config*, I get the impression that I'd need two configuration maps to configure something like that. I can create another config map and use it with timbre/log* in place of the standard config map, but I can't shake the feeling that this is not how it is supposed to be used...?
(timbre/log* timbre/*config* :info "Test with standard config")
AFAIK, the easiest way is indeed to create two config maps:
(def config1
  {:level :debug
   :appenders {:spit1 (appenders/spit-appender {:fname "file1.log"})}})

(def config2
  {:level :debug
   :appenders {:spit2 (appenders/spit-appender {:fname "file2.log"})}})

(timbre/with-config config1
  (info "This will print in file1"))

(timbre/with-config config2
  (info "This will print in file2"))
A second way would be to write your own appender from the spit-appender:
https://github.com/ptaoussanis/timbre/blob/master/src/taoensso/timbre/appenders/core.cljx
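Note that this snippet is adapted from Timbre's own spit-appender, so it assumes the helpers it uses there are in scope; in Timbre's source they come from taoensso.encore, e.g. (:require [taoensso.encore :as enc :refer [have?]]).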
(defn my-spit-appender
  "Returns a simple `spit` file appender for Clojure."
  [& [{:keys [fname] :or {fname "./timbre-spit.log"}}]]
  {:enabled?   true
   :async?     false
   :min-level  nil
   :rate-limit nil
   :output-fn  :inherit
   :fn
   (fn self [data]
     (let [{:keys [output_]} data]
       (try
         ;; SOME LOGIC HERE TO CHOOSE THE FILE TO OUTPUT TO ...
         (spit fname (str (force output_) "\n") :append true)
         (catch java.io.IOException e
           (if (:__spit-appender/retry? data)
             (throw e) ; Unexpected error
             (let [_    (have? enc/nblank-str? fname)
                   file (java.io.File. ^String fname)
                   dir  (.getParentFile (.getCanonicalFile file))]
               (when-not (.exists dir) (.mkdirs dir))
               (self (assoc data :__spit-appender/retry? true))))))))})
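To use it, register it like any other appender; a sketch, assuming [taoensso.timbre :as timbre] is required and my-spit-appender is defined as above:

(timbre/merge-config!
  {:appenders {:my-spit (my-spit-appender {:fname "file1.log"})}})

(timbre/info "Routed by my-spit-appender")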

Generate and stream a zip-file in a Ring web app in Clojure

I have a Ring handler that needs to:
Zip a few files
Stream the Zip to the client.
Now I have it sort of working, but only the first zipped entry gets streamed, and after that it stalls. I suspect something about the flushing/streaming is wrong.
Here is my (compojure) handler:
(GET "/zip" {:as request}
:query-params [order-id :- s/Any]
(stream-lessons-zip (read-string order-id) (:db request) (:auth-user request)))
Here is the stream-lessons-zip function:
(defn stream-lessons-zip
  [order-id db auth-user]
  (let [lessons ...] ; ... not shown
    {:status 200
     :headers {"Content-Type" "application/zip, application/octet-stream"
               "Content-Disposition" (str "attachment; filename=\"files.zip\"")}
     :body (futil/zip-lessons lessons)}))
And I use a piped-input-stream to do the streaming, like so:
(defn zip-lessons
  "Returns an inputstream (piped-input-stream) to be used directly in Ring HTTP responses"
  [lessons]
  (let [paths (map #(select-keys % [:file_path :file_name]) lessons)]
    (ring-io/piped-input-stream
      (fn [output-stream]
        ;; build a zip-output-stream from a normal output-stream
        (with-open [zip-output-stream (ZipOutputStream. output-stream)]
          (doseq [{:keys [file_path file_name] :as p} paths]
            (let [f (cio/file file_path)]
              (.putNextEntry zip-output-stream (ZipEntry. file_name))
              (cio/copy f zip-output-stream)
              (.closeEntry zip-output-stream))))))))
I have confirmed that the 'lessons' vector contains 4 entries, but the zip file only contains 1 entry. Furthermore, Chrome doesn't 'finalize' the download, i.e. it thinks it is still downloading.
How can I fix this?
It sounds like producing a stateful stream using blocking IO is not supported by http-kit. Non-stateful streams can be done this way:
http://www.http-kit.org/server.html#async
A PR to introduce stateful streams using blocking IO was not accepted:
https://github.com/http-kit/http-kit/pull/181
It sounds like the option to explore is to use a ByteArrayOutputStream to fully render the zip file into memory, and then return the resulting buffer. If this endpoint isn't heavily trafficked and the zip file it produces is not large (< 1 GB), this might work.
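A minimal sketch of that approach, reusing the lesson maps from the question (names assumed from the question's code):

(ns example.zip
  (:require [clojure.java.io :as cio])
  (:import [java.io ByteArrayInputStream ByteArrayOutputStream]
           [java.util.zip ZipEntry ZipOutputStream]))

(defn zip-lessons-in-memory
  "Renders the whole zip into memory and returns an InputStream usable as a Ring :body."
  [lessons]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [zos (ZipOutputStream. baos)]
      (doseq [{:keys [file_path file_name]} lessons]
        (.putNextEntry zos (ZipEntry. ^String file_name))
        (cio/copy (cio/file file_path) zos)
        (.closeEntry zos)))
    (ByteArrayInputStream. (.toByteArray baos))))

The trade-off, as noted above, is that memory use grows with the size of the zip.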
So, it's been a few years, but that code still runs in production (i.e. it works). I made it work back then but forgot to mention it here (and, to be honest, forgot WHY it works... it was very much trial and error).
This is the code now:
(defn zip-lessons
  "Returns an inputstream (piped-input-stream) to be used directly in Ring HTTP responses"
  [lessons {:keys [firstname surname order_favorite_name company_name] :as annotation
            :or {order_favorite_name ""
                 company_name ""
                 firstname ""
                 surname ""}}]
  (debug "zipping lessons" (count lessons))
  (let [paths (map #(select-keys % [:file_path :file_name :folder_number]) lessons)]
    (ring-io/piped-input-stream
      (fn [output-stream]
        ;; build a zip-output-stream from a normal output-stream
        (with-open [zip-output-stream (ZipOutputStream. output-stream)]
          (doseq [{:keys [file_path file_name folder_number] :as p} paths]
            (let [f    (cio/as-file file_path)
                  baos (ByteArrayOutputStream.)]
              (if (.exists f)
                (do
                  (debug "Adding entry to zip:" file_name "at" file_path)
                  (let [zip-entry (ZipEntry. (str (if folder_number (str folder_number "/") "") file_name))]
                    (.putNextEntry zip-output-stream zip-entry)
                    (cio/copy f baos) ; buffer the whole file before writing it to the zip entry
                    (.close baos)
                    (.writeTo baos zip-output-stream)
                    (.closeEntry zip-output-stream)
                    (.flush zip-output-stream)
                    (debug "flushed")))
                (warn "File '" file_name "' at '" file_path "' does not exist, not adding to zip file!"))))
          (.flush zip-output-stream)
          (.flush output-stream)
          (.finish zip-output-stream)
          (.close zip-output-stream))))))

Clustering (fkmeans) with Mahout using Clojure

I am trying to write a short script to cluster my data in Clojure (calling Mahout classes). I have my input data in this format (which is the output of a PHP script):
format: (tag) (image) (frequency)
tag_sit image_a 0
tag_sit image_b 1
tag_lorem image_a 1
tag_lorem image_b 0
tag_dolor image_a 0
tag_dolor image_b 1
tag_ipsum image_a 1
tag_ipsum image_b 1
tag_amit image_a 1
tag_amit image_b 0
... (more)
Then I write them into a SequenceFile using this script (Clojure):
#!./bin/clj
(ns sensei.sequence.core)
(require 'clojure.string)
(require 'clojure.java.io)
(import org.apache.hadoop.conf.Configuration)
(import org.apache.hadoop.fs.FileSystem)
(import org.apache.hadoop.fs.Path)
(import org.apache.hadoop.io.SequenceFile)
(import org.apache.hadoop.io.Text)
(import org.apache.mahout.math.VectorWritable)
(import org.apache.mahout.math.SequentialAccessSparseVector)
(with-open [reader (clojure.java.io/reader *in*)]
  (let [hadoop_configuration ((fn []
                                (let [conf (new Configuration)]
                                  (. conf set "fs.default.name" "hdfs://localhost:9000/")
                                  conf)))
        hadoop_fs (FileSystem/get hadoop_configuration)]
    (reduce
      (fn [writer [index value]]
        (. writer append index value)
        writer)
      (SequenceFile/createWriter
        hadoop_fs
        hadoop_configuration
        (new Path "test/sensei")
        Text
        VectorWritable)
      (map
        (fn [[tag row_vector]]
          (let [input_index (new Text tag)
                input_vector (new VectorWritable)]
            (. input_vector set row_vector)
            [input_index input_vector]))
        (map
          (fn [[tag photo_list]]
            (let [photo_map (apply hash-map photo_list)
                  input_vector (new SequentialAccessSparseVector (count (vals photo_map)))]
              (loop [frequency_list (vals photo_map)]
                (if (zero? (count frequency_list))
                  [tag input_vector]
                  (when-not (zero? (count frequency_list))
                    (. input_vector set
                       (mod (count frequency_list) (count (vals photo_map)))
                       (Integer/parseInt (first frequency_list)))
                    (recur (rest frequency_list)))))))
          (reduce
            (fn [result next_line]
              (let [[tag photo frequency] (clojure.string/split next_line #" ")]
                (update-in result [tag]
                           #(if (nil? %)
                              [photo frequency]
                              (conj % photo frequency)))))
            {}
            (line-seq reader)))))))
Basically it turns the input into a sequence file in this format:
key (Text): $tag_uri
value (VectorWritable): a vector (cardinality = number of documents) with numeric index and the respective frequency <0:1 1:0 2:0 3:1 4:0 ...>
Then I proceed to do the actual clustering with this script (referring to this blog post):
#!./bin/clj
(ns sensei.clustering.fkmeans)
(import org.apache.hadoop.conf.Configuration)
(import org.apache.hadoop.fs.Path)
(import org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver)
(import org.apache.mahout.common.distance.EuclideanDistanceMeasure)
(import org.apache.mahout.clustering.kmeans.RandomSeedGenerator)
(let [hadoop_configuration ((fn []
                              (let [conf (new Configuration)]
                                (. conf set "fs.default.name" "hdfs://127.0.0.1:9000/")
                                conf)))
      input_path (new Path "test/sensei")
      output_path (new Path "test/clusters")
      clusters_in_path (new Path "test/clusters/cluster-0")]
  (FuzzyKMeansDriver/run
    hadoop_configuration
    input_path
    (RandomSeedGenerator/buildRandom
      hadoop_configuration
      input_path
      clusters_in_path
      (int 2)
      (new EuclideanDistanceMeasure))
    output_path
    (new EuclideanDistanceMeasure)
    (double 0.5)
    (int 10)
    (float 5.0)
    true
    false
    (double 0.0)
    false)) ;; runSequential
However, I am getting output like this:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
11/08/25 15:20:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11/08/25 15:20:16 INFO compress.CodecPool: Got brand-new compressor
11/08/25 15:20:16 INFO compress.CodecPool: Got brand-new decompressor
11/08/25 15:20:17 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/08/25 15:20:17 INFO input.FileInputFormat: Total input paths to process : 1
11/08/25 15:20:17 INFO mapred.JobClient: Running job: job_local_0001
11/08/25 15:20:17 INFO mapred.MapTask: io.sort.mb = 100
11/08/25 15:20:17 INFO mapred.MapTask: data buffer = 79691776/99614720
11/08/25 15:20:17 INFO mapred.MapTask: record buffer = 262144/327680
11/08/25 15:20:17 WARN mapred.LocalJobRunner: job_local_0001
java.lang.IllegalStateException: No clusters found. Check your -c path.
at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansMapper.setup(FuzzyKMeansMapper.java:62)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
11/08/25 15:20:18 INFO mapred.JobClient: map 0% reduce 0%
11/08/25 15:20:18 INFO mapred.JobClient: Job complete: job_local_0001
11/08/25 15:20:18 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.lang.RuntimeException: java.lang.InterruptedException: Fuzzy K-Means Iteration failed processing test/clusters/cluster-0/part-randomSeed
at clojure.lang.Util.runtimeException(Util.java:153)
at clojure.lang.Compiler.eval(Compiler.java:6417)
at clojure.lang.Compiler.load(Compiler.java:6843)
at clojure.lang.Compiler.loadFile(Compiler.java:6804)
at clojure.main$load_script.invoke(main.clj:282)
at clojure.main$script_opt.invoke(main.clj:342)
at clojure.main$main.doInvoke(main.clj:426)
at clojure.lang.RestFn.invoke(RestFn.java:436)
at clojure.lang.Var.invoke(Var.java:409)
at clojure.lang.AFn.applyToHelper(AFn.java:167)
at clojure.lang.Var.applyTo(Var.java:518)
at clojure.main.main(main.java:37)
Caused by: java.lang.InterruptedException: Fuzzy K-Means Iteration failed processing test/clusters/cluster-0/part-randomSeed
at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.runIteration(FuzzyKMeansDriver.java:252)
at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.buildClustersMR(FuzzyKMeansDriver.java:421)
at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.buildClusters(FuzzyKMeansDriver.java:345)
at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.run(FuzzyKMeansDriver.java:295)
at sensei.clustering.fkmeans$eval17.invoke(fkmeans.clj:35)
at clojure.lang.Compiler.eval(Compiler.java:6406)
... 10 more
When runSequential is set to true, I get:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
11/09/07 14:32:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11/09/07 14:32:32 INFO compress.CodecPool: Got brand-new compressor
11/09/07 14:32:32 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.IllegalStateException: Clusters is empty!
at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.buildClustersSeq(FuzzyKMeansDriver.java:361)
at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.buildClusters(FuzzyKMeansDriver.java:343)
at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.run(FuzzyKMeansDriver.java:295)
at sensei.clustering.fkmeans$eval17.invoke(fkmeans.clj:35)
at clojure.lang.Compiler.eval(Compiler.java:6465)
at clojure.lang.Compiler.load(Compiler.java:6902)
at clojure.lang.Compiler.loadFile(Compiler.java:6863)
at clojure.main$load_script.invoke(main.clj:282)
at clojure.main$script_opt.invoke(main.clj:342)
at clojure.main$main.doInvoke(main.clj:426)
at clojure.lang.RestFn.invoke(RestFn.java:436)
at clojure.lang.Var.invoke(Var.java:409)
at clojure.lang.AFn.applyToHelper(AFn.java:167)
at clojure.lang.Var.applyTo(Var.java:518)
at clojure.main.main(main.java:37)
I have also rewritten the fkmeans script into this form:
#!./bin/clj
(ns sensei.clustering.fkmeans)
(import org.apache.hadoop.conf.Configuration)
(import org.apache.hadoop.fs.Path)
(import org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver)
(import org.apache.mahout.common.distance.EuclideanDistanceMeasure)
(import org.apache.mahout.clustering.kmeans.RandomSeedGenerator)
(let [hadoop_configuration ((fn []
                              (let [conf (new Configuration)]
                                (. conf set "fs.default.name" "hdfs://localhost:9000/")
                                conf)))
      driver (new FuzzyKMeansDriver)]
  (. driver setConf hadoop_configuration)
  (. driver run
     (into-array String ["--input" "test/sensei"
                         "--output" "test/clusters"
                         "--clusters" "test/clusters/clusters-0"
                         "--clustering"
                         "--overwrite"
                         "--emitMostLikely" "false"
                         "--numClusters" "3"
                         "--maxIter" "10"
                         "--m" "5"])))
but I am still getting the same error as the first version :/
The command line tool runs fine:
$ bin/mahout fkmeans --input test/sensei --output test/clusters --clusters test/clusters/clusters-0 --clustering --overwrite --emitMostLikely false --numClusters 10 --maxIter 10 --m 5
However, it does not return the points when I run clusterdump, even though the --clustering option was given in the previous command and --pointsDir is defined here:
$ ./bin/mahout clusterdump --seqFileDir test/clusters/clusters-1 --pointsDir test/clusters/clusteredPoints --output sensei.txt
Mahout version used: 0.6-snapshot, Clojure 1.3.0-snapshot.
Please let me know if I missed anything.
My guess is that the Mahout implementation of fuzzy c-means needs initial clusters to start from, which you perhaps did not supply?
Also, it sounds a bit as if you are running on a single node? Note that for single-node setups you should avoid all the Mahout/Hadoop overhead and just use a regular clustering algorithm. Hadoop/Mahout comes at a considerable cost that only pays off once you can no longer process the data on a single system. It is not "map reduce" unless you do it on a large number of systems.