Riemann: triggering alerts with changed-state - clojure

I'm new to Riemann and clojure. I'm trying to configure alerts based on changed states. But the states never seem to be updated/indexed. So when I get to the changed-state block, state is nil. I can add the alerts within the splitp block, but that seems redundant. Maybe we will want different types of notifications between critical and warnings, but for now, I'd like to see this work (if possible).
(let [index (default :ttl 20 (index))]
  (streams
    index
    (where (not (state "expired"))
      (where (service "load/load/shortterm")
        (splitp < metric
          0.05 (with :state "critical" index)
          0.02 (with :state "warning" index)
          (with :state "ok" index)))
      #(info %)
      (changed-state {:init "ok"}
        (stable 60 :state
          #(info "CHANGED STATE" %)
          (email "user@host.com"))))))
Thanks for any help!
Riemann v0.2.9, collectd v5.5.0, OS CentOS 6.5
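For what it's worth, one way to restructure this (a sketch only, untested against Riemann 0.2.9; it assumes an `email` mailer is defined elsewhere, as in the config above) is to bind the alert chain in the same `let` and hang it off each `with`, so `changed-state` sees events after their `:state` has been rewritten rather than the raw collectd events:

```clojure
(let [index (default :ttl 20 (index))
      ;; alert sees only events whose :state was set by the splitp below
      alert (changed-state {:init "ok"}
              (stable 60 :state
                #(info "CHANGED STATE" %)
                (email "user@host.com")))]
  (streams
    index
    (where (service "load/load/shortterm")
      (splitp < metric
        0.05 (with :state "critical" (sdo index alert))
        0.02 (with :state "warning"  (sdo index alert))
             (with :state "ok"       (sdo index alert))))))
```

`sdo` simply fans one event out to several child streams, so each branch both indexes and feeds the alert chain.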

Related

Google BigQuery API get state of running job non-US location in Clojure

I have an app that reads CSV files and pushes them to BQ tables, checking the status of each job before loading the next CSV file, and so on. This was working fine while my datasets were in the US region; however, we recently moved our datasets to the Australia region and now I get
#error { :cause 404 Not Found { "code" : 404, "errors" : [ { "domain" : "global", "message" : "Not found: Job load-csv-job123", "reason" : "notFound" }
I can run the job fine against this dataset, but I cannot call the BQ get API from my Clojure code to fetch the status. When calling the insert-job API I am setting the location in the jobReference:
job-reference (doto (JobReference.)
                (.setLocation "australia-southeast1")
                (.setJobId job-id))
and then call my insert like this
status (->> bq
            (.jobs)
            (#(.insert % project-id job-spec content))
            (.execute)
            (.getStatus))]
The status above works when I do (->> status (.getState))
I know I have to be setting the location somewhere for non-US/non-EU regions for the GET call on the job, but just can't figure how to from the Google Docs using the GET API.
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
The API/jar I am using in the below code
[com.google.apis/google-api-services-bigquery "v2-rev459-1.25.0"]
The code I have for getting status in a loop with recur
(loop [status status] ;; Waiting until successfully processed
  (log/info job-id " : " (->> status (.getState)))
  (if (= "DONE" (->> status (.getState)))
    (do (log/info "Status seems done?")
        (if-let [errors (.getErrors status)]
          (do
            (log/info "seems like we have errors")
            (vec (map #(.getMessage %) errors)))
          nil))
    (do
      (log/info "status is pending let's wait and check...job spec" job-spec)
      (Thread/sleep 3000)
      (recur (->> bq
                  (.jobs)
                  (#(.get % project-id job-id))
                  (.execute)
                  (.getStatus))))))
Can you tell what I am missing? My attempt to call setLocation on the .get,
(#(.get % project-id job-id)) (.setLocation "australia-southeast1")
comes back with
CompilerException java.lang.IllegalArgumentException: No matching field found: setLocation for class java.lang.String
What appears to be missing here is detail about which Clojure library you're using. That's not a language supported by first-party libraries, so this may come down to how the library is assembled, and whether it's maintained.
For a jobs.get call, what's needed for the request to route correctly is a location URL param on the request, e.g. GET https://bigquery.googleapis.com/bigquery/v2/projects/yourprojectid/jobs/yourjobid?location=australia-southeast1
I managed to figure out the Clojure code for setting the location when a dataset has moved from one region to another; this is needed to get the status of jobs that have already run. Note that in my case I had to make sure the previous BQ job (insert into table) had finished before running the next one. I loop over the status and recur until I get a DONE. Note that DONE here doesn't mean the job succeeded, just that it finished, which is why I fetch the error vector below and return it if there were any errors.
I was initially using a threaded form (->>) for this but couldn't figure out how to call setLocation that way, so I use a plain form for now. I will update with a threaded form later.
(loop [status status] ;; Waiting until successfully processed
  (log/info job-id " : " (->> status (.getState)))
  (if (= "DONE" (->> status (.getState)))
    (if-let [errors (.getErrors status)]
      (do
        (log/debug "Errors in job: " job-id)
        (vec (map #(.getMessage %) errors)))
      nil)
    (do
      (Thread/sleep 3000)
      (recur
        (let [jobsobj (.jobs bq)
              getobj  (.get jobsobj project-id job-id)
              _       (.setLocation getobj "australia-southeast1")]
          (.getStatus (.execute getobj)))))))
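The threaded form mentioned above can be recovered with `doto`, which returns its first argument after running the side-effecting calls on it (a sketch against the same assumed `bq`, `project-id`, and `job-id` bindings as above, untested):

```clojure
(-> (.jobs bq)
    (.get project-id job-id)
    ;; doto mutates the request object and passes it along the thread:
    (doto (.setLocation "australia-southeast1"))
    (.execute)
    (.getStatus))
```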

Event count at certain time interval in riemann

I have to check the count of events arriving in each 30-second interval. If the count is greater than 5, I need to trigger an email.
I am using the code below, but the email doesn't get triggered.
(let [userindex1 (default :ttl 300 (update-index (index)))]
  (streams
    prn
    userindex1))
(streams
  (where (and (service "system_log")
              (not (expired? event)))
    ; fixed-time-window sends a vector of events out every 30 seconds
    (fixed-time-window
      30
      ; smap passes those events into a function
      (smap
        (fn [events]
          ; Calculate the count of failure events
          (let [numberofFailure (count (filter #(= "IE" (:description %)) events))]
            {:status "login failures"
             :metric numberofFailure
             :totalFail (boolean (numberofFailure > 5))}
            (streams
              prn
              numberofFailure))))
      ; check if status is true; if the condition is satisfied, trigger an email
      (let [email (mailer {:host "smtp.gmail.com"
                           :port 25
                           :user "aaaaa"
                           :pass "bbbbb"
                           :auth "true"
                           :subject (fn [events]
                                      (clojure.string/join ", "
                                        (map :service events)))
                           :from "abc@gmail.com"})]
        (streams
          (where (and (:status "login failures")
                      (:totalFail true))
            (email "123@gmail.com")))))))
Where am I going wrong?
There are a couple of issues here. I'll try to address some of them, then post a minimal working example:
The first fn passed to smap should return an event. That event can be created with event or by assoc'ing into one of the received events. In your sample a plain map is created (which would not work; it's not a proper event), but even that is lost, because streams is then called (which, AFAIK, should only be called at the top level). So instead of:
(smap
  (fn [events]
    (let [numberofFailure ...]
      {:status "login failures"
       :metric numberofFailure
       :totalFail (boolean ...)}
      (streams
        prn
        numberofFailure)))
  ...)
You should do something like:
(smap
  (fn [events]
    (let [numberofFailure ...]
      (event {:status "login failures"
              :metric numberofFailure
              :totalFail (boolean ...)})))
  ...)
To calculate totalFail, remember that you need prefix notation to call >, so it must be (> numberofFailure 5). And boolean is not needed, as > already returns a boolean.
I would initialize the mailer outside the top-level streams call, in an enclosing scope using let or with a def. But it should work as it is.
You should pass the last where as a child stream to smap, so it must be the second argument to smap. Let's recall the smap docs:
(smap f & children)
Streaming map. Calls children with (f event), whenever (f event) is non-nil.
Prefer this to (adjust f) and (combine f). Example:
(smap :metric prn) ; prints the metric of each event.
(smap #(assoc % :state "ok") index) ; Indexes each event with state "ok"
The last where should not be enclosed in streams, and the and expression must work on the event, so it must be:
(where (and (= (:status event) "login failures")
            (:total-fail event))
  (email "123@gmail.com"))
The :subject fn for mailer should be passed as part of a second map, as explained in the mailer documentation
There's an open issue on fixed-time-window which makes it a bit unreliable: it doesn't fire as soon as the time window is due, but waits until a new event arrives, so you might want to use a different windowing strategy until that gets fixed.
Here goes a full minimal working example based on yours:
(let [email (mailer {:host "localhost"
                     :port 1025
                     :from "abc@gmail.com"})]
  (streams
    (where (and (service "system_log")
                (not (expired? event)))
      (fixed-time-window
        5
        (smap
          (fn [events]
            (let [count-of-failures (count (filter #(= "IE" (:description %)) events))]
              (event
                {:status "login failures"
                 :metric count-of-failures
                 :total-fail (>= count-of-failures 2)})))
          (where (and (= (:status event) "login failures")
                      (:total-fail event))
            (email "hello123@gmail.com")))))))
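Until that fixed-time-window issue is resolved, a count-based window is one workaround (a sketch; `fixed-event-window` is a standard riemann stream that emits a vector after every N events rather than on a timer, so the smap stays the same):

```clojure
(where (and (service "system_log")
            (not (expired? event)))
  ;; fires as soon as 5 events have accumulated; no stalled timer
  (fixed-event-window 5
    (smap
      (fn [events]
        (event {:status "login failures"
                :metric (count (filter #(= "IE" (:description %)) events))}))
      prn)))
```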

Riemann - Build a stream dynamically from a map

I have the following function which gets a map with service name and threshold. It checks if the service crossed a defined threshold and then calls multiple downstream children on the event.
(defn tc
  [s & children]
  (where
    (and (service (:service_name s)) (not (expired? event)))
    (by [:host :service]
      (where (> metric (:threshold s))
        (with :state "critical"
          (apply sdo children))))))
I would like to build a stream dynamically using a vector of maps:
(def services [{:service "cpu/usage"    :threshold 90}
               {:service "memory/usage" :threshold 90}])
When trying to run it in a stream, I'm getting the following warning:
(streams
  (doseq [s services] (tc s prn)))
WARN [2015-01-05 14:27:07,187] Thread-15 - riemann.core - instrumentation service caught
java.lang.NullPointerException
at riemann.core$stream_BANG_$fn__11140.invoke(core.clj:19)
at riemann.core$stream_BANG_.invoke(core.clj:18)
at riemann.core$instrumentation_service$measure__11149.invoke(core.clj:57)
at riemann.service.ThreadService$thread_service_runner__8782$fn__8783.invoke(service.clj:66)
at riemann.service.ThreadService$thread_service_runner__8782.invoke(service.clj:65)
at clojure.lang.AFn.run(AFn.java:22)
at java.lang.Thread.run(Thread.java:701)
It works if I run the streams function inside the doseq. This one works and gives the following output:
(doseq [s services]
  (streams (tc s prn)))
#riemann.codec.Event{:host "testhost", :service "memory/usage", :state "critical", :description nil, :metric 91.0, :tags nil, :time 1420460856, :ttl 60.0}
It seems to blow up if your events don't have all the required fields. Here's a sample from a similar project, where I build an event from a sequence of events (reducing). It's not exactly what you are doing, but I'm generating events in the same way:
{:service (:service (first events))
 :metric  (count events)
 :host    "All-counts"
 :state   "OK"
 :time    (:time (last events))
 :ttl     default-interval}
I got an NPE specifically when time was missing. If you can't inherit it from somewhere, just make it up (use now, for instance); without a reasonable value here, event expiration will not work and you'll run out of RAM.
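That advice can be sketched as a small plain-Clojure helper (the names here are hypothetical; it assumes epoch seconds are an acceptable made-up :time):

```clojure
(defn counted-event
  "Summarize a batch of events, inventing :time from the wall clock
   when the source events carry none, so downstream expiry can work."
  [events default-ttl]
  {:service (:service (first events))
   :metric  (count events)
   :host    "All-counts"
   :state   "OK"
   :time    (or (:time (last events))
                ;; made-up but reasonable: current epoch seconds
                (quot (System/currentTimeMillis) 1000))
   :ttl     default-ttl})
```

For example, `(counted-event [{:service "a" :time 5} {:service "a" :time 9}] 60)` yields an event with :metric 2 and :time 9, while a batch with no :time fields gets the current time instead of nil.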

custom config for riemann

I'm new to riemann and clojure as well. What I want is: when a new event comes in, it checks the event's state and service, and if both match, it prints something to the console. Here is my config file.
(let [index (default :ttl 300 (update-index (index)))]
  ; Inbound events will be passed to these streams:
  (streams
    ; Index all events immediately.
    index
    ; Calculate an overall rate of events.
    (with {:metric 1 :host nil :state "ok" :service "events/sec"}
      (rate 5 index))
    ; my code starts from here
    (where (and (= :service "foo")
                (= :state "ok"))
      (println "It works"))
    ; ends here
    ; Log expired events.
    (expired
      (fn [event] (info "expired" event)))))
When I start riemann, I can see "It works" in the console once, but never afterwards. Tell me where I'm going wrong.
The problem looks to be your use of keywords in the where expression. The where macro re-writes the conditions to use keywords internally, though this behaviour doesn't appear to be clearly stated outside the API documentation. If you look at the examples in the howto, the conditions in where expressions don't have colons on the field names.
I have tested the following config:
(where (and (service "foo")
            (state "ok"))
  prn)
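If you do want explicit field access, where also binds `event`, so keyword lookups work too, provided you compare the looked-up value rather than testing a bare keyword (a sketch, equivalent to the config above):

```clojure
(where (and (= (:service event) "foo")
            (= (:state event) "ok"))
  prn)
```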

rxjava and clojure asynchrony mystery: futures promises and agents, oh my

I apologize in advance for the length of this note. I spent considerable time making it shorter, and this was as small as I could get it.
I have a mystery and would be grateful for your help. This mystery comes from the behavior of an rxjava observer I wrote in Clojure over a couple of straightforward observables cribbed from online samples.
One observable synchronously sends messages to the onNext handlers of its observers, and my supposedly principled observer behaves as expected.
The other observable asynchronously does the same, on another thread, via a Clojure future. The exact same observer does not capture all events posted to its onNext; it just seems to lose a random number of messages at the tail.
There is an intentional race in the following between the expiration of a wait for the promised onCompleted and the expiration of a wait for all events sent to an agent collector. If the promise wins, I expect to see false for onCompleted and a possibly short queue in the agent. If the agent wins, I expect to see true for onCompleted and all messages from the agent's queue. The one result I DO NOT expect is true for onCompleted AND a short queue from the agent. But, Murphy doesn't sleep, and that's exactly what I see. I don't know whether garbage-collection is at fault, or some internal queuing to Clojure's STM, or my stupidity, or something else altogether.
I present the source in self-contained form, in order, so that it can be run directly via lein repl. There are three ceremonies to get out of the way: first, the Leiningen project file, project.clj, which declares a dependency on version 0.9.0 of Netflix's rxjava:
(defproject expt2 "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [com.netflix.rxjava/rxjava-clojure "0.9.0"]]
  :main expt2.core)
Now, the namespace and a Clojure requirement and the Java imports:
(ns expt2.core
  (:require clojure.pprint)
  (:refer-clojure :exclude [distinct])
  (:import [rx Observable subscriptions.Subscriptions]))
Next, a macro for output to the console:
(defmacro pdump [x]
  `(let [x# ~x]
     (do (println "----------------")
         (clojure.pprint/pprint '~x)
         (println "~~>")
         (clojure.pprint/pprint x#)
         (println "----------------")
         x#)))
Finally, my observer. I use an agent to collect the messages sent by an observable's onNext, an atom to collect a potential onError, and a promise for onCompleted so that consumers external to the observer can wait on it.
(defn- subscribe-collectors [obl]
  (let [;; Keep a sequence of all values sent:
        onNextCollector (agent [])
        ;; Only need one value if the observable errors out:
        onErrorCollector (atom nil)
        ;; Use a promise for 'completed' so we can wait for it on
        ;; another thread:
        onCompletedCollector (promise)]
    (letfn [;; When observable sends a value, relay it to our agent:
            (collect-next [item] (send onNextCollector (fn [state] (conj state item))))
            ;; If observable errors out, just set our exception:
            (collect-error [excp] (reset! onErrorCollector excp))
            ;; When observable completes, deliver on the promise:
            (collect-completed [] (deliver onCompletedCollector true))
            ;; In all cases, report out the back end with this:
            (report-collectors []
              (pdump
               ;; Wait for everything that has been sent to the agent
               ;; to drain (presumably internal message queues):
               {:onNext (do (await-for 1000 onNextCollector)
                            ;; Then produce the results:
                            @onNextCollector)
                ;; If we ever saw an error, here it is:
                :onError @onErrorCollector
                ;; Wait at most 1 second for the promise to complete;
                ;; if it does not complete, then produce 'false'.
                ;; I expect if this times out before the agent
                ;; times out to see an 'onCompleted' of 'false'.
                :onCompleted (deref onCompletedCollector 1000 false)}))]
      ;; Recognize that the observable 'obl' may run on another thread:
      (-> obl
          (.subscribe collect-next collect-error collect-completed))
      ;; Therefore, produce results that wait, with timeouts, on both
      ;; the completion event and on the draining of the (presumed)
      ;; message queue to the agent.
      (report-collectors))))
Now, here is a synchronous observable. It pumps 25 messages down the onNext throats of its observers, then calls their onCompleteds.
(defn- customObservableBlocking []
  (Observable/create
    (fn [observer] ; This is the 'subscribe' method.
      ;; Send 25 strings to the observer's onNext:
      (doseq [x (range 25)]
        (-> observer (.onNext (str "SynchedValue_" x))))
      ; After sending all values, complete the sequence:
      (-> observer .onCompleted)
      ; return a NoOpSubscription since this blocks and thus
      ; can't be unsubscribed (disposed):
      (Subscriptions/empty))))
We subscribe our observer to this observable:
;;; The value of the following is the list of all 25 events:
(-> (customObservableBlocking)
    (subscribe-collectors))
It works as expected, and we see the following results on the console:
{:onNext (do (await-for 1000 onNextCollector) @onNextCollector),
 :onError @onErrorCollector,
 :onCompleted (deref onCompletedCollector 1000 false)}
~~>
{:onNext
["SynchedValue_0"
"SynchedValue_1"
"SynchedValue_2"
"SynchedValue_3"
"SynchedValue_4"
"SynchedValue_5"
"SynchedValue_6"
"SynchedValue_7"
"SynchedValue_8"
"SynchedValue_9"
"SynchedValue_10"
"SynchedValue_11"
"SynchedValue_12"
"SynchedValue_13"
"SynchedValue_14"
"SynchedValue_15"
"SynchedValue_16"
"SynchedValue_17"
"SynchedValue_18"
"SynchedValue_19"
"SynchedValue_20"
"SynchedValue_21"
"SynchedValue_22"
"SynchedValue_23"
"SynchedValue_24"],
:onError nil,
:onCompleted true}
----------------
Here is an asynchronous observable that does exactly the same thing, only on a future's thread:
(defn- customObservableNonBlocking []
(Observable/create
(fn [observer] ; This is the 'subscribe' method
(let [f (future
;; On another thread, send 25 strings:
(doseq [x (range 25)]
(-> observer (.onNext (str "AsynchValue_" x))))
; After sending all values, complete the sequence:
(-> observer .onCompleted))]
; Return a disposable (unsubscribe) that cancels the future:
(Subscriptions/create #(future-cancel f))))))
;;; For unknown reasons, the following does not produce all 25 events:
(-> (customObservableNonBlocking)
    (subscribe-collectors))
But, surprise, here is what we see on the console: true for onCompleted, implying that the promise DID NOT TIME-OUT; but only some of the asynch messages. The actual number of messages we see varies from run to run, implying that there is some concurrency phenomenon at play. Clues appreciated.
----------------
{:onNext (do (await-for 1000 onNextCollector) @onNextCollector),
 :onError @onErrorCollector,
 :onCompleted (deref onCompletedCollector 1000 false)}
~~>
{:onNext
["AsynchValue_0"
"AsynchValue_1"
"AsynchValue_2"
"AsynchValue_3"
"AsynchValue_4"
"AsynchValue_5"
"AsynchValue_6"],
:onError nil,
:onCompleted true}
----------------
await-for on agents "Blocks the current thread until all actions dispatched thus far (from this thread or agent) to the agent(s) have occurred." That means that after your await is over, some other thread can still send messages to the agent, and that is what is happening in your case. After your await on the agent is over, and you have deref'd its value for the :onNext key of the map, you then wait on the completed promise, which turns out to be true after the wait; but in the meantime more messages were dispatched to the agent and collected into the vector.
You can solve this by making :onCompleted the first key in the map, which basically means: wait for the completion first, and only then await the agent, because by that time no more send calls on the agent can happen, as you have already received onCompleted.
{:onCompleted (deref onCompletedCollector 1000 false)
 :onNext (do (await-for 0 onNextCollector)
             @onNextCollector)
 :onError @onErrorCollector}
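The ordering fix can be demonstrated with core Clojure alone, no rxjava needed (a sketch in which a future stands in for the observable's thread):

```clojure
(defn collect-ordered
  "Wait on the completion promise first, then drain the agent, so no
   sends can arrive after the snapshot is taken."
  []
  (let [results (agent [])
        done    (promise)]
    (future
      (doseq [i (range 25)]
        (send results conj i))
      ;; every send is enqueued before completion is delivered:
      (deliver done true))
    (let [completed (deref done 1000 false)]
      ;; await-for queues a marker behind the 25 sends already enqueued,
      ;; so once it returns the agent holds every value:
      (await-for 1000 results)
      {:onCompleted completed
       :onNext @results})))

;; (collect-ordered) => {:onCompleted true, :onNext [0 1 2 ... 24]}
```

Because a single thread dispatches all sends to one agent, the agent applies them in order, and waiting on the promise before the await guarantees the snapshot is complete.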