I am using the ELK stack (Logstash, Elasticsearch, Kibana) for log analysis and Riemann for alerting. Logstash parses my logs, user is one of the parsed fields, and I send the events to Riemann via the riemann output plugin. Example of parsed logs:
Timestamp user command-name
2014-06-07... root sh ./scripts/abc.sh
2014-06-08... sid sh ./scripts/xyz.sh
2014-06-08... abc sh ./scripts/xyz.sh
2014-06-09... root sh ./scripts/xyz.sh
Logstash:
riemann {
  riemann_event => {
    "service"     => "logins"
    "unique_user" => "%{user}"
  }
}
So the user values will be like: root, sid, abc, root, sid, def, etc.
I split the stream by user, i.e. one stream per unique user. Now I want to alert when the number of unique users goes above 3. I wrote the following, but it doesn't achieve my purpose.
Riemann:
(streams
  (where (service "logins")
    (by :unique_user
      (moving-time-window 3600
        (smap (fn [events]
                (let [users (count events)]
                  (if (> users 3)
                    (email "abc#gmail.com")))))))))
I am new to Riemann and Clojure. Any help is appreciated.
email returns a stream. Therefore, for it to work, you must either use it as a stream, by passing it as a parameter to another stream, or use call-rescue to send an event to it directly. Additionally, streams that are meant to receive events from multiple sources (such as your alert destination) should be created once, and stored in a variable for re-use.
First approach, using only abstract streams:
(let [alert (email "abc#gmail.com")]
  (streams
    (where (service "logins")
      (by :unique_user
        (moving-time-window 3600
          (smap folds/count
            (where (> metric 3) alert)))))))
Second approach, using call-rescue:
(let [alert (email "abc#gmail.com")]
  (streams
    (where (service "logins")
      (by :unique_user
        (moving-time-window 3600
          (fn [events]
            (when (> (count events) 3)
              ;; call-rescue expects a collection of child streams
              (call-rescue (last events) [alert]))))))))
New to Emacs:
I am trying to populate a completion collection from a SQL query.
All goes according to plan except that the completion list ends up as individual words rather than each complete list item as a string with spaces.
(defun test_sql_interactive2 ()
  (interactive)
  (setq look (emacsql db2 [:select [Accounts:acc_name] :from Accounts]))
  (setq var (completing-read
             "Complete an Account Name: "
             look
             nil t))
  (insert var))
The variable look is the result of the SQL query, which returns:
((Collective Retirement Account) (Stocks and Shares) (Current Account) (Savings Account))
but the Emacs completing-read function sees this as 8 words to use as the completion collection rather than 4 strings, so instead of "Collective Retirement Account" being offered as a completion, only "Collective" is.
How can I have the completion return the entire string, spaces included?
Change the contents of the alist (value of look) that you pass to completing-read to be strings:
(("Collective Retirement Account")
("Stocks and Shares")
("Current Account")
("Savings Account"))
This does that:
(let ((look '((Collective Retirement Account)
              (Stocks and Shares)
              (Current Account)
              (Savings Account))))
  (mapcar (lambda (xx) (list (substring (format "%s" xx) 1 -1)))
          look))
I'm using ring-swagger via compojure-api. I have a few query parameters and I'm struggling to find a way to add a description to a single query parameter. I can add a summary for the entire endpoint, but that's not enough.
Is it possible to add a Swagger description to a single query parameter using ring-swagger/compojure-api?
Use compojure.api.sweet/describe.
For example:
(GET "/hello" []
:query-params [name :- (describe String "This is the swagger description for the parameter")]
(ok {:message (str "Hello, " name)}))
Are there any reasons not to use a wildcard pull?
(defn pull-wild
  "Pulls all attributes of a single entity."
  [db ent-id]
  (d/pull db '[*] ent-id))
It's much more convenient than explicitly stating the attributes.
It depends on which attributes your application needs, how data-intensive the entities are, and whether you want to pull lots of entities.
If you use the client library, you might want to minimize the data that needs to be sent over the wire.
I guess there are lots of other considerations, but as long as it's fast enough I would pull the wildcard.
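For contrast, here is a hedged sketch of the narrower pull hinted at above; the attribute names are made up purely for illustration:
(require '[datomic.api :as d])

(defn pull-narrow
  "Pulls only a fixed subset of attributes instead of everything.
  :person/name and :person/email are hypothetical attribute names."
  [db ent-id]
  (d/pull db [:person/name :person/email] ent-id))
Over the client library this keeps the wire payload down to just the attributes you list.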
You may also be interested in the entity-map function from Tupelo Datomic. Given an EID (or a Lookup Ref) it will return the full record as a regular Clojure map:
(let [; Retrieve James' attr-val pairs as a map. An entity can be referenced either by EID or by a
      ; LookupRef, which is a unique attribute-value pair expressed as a vector.
      james-map  (td/entity-map (live-db) james-eid)                    ; lookup by EID
      james-map2 (td/entity-map (live-db) [:person/name "James Bond"])] ; lookup by LookupRef
  (is (= james-map james-map2
         {:person/name "James Bond" :location "London" :weapon/type #{:weapon/wit :weapon/gun}})))
tl;dr
How can I derive a keyword from a number in ClojureScript:
(keyword 22)
;;=> nil, although I expected :22
Background
In my ClojureScript/Hoplon application I make HTTP requests via cljs-http. Parts of the response I receive look like this:
{:companies
 {:22 {:description ... }   ; A company.
  :64 {:description ... }
  ... }
 :offers
 [{:description ... }       ; An offer.
  {:description ... }
  ... ]}
Each offer within the vector behind :offers has a :companyId which corresponds to a key in :companies. As soon as I receive the response, I reset! a cell (similar to an atom) named query.
Now, I'd like to iterate over each offer and call a function offer-tpl that creates the corresponding HTML. In order to do so, offer-tpl needs the offer itself as well as the related company:
(for [offer (:offers @query)]
  (offer-tpl offer (get-in @query [:companies (keyword (:companyId offer))])))
Despite the fact that this surely can be done more elegantly (suggestions very much appreciated), the get-in doesn't work: (:companyId offer) returns a number (e.g. 22), but (keyword (:companyId offer)) returns nil. Calling (keyword (str (:companyId offer))) does the trick, but isn't there another way to do this?
(keyword "22") or (keyword (str 22)) returns :22
The reason you are getting keys like :22 is likely the keywordize-keys option of a JSON translation. For example, cljs-http defaults to keywordize-keys for JSONP:
https://github.com/r0man/cljs-http/blob/1fb899d3f9c5728521786432b5f6c36d1d7a1452/src/cljs_http/core.cljs#L115
But you can (and in this case should) pass in a flag to disable keywordization.
Not all keys in JSON are appropriate for Clojure keywordization. For example, spaces in a JSON key are valid, but not in a Clojure keyword.
Please be aware that numeric keywords are probably incorrect.
https://clojuredocs.org/clojure.core/keyword#example-542692cec026201cdc326d70
It seems like that caveat has been removed from the current Clojure website, so perhaps that means something but I'm not sure what.
http://clojure.org/reference/reader currently states that
Keywords - Keywords are like symbols, except: They can and must begin with a colon, e.g. :fred. They cannot contain '.' or name classes. Like symbols, they can contain a namespace, :person/name. A keyword that begins with two colons is resolved in the current namespace: in the user namespace, ::rect is read as :user/rect
and that
Symbols begin with a non-numeric character and can contain alphanumeric characters.
This definition of a keyword excludes :22 and :with spaces.
The keyword function returns a result for invalid input, but this is not an endorsement; it is simply because checking for incorrect input would add performance overhead in a core part of Clojure.
In short, not all JSON keys translate to keywords, so you should avoid keywordize-keys unless you know the keyspace and/or doing so provides some convenience.
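As an illustration of working without keywordized keys, here is a minimal sketch; the response shape is copied from the question, with string keys standing in for what the JSON parser produces when keywordization is disabled:
;; The numeric companyId only needs a str conversion for the lookup;
;; no keywords are involved at all.
(let [response {"companies" {"22" {"description" "A company."}}
                "offers"    [{"description" "An offer." "companyId" 22}]}]
  (for [offer (get response "offers")]
    (get-in response ["companies" (str (get offer "companyId"))])))
;;=> ({"description" "A company."})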
I'm attempting to use the Clojure pantomime library to extract/OCR text from a large number of TIFF documents (among others).
My plan has been to use pmap to apply the mapping over a sequence of input data (from a Postgres database) and then update that same Postgres database with the Tika/Tesseract OCR output. This has been working OK; however, I notice in htop that many of the cores are idle at times.
Is there any way to reconcile this, and what steps can I take to determine where this may be blocking? All processing occurs on a single TIFF file, and each thread is entirely mutually exclusive.
Additional info:
Some Tika/Tesseract processes take 3 seconds, others take up to 90 seconds. Generally speaking, Tika is heavily CPU bound. I have ample memory available according to htop.
Postgres has no locking issues in session management, so I don't think that's holding me up.
Maybe futures are waiting somewhere to be dereferenced? How can I tell where?
Any tips appreciated, thanks. Code added below.
(defn parse-a-path [{:keys [row_id file_path]}]
  (try
    (let [start        (System/currentTimeMillis)
          mime_type    (pm/mime-type-of file_path)
          file_content (-> file_path (extract/parse) :text)
          language     (pl/detect-language file_content)]
      {:mime_type mime_type
       :file_content file_content
       :language language
       :row_id row_id
       ;; milliseconds -> seconds
       :parse_time_in_seconds (float (/ (- (System/currentTimeMillis) start) 1000))
       :record_status "doc parsed"})))
(defn fetch-all-batch []
  (t/info (str "Fetching lazy seq. all rows for batch."))
  (jdbc/query (db-connection)
              ["select
                  row_id,
                  file_path,
                  file_extension
                from the_table"]))
(defn update-a-row [{:keys [row_id file_path file_extension] :as all-keys}]
  (let [parse-out (parse-a-path all-keys)]
    (try
      (doall
        (jdbc/execute!
          (db-connection)
          ["update the_table
            set
              record_last_updated   = current_timestamp,
              file_content          = ?,
              mime_type             = ?,
              language              = ?,
              parse_time_in_seconds = ?,
              record_status         = ?
            where row_id = ?"
           (:file_content parse-out)
           (:mime_type parse-out)
           (:language parse-out)
           (:parse_time_in_seconds parse-out)
           (:record_status parse-out)
           row_id])
        (t/debug (str "updated row_id " (:row_id parse-out) " (" file_extension ") "
                      " in " (:parse_time_in_seconds parse-out) " seconds.")))
      (catch Exception _))))
(dorun
  (pmap
    #(try
       (update-a-row %)
       (catch Exception e (t/error (.getNextException e))))
    (fetch-all-batch)))
pmap runs the map function in parallel on batches of (+ 2 cores) items, but preserves ordering. This means that if you have 8 cores, a batch of 10 items will be processed, but a new batch will only be started once all 10 have finished.
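To make that concrete, here is a small, purely illustrative REPL sketch (the sleep times are invented and the exact effect depends on your core count):
;; Each element sleeps for its own value in milliseconds and then returns it.
(defn fake-work [ms] (Thread/sleep ms) ms)

;; pmap keeps only (+ 2 ncpus) futures in flight and yields results in input
;; order, so the slow element at the front delays the launch of every element
;; past that window, which shows up as the idle cores seen in htop.
(time (dorun (pmap fake-work (cons 3000 (repeat 30 100)))))
The gap between this and an unordered approach grows with the 3-second-versus-90-second spread described in the question.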
You could create your own code that uses combinations of future, delay and deref, which would be a good academic exercise. After that, you can throw out your code and start using the claypoole library, which has a set of abstractions covering the majority of uses of future.
For this specific case, use its unordered pmap or pfor implementations (upmap and upfor), which do exactly what pmap does but without ordering: new items are picked up as soon as any item in the batch finishes.
In situations where IO is the main bottleneck, or where processing times vary greatly between items of work, this is the best way to parallelize map or for operations.
Of course, you should take care not to rely on any ordering of the return values.
(require '[com.climate.claypoole :as cp])
(cp/upmap (cp/ncpus)
          #(try
             (update-a-row %)
             (catch Exception e (t/error (.getNextException e))))
          (fetch-all-batch))
I had a similar problem some time ago. I guess you are making the same assumptions I did:
pmap calls f in parallel, but that doesn't mean the work is shared equally. As you said, some items take 3 seconds whereas others take 90 seconds. A thread that finishes in 3 seconds does NOT ask the other ones to share some of the work left to do, so the finished threads just sit idle until the last one finishes.
You didn't describe exactly what your data looks like, but I will assume you are using some kind of lazy sequence, which is bad for parallel processing. If your process is CPU bound and you can hold your entire input in memory, then prefer clojure.core.reducers ('map', 'filter' and especially 'fold') over the lazy map, filter and friends; see the sketch after this answer.
In my case, these tips dropped the processing time from 34 seconds to a mere 8. Hope it helps.
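For what it's worth, a hedged sketch of that reducers suggestion, reusing update-a-row and fetch-all-batch from the question and assuming the whole batch fits in memory:
(require '[clojure.core.reducers :as r])

;; Realize the batch into a vector (vectors are foldable), then let fold split
;; the work across the fork/join pool instead of walking a lazy seq in order.
(let [batch (vec (fetch-all-batch))]
  (r/fold
    (fn ([] 0) ([a b] (+ a b)))   ; combine the per-chunk counts
    (fn [processed row]
      (update-a-row row)          ; side effect: OCR + DB update
      (inc processed))
    batch))
Note that fold only splits collections larger than its default partition size of 512; for smaller batches pass a smaller n as the first argument, e.g. (r/fold 16 combinef reducef batch).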