Cypher QL doesn't execute queries atomically - clojure

I'm starting to work with Neo4j and I noticed a really bad behaviour when updating a property on a node I am reading at the same moment. The Clojure code I wrote uses the Neocons library to communicate with Neo4j:
(ns ams.utils.t.test-cypher-error
  (:require [clojurewerkz.neocons.rest :as rest]
            [clojurewerkz.neocons.rest.nodes :as nodes]
            [clojurewerkz.neocons.rest.cypher :as cypher]))

(rest/connect! "http://192.168.0.101:7474/db/data")

(def counter-id (:id (nodes/create {:counter 0})))

(defn update-counter []
  (cypher/query "START c = node({id}) SET c.counter = c.counter + 1 RETURN c.counter as counter"
                {"id" counter-id}))

(doall (apply pcalls (repeat 10 update-counter)))

(println "Counter:" ((comp :counter :data) (nodes/get counter-id)))

(nodes/destroy counter-id)
Guess the result:
Counter: 4
Sometimes it's 5, sometimes 4, but you can see the problem here: between the START and the SET clauses the value of the counter changes, but Cypher doesn't catch it!
Two questions here:
Am I doing something wrong?
Is there any viable algorithm for generating unique counters over REST in Neo4j?
Neo4j version is 1.9RC1, thanks in advance!

The problem you're encountering is that Neo4j does not have implicit read-locking. So here's what sometimes happens:
Query 1 starts
Query 1 reads the counter value as 3
Query 1 sets the counter value to 3+1 = 4
Query 2 starts
Query 2 reads the counter value as 4
Query 2 sets the counter value to 4+1 = 5
And here's what sometimes happens:
Query 1 starts
Query 2 starts
Query 1 reads the counter value as 3
Query 2 reads the counter value as 3
Query 1 sets the counter value to 3+1 = 4
Query 2 sets the counter value to 3+1 = 4
In read-locking databases (like most SQL servers) situation #2 could not happen. Query 2 would start and then would block until query 1 either committed or rolled back.
There may be a way to explicitly set read locks, but not through the REST API. The transaction API looks promising, though I'm not entirely sure it can give you what you want, and again, it is not supported via REST.
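If serializing the updates on the client side is acceptable, one workaround is to funnel every increment through a single Clojure agent, so only one read-modify-write is in flight at a time. This is just a sketch built on the question's own update-counter function, under the assumption that all writers live in this one JVM; it does nothing against concurrent writers in other processes:

;; A minimal sketch, assuming all writers share this single agent.
;; Agent actions run one at a time, so the read-modify-write inside
;; update-counter can no longer interleave with itself.
(def counter-agent (agent nil))

(defn update-counter-serialized []
  (send-off counter-agent (fn [_] (update-counter))))

;; Fire ten increments, then wait for the agent's queue to drain.
(dotimes [_ 10] (update-counter-serialized))
(await counter-agent)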


Calling a function with arguments in lists

I have a function that takes 3 arguments:
(defn times-changed-answer [rid qid csv-file] ...some code)
that counts, for a user with record-id rid, how many times they changed their answer to question-code qid. The data is in csv-file.
It works and I have tested it for multiple users.
Now I want to call this function for all users and for all questions.
I have a list of rids and a list of qids.
(def rid-list '(1 2 4 5 10))
(def qid-list '(166 167 168 169 180 141))
How could I call this function on all users for all questions?
The lists are of different lengths, and the third argument (the file) is always the same.
I'd use a for list comprehension - it depends on what result you expect; here, e.g., [rid qid result] is returned for every combination:
(for [rid rid-list
      qid qid-list]
  [rid qid (times-changed-answer rid qid csv-file)])
If you want to have this in a map, you could e.g. reduce over that, as sketched below.
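For example, one possible sketch of that reduction, building a map keyed by the [rid qid] pair (the names follow the question's code):

;; Build {[rid qid] result} from the same comprehension.
(reduce (fn [acc [rid qid result]]
          (assoc acc [rid qid] result))
        {}
        (for [rid rid-list
              qid qid-list]
          [rid qid (times-changed-answer rid qid csv-file)]))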

clojure pmap - why aren't I using all the cores?

I'm attempting to use the clojure pantomime library to extract/OCR text from a large number of tif documents (among others).
My plan has been to use pmap to apply the mapping over a sequence of input data (from a postgres database) and then update that same postgres database with the tika/tesseract OCR output. This has been working OK, however I notice in htop that many of the cores are idle at times.
Is there any way to reconcile this, and what steps can I take to determine where this may be blocking? Each task processes a single tif file, and the threads are entirely independent of one another.
Additional info:
some tika/tesseract processes take 3 seconds, others take up to 90 seconds. Generally speaking, tika is heavily CPU-bound. I have ample memory available according to htop.
postgres has no locking issues in session management, so I don't think that's holding me up.
maybe futures are waiting somewhere to be dereferenced? How can I tell where?
Any tips appreciated, thanks. Code added below.
(defn parse-a-path [{:keys [row_id, file_path]}]
  (try
    (let [start        (System/currentTimeMillis)
          mime_type    (pm/mime-type-of file_path)
          file_content (-> file_path (extract/parse) :text)
          language     (pl/detect-language file_content)]
      {:mime_type mime_type
       :file_content file_content
       :language language
       :row_id row_id
       ;; divide by 1000, not 100, to convert milliseconds to seconds
       :parse_time_in_seconds (float (/ (- (System/currentTimeMillis) start) 1000))
       :record_status "doc parsed"})))
(defn fetch-all-batch []
  (t/info (str "Fetching lazy seq. all rows for batch."))
  (jdbc/query (db-connection)
              ["select
                  row_id,
                  file_path,
                  file_extension
                from the_table"]))
(defn update-a-row [{:keys [row_id, file_path, file_extension] :as all-keys}]
  (let [parse-out (parse-a-path all-keys)]
    (try
      ;; jdbc/execute! is eager, so no doall is needed here
      (jdbc/execute!
        (db-connection)
        ["update the_table
            set
              record_last_updated = current_timestamp,
              file_content = ?,
              mime_type = ?,
              language = ?,
              parse_time_in_seconds = ?,
              record_status = ?
          where row_id = ?"
         (:file_content parse-out)
         (:mime_type parse-out)
         (:language parse-out)
         (:parse_time_in_seconds parse-out)
         (:record_status parse-out)
         row_id])
      (t/debug (str "updated row_id " (:row_id parse-out) " (" file_extension ") "
                    "in " (:parse_time_in_seconds parse-out) " seconds."))
      (catch Exception _))))
(dorun
  (pmap
    #(try
       (update-a-row %)
       (catch Exception e (t/error (.getNextException e))))
    (fetch-all-batch)))  ;; note: call the function; passing the var itself would fail
pmap runs the map function in parallel on batches of (+ 2 cores) items, but preserves ordering. This means that if you have 8 cores, a batch of 10 items will be processed, but the next batch will only be started once all 10 have finished.
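A toy sketch (not the OP's code) that makes the effect visible: give pmap one slow task among fast ones, and the wall time tracks the straggler because results are realized in input order:

;; One 2000 ms straggler among 200 ms tasks: even with spare cores,
;; total time is roughly 2000 ms, because pmap only runs ahead by a
;; bounded window and realizes results in order.
(defn slow-task [ms]
  (Thread/sleep ms)
  ms)

(time (doall (pmap slow-task [2000 200 200 200 200 200 200 200])))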
You could create your own code that uses combinations of future, delay and deref, which would be a good academic exercise. After that, you can throw out your code and start using the claypoole library, which has a set of abstractions that cover the majority of uses of future.
For this specific case, use its unordered pmap or pfor implementations (upmap and upfor), which do exactly the same thing pmap does but do not preserve ordering; new items are picked up as soon as any one item in the batch is finished.
In situations where IO is the main bottleneck, or where processing times can vary greatly between items of work, this is the best way to parallelize map or for operations.
Of course you should take care not to rely on any sort of ordering of the return values.
(require '[com.climate.claypoole :as cp])

(cp/upmap (cp/ncpus)
          #(try
             (update-a-row %)
             (catch Exception e (t/error (.getNextException e))))
          (fetch-all-batch))
I had a similar problem some time ago. I guess you were making the same assumptions as I was:
pmap calls f in parallel. But that doesn't mean the work is shared equally. As you said, some items take 3 seconds whereas others take 90 seconds. The thread that finished in 3 seconds does NOT ask the other ones to share some of the work left to do; the finished threads just sit idle until the last one finishes.
You didn't describe exactly what your data looks like, but I will assume you are using some kind of lazy sequence, which is bad for parallel processing. If your process is CPU-bound and you can hold your entire input in memory, then prefer clojure.core.reducers (map, filter and especially fold) to the lazy map, filter and the others; a sketch follows below.
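To make the reducers suggestion concrete, here is a minimal sketch against the question's own functions. The assumptions: the whole batch fits in memory (so the lazy JDBC seq can be poured into a vector, which r/fold can split), and update-a-row's side effects may run in any order:

(require '[clojure.core.reducers :as r])

;; Sketch only: realize the batch into a vector so it is foldable,
;; then run update-a-row in parallel and collect the results.
(defn process-batch []
  (->> (fetch-all-batch)
       (into [])
       (r/map update-a-row)
       (r/foldcat)))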
In my case, these tips dropped the processing time from 34 seconds to a mere 8. Hope it helps.

Google App Engine Query matches case [duplicate]

Using the Google App Engine datastore, is there a way to perform a GQL query that specifies a WHERE clause on a StringProperty datatype that is case-insensitive? I am not always sure what case the value will be in. The docs specify that the WHERE clause is case-sensitive for my values; is there a way to make it insensitive?
For instance, the db Model would be this:
from google.appengine.ext import db

class Product(db.Model):
    id = db.IntegerProperty()
    category = db.StringProperty()
and the data looks like this:
id category
===================
1 cat1
2 cat2
3 Cat1
4 CAT1
5 CAT3
6 Cat4
7 CaT1
8 CAT5
I would like to say
gqlstring = "WHERE category = '{0}'".format('cat1')
returnvalue = Product.gql(gqlstring)
and have returnvalue contain
id category
===================
1 cat1
3 Cat1
4 CAT1
7 CaT1
I don't think there is an operator like that in the datastore.
Do you control the input of the category data? If so, you should choose a canonical form to store it in (all lowercase or all uppercase). If you need to store the original case for some reason, then you could just store two columns - one with the original, one with the standardized one. That way you can do a normal WHERE clause.
The datastore doesn't support case insensitive comparisons, because you can't index queries that use them (barring an index that transforms values). The solution is to store a normalized version of your string in addition to the standard one, as Peter suggests. The property classes in the AETycoon library may prove helpful, in particular, DerivedProperty.
This thread was helpful and makes me want to contribute a similar approach that makes partial search matches possible. I add one more field to the datastore kind, save each word of the normalized phrase as a set, and then use the IN filter to match. This is an example in Clojure. The normalize part should translate easily to Java at least (thanks to #raek on #clojure), while the database interaction should be convertible to any language:
(use '[clojure.contrib.string :only [split lower-case]])
(use '[appengine-magic.services.datastore :as ds])

;; initialize the datastore kind entity
(ds/defentity AnswerTextfield [value, nvalue, avalue])

;; normalize and lowercase a string
(defn normalize [string-to-normalize]
  (lower-case
    (apply str
           (remove #(= (Character/getType %) Character/NON_SPACING_MARK)
                   (java.text.Normalizer/normalize string-to-normalize
                                                   java.text.Normalizer$Form/NFKD)))))

;; save the original value, the normalized value and the split normalized value
(defn textfield-save! [value]
  (ds/save!
    (let [nvalue (normalize value)]
      (ds/new* AnswerTextfield [value nvalue (split #" " nvalue)]))))

;; normalized search
(defn search-normalized [value]
  (ds/query :kind AnswerTextfield
            :filter [(= :nvalue (normalize value))]))

;; partial normalized word search
(defn search-partial [value]
  (flatten
    (for [word (split #" " (normalize value))]
      (ds/query :kind AnswerTextfield
                :filter [(in :avalue [word])]))))
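A hypothetical usage sketch of the functions above (the strings are made up; note how the NFKD decomposition plus the non-spacing-mark filter strips the accents):

;; Save one entity, then find it again in different case/accent forms.
(textfield-save! "Crème Brûlée")

(search-normalized "creme brulee")  ;; matches via the normalized nvalue
(search-partial "BRULEE")           ;; matches a single normalized word via avalue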

Lisp, add new list to db in "for" loop - why is it returning NIL?

I wonder how, in Lisp, I can take each new value from a "for" loop and add the new list it creates (by calling my function) to the db.
I have created the function:
(defun make (id name surname)
  (list :id id :name name :surname surname))
Here I created the global variable:
(defvar *db* nil)
And here I defined the function that adds each new value to the db:
(defun add (cd) (push cd *db*))
So I'm able to add each new record to the db, like this:
(add (make 0 "Oleg" "Orlov" ) )
To look at the content of my db, I can use:
*db*
So, I wonder how to put each new record-list into the db using a "for" loop. I print values in a "for" loop in Lisp like this:
(loop for i from 1 to 10 do ( ... ))
If I use:
(loop for i from 0 to 10 do (add (make i "Oleg" "Orlov") ) )
If you read the db using *db* you will see that all eleven records were added, but the last line itself returns NIL.
Why do I get a NIL result rather than T, and what does it mean?
Thanks, best regards!
Every form in Lisp evaluates to something.
If a form you type in doesn't return a value, it will evaluate to NIL by default (otherwise, it evaluates to the value(s) it returns). Your loop doesn't actually return a value itself; it just performs the eleven pushes (each of the intermediate expressions does return a value, but you don't collect and return them). Therefore, that code returns NIL.
If you haven't done so already, check out chapter 3 of Practical Common Lisp, in which Peter Seibel goes step-by-step through creating a simple database. It might give you some insights into the basics of how Lisp works. The specific question you ask (why forms return NIL by default, and what it means specifically in the context of Common Lisp) is answered in chapter 2 of the same book.
As for how you would explicitly cause your loop to emit the list of items it added to *db*, try the following:
(loop for i from 1 to 10
      for elem = (make i "Oleg" "Orlov")
      do (add elem)
      collect elem)

Error when putting variable in table, only constants allowed?

Currently I am working on a NetLogo program where I need to use nodes and links for a vehicle routing problem (links are called streets in the program).
Here I have a practical problem: how to put the variable linkspeed into a table together with another node. Constants like 200 etc. are fine. Online I found some examples where variables are used, but I do not know why I keep getting the following error:
Expected a constant.
(or why NetLogo expects a constant)
Here is the relevant piece of code:
extensions [table]

streets-own [linkspeed linktoll]
nodes-own [netw]

;; In another piece of code linkspeed is assigned successfully to the links

to cheapcalc
  ;; start conditions: set costs very high (3000000)
  ;; state 3 = unsearched, state 2 = searching, state 1 = searched (for later purposes)
  ask nodes [
    set i 0
    set j count nodes
    set netw table:make
    while [i < j] [
      table:put netw (i) [3000000 3]
      set i (i + 1) ] ]
  set i 0
  let k 0
  ask node 35  ;; here I use node 35 as an example.
               ;; node 35 is connected to nodes 34, 36, 20 and 50
  [ table:put netw (35) [0 1]  ;; a node needs no cost to travel to itself
                               ;; putting constants is fine.
    while [i < j]
    [ ask my-links
      [ ask both-ends
        [ if (who != 35) [
            set color blue
            ;; set temp ([linkspeed] of street 35 who)
            ;; my real goal is to put linkspeed here instead of i, but i is simpler than linkspeed.
            table:put netw (who) [ i 2 ] ] ] ]
      set i (i + 1) ] ]  ;; later this will move on to the next node; for now it just repeats the same one.
end
I hope somebody knows what is going on...
The problem is most likely not putting a variable in a table, but putting a variable in a list (which you're then putting in a table).
Change the line below:
table:put netw (who) [ i 2 ]
to:
table:put netw (who) (list i 2)
This - (list i 2) - lets you generate a list containing variables; you can't do that with the literal form [i 2], which only accepts constants.
Hope this helps.