redis.py: How to flush all the queries in a pipeline - python-2.7

I have a redis pipeline say:
r = redis.Redis(...).pipeline()
Suppose I need to remove any residual queries from the pipeline, if present, without executing them. Is there anything like r.clear()?
I have searched the docs and source code and I am unable to find anything.

The command list is simply a Python list object. You can inspect it like so:
from redis import StrictRedis
r = StrictRedis()
pipe = r.pipeline()
pipe.set('KEY1', 1)
pipe.set('KEY2', 2)
pipe.set('KEY3', 3)
pipe.command_stack
[(('SET', 'KEY1', 1), {}), (('SET', 'KEY2', 2), {}), (('SET', 'KEY3', 3), {})]
Nothing has been sent to the server yet, so you can just pop() or remove the commands you don't want. You can also just assign an empty list: pipe.command_stack = [].
If there are a lot of queued commands, you could also simply assign a fresh Pipeline object to pipe.
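For example, continuing from the snippet above (same pipe object, nothing new assumed), the queued commands can be dropped like this:
pipe.command_stack.pop()    # drops the last queued command, ('SET', 'KEY3', 3)
pipe.command_stack = []     # or discard every pending command at once
pipe.execute()              # nothing left to send, returns []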
Hope this is what you meant.
Cheers
Joe

Use:
pipe.reset()
Other than the obvious advantage of ignoring implementation details (such as the command_stack mentioned before), this method will take care of interrupting the current ongoing transaction (if any) and returning the connection to the pool.
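A minimal sketch of reset() in use, assuming the same r = StrictRedis() connection as in the previous answer:
pipe = r.pipeline()
pipe.set('KEY1', 1)
pipe.set('KEY2', 2)
pipe.reset()          # discards the queued commands and releases the connection
pipe.set('KEY3', 3)   # the pipeline can be reused afterwards
pipe.execute()        # only SET KEY3 3 is sent to the server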

Timeout when deleting vertices by looping over a DataFrame in Glue

I want to delete vertices by looping over a DataFrame.
Suppose I delete each vertex based on some columns of the DataFrame.
My function is written as follows, and it times out:
def delete_vertices_for_label(rows):
    conn = self.remote_connection()
    g = self.traversal_source(conn)
    for row in rows:
        entries = row.asDict()
        create_traversal = __.hasLabel(str(entries["~label"]))
        for key, value in entries.iteritems():
            if key == '~id':
                pass
            elif key == '~label':
                pass
            else:
                create_traversal = create_traversal.has(key, value)
        g.V().coalesce(create_traversal).drop().iterate()
I have succeeded in using this function locally against TinkerGraph; however, when I try to run it in Glue against data in AWS Neptune, it fails.
I also created a Lambda function, shown below, and it hits the same timeout issue:
def run_sample_gremlin_basedon_property():
    remoteConn = DriverRemoteConnection('ws://' + CLUSTER_ENDPOINT + ":" +
                                        CLUSTER_PORT + '/gremlin', 'g')
    graph = Graph()
    g = graph.traversal().withRemote(remoteConn)
    create_traversal = __.hasLabel("Media")
    create_traversal.has("Media_ID", "99999")
    create_traversal.has("src_name", "NET")
    print("create_traversal:", create_traversal)
    g.V().coalesce(create_traversal).drop().iterate()
Dropping a vertex involves dropping its associated properties and edges as well, so depending on the data it can take a large amount of time. The drop step was optimized in one of the engine releases [1], so ensure that you are on a version newer than that. If you still get timeouts, set an appropriate timeout value on the cluster using the cluster parameter for timeouts.
Note: This answer is based on EmmaYang's communication with AWS Support. It looks like the Glue job was configured in a manner that needs a high timeout. I'm not familiar enough with Glue to comment more on that (Emma - can you please elaborate?)
[1] https://docs.aws.amazon.com/neptune/latest/userguide/engine-releases-1.0.1.0.200296.0.html
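If you want to raise the cluster-wide query timeout programmatically rather than through the console, a rough sketch with boto3 might look like the following. The parameter group name here is a placeholder, and the use of neptune_query_timeout (milliseconds) is an assumption to verify against the Neptune documentation, not a definitive recipe:
import boto3

neptune = boto3.client('neptune')
neptune.modify_db_cluster_parameter_group(
    DBClusterParameterGroupName='my-neptune-cluster-params',   # hypothetical name
    Parameters=[{
        'ParameterName': 'neptune_query_timeout',   # per-query timeout, in ms
        'ParameterValue': '600000',                 # e.g. 10 minutes
        'ApplyMethod': 'immediate',
    }],
)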

Running multiple functions

I think I have confused myself on how I should approach this.
I have a number of functions that I use to interact with an API, for example: get product ID, update product details, update inventory. These calls need to be made one after another, and are all wrapped up in one function, api.push().
Let's say I need to run api.push() 100 times, once for each of 100 product IDs.
What I want to do is run many api.push calls at the same time so that I can speed up processing. For example, let's say I want to run 5 at a time.
I am confused as to whether this is multiprocessing or threading, or neither. I tried both, but they didn't seem to work. For example, I have this:
jobs = []
for n in range(0, 4):
    print "adding a job %s" % n
    p = multiprocessing.Process(target=api.push())
    jobs.append(p)

# Starts threads
for job in jobs:
    job.start()
for job in jobs:
    job.join()
Any guidance would be appreciated
Thanks
Please read the Python docs and do some research on the global interpreter lock to decide whether you should use threading or multiprocessing in your situation.
I do not know the inner workings of api.push, but note that you should pass a function reference to multiprocessing.Process.
Using p = multiprocessing.Process(target=api.push()) passes whatever api.push() returns as the function to be called in the subprocess.
If api.push is the function to be called in the subprocess, you should use p = multiprocessing.Process(target=api.push) instead, as it passes a reference to the function rather than a reference to the result of calling the function.
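As a rough sketch of both points - the corrected Process call and capping concurrency at 5 - the following assumes api.push and product_ids stand in for the objects from the question, and that api.push can be called per product ID (an assumption, not something stated above):
import multiprocessing

# Corrected usage: pass the function itself, not the result of calling it.
p = multiprocessing.Process(target=api.push)
p.start()
p.join()

# To run at most 5 pushes at a time across 100 product IDs, a worker pool is simpler.
pool = multiprocessing.Pool(processes=5)   # at most 5 pushes run concurrently
pool.map(api.push, product_ids)            # assumes api.push(product_id); product_ids is your list of 100 IDs
pool.close()
pool.join()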

Persisting State from a DRPC Spout in Trident

I'm experimenting with Storm and Trident for this project, and I'm using Clojure and Marceline to do so. I'm trying to expand the wordcount example given on the Marceline page, such that the sentence spout comes from a DRPC call rather than from a local spout. I'm having problems which I think stem from the fact that the DRPC stream needs to have a result to return to the client, but I would like the DRPC call to effectively return null, and simply update the persisted data.
(defn build-topology
  ([]
   (let [trident-topology (TridentTopology.)]
     (let [;; ### Two alternatives here ###
           ;collect-stream (t/new-stream trident-topology "words" (mk-fixed-batch-spout 3))
           collect-stream (t/drpc-stream trident-topology "words")]
       (-> collect-stream
           (t/group-by ["args"])
           (t/persistent-aggregate (MemoryMapState$Factory.)
                                   ["args"]
                                   count-words
                                   ["count"]))
       (.build trident-topology)))))
There are two alternatives in the code - the one using a fixed batch spout loads with no problem, but when I try to load the code using a DRPC stream instead, I get this error:
InvalidTopologyException(msg:Component: [b-2] subscribes from non-existent component [$mastercoord-bg0])
I believe this error comes from the fact that the DRPC stream must be trying to subscribe to an output in order to have something to return to the client - but persistent-aggregate doesn't offer any such outputs to subscribe to.
So how can I set up my topology so that a DRPC stream leads to my persisted data being updated?
Minor update: Looks like this might not be possible :( https://issues.apache.org/jira/browse/STORM-38

Composing Flow Graphs

I've been playing around with Akka Streams and get the idea of creating Flows and wiring them together using FlowGraphs.
I know this part of Akka is still under development, so some things may not be finished and some other bits may change, but is it possible to create a FlowGraph that isn't "complete" - i.e. isn't attached to a Sink - and pass it around to different parts of my code to be extended by adding Flows to it, and finally completed by adding a Sink?
Basically, I'd like to be able to compose FlowGraphs but don't understand how... Especially if a FlowGraph has split a stream by using a Broadcast.
Thanks
The next weeks (December) will be documentation writing for us, so I hope this will help you to get into Akka Streams more easily! That said, here's a quick answer:
Basically you need a PartialFlowGraph instead of FlowGraph. In those we allow the usage of UndefinedSink and UndefinedSource, which you can then "attach" afterwards. In your case, we also provide a simple helper builder to create graphs which have exactly one "missing" sink – those can be treated exactly as if they were a Source, see below:
// for akka-streams 1.0-M1
val source = Source() { implicit b ⇒
  // prepare an undefined sink, which can be replaced by a proper sink afterwards
  val sink = UndefinedSink[Int]
  // build your processing graph
  Source(1 to 10) ~> sink
  // return the undefined sink which you mean to "fill in" afterwards
  sink
}

// use the partial graph (source) multiple times, each time with a different sink
source.runWith(Sink.ignore)
source.runWith(Sink.foreach(x ⇒ println(x)))
Hope this helps!

Transactions with Python sqlite3

I'm trying to port some code to Python that uses sqlite databases, and I'm trying to get transactions to work, but I'm getting really confused. I've used sqlite a lot in other languages, because it's great, but I simply cannot work out what's wrong here.
Here is the schema for my test database (to be fed into the sqlite3 command line tool).
BEGIN TRANSACTION;
CREATE TABLE test (i integer);
INSERT INTO "test" VALUES(99);
COMMIT;
Here is a test program.
import sqlite3

sql = sqlite3.connect("test.db")

with sql:
    c = sql.cursor()
    c.executescript("""
        update test set i = 1;
        fnord;
        update test set i = 0;
        """)
You may notice the deliberate mistake in it. This causes the SQL script to fail on the second line, after the update has been executed.
According to the docs, the with sql statement is supposed to set up an implicit transaction around the contents, which is only committed if the block succeeds. However, when I run it, I get the expected SQL error... but the value of i is set from 99 to 1. I'm expecting it to remain at 99, because that first update should be rolled back.
Here is another test program, which explicitly calls commit() and rollback().
import sqlite3

sql = sqlite3.connect("test.db")

try:
    c = sql.cursor()
    c.executescript("""
        update test set i = 1;
        fnord;
        update test set i = 0;
        """)
    sql.commit()
except sql.Error:
    print("failed!")
    sql.rollback()
This behaves in precisely the same way --- i gets changed from 99 to 1.
Now I'm calling BEGIN and COMMIT explicitly:
import sqlite3

sql = sqlite3.connect("test.db")

try:
    c = sql.cursor()
    c.execute("begin")
    c.executescript("""
        update test set i = 1;
        fnord;
        update test set i = 0;
        """)
    c.execute("commit")
except sql.Error:
    print("failed!")
    c.execute("rollback")
This fails too, but in a different way. I get this:
sqlite3.OperationalError: cannot rollback - no transaction is active
However, if I replace the calls to c.execute() with c.executescript(), then it works (i remains at 99)!
(I should also add that if I put the begin and commit inside the inner call to executescript then it behaves correctly in all cases, but unfortunately I can't use that approach in my application. In addition, changing sql.isolation_level appears to make no difference to the behaviour.)
Can someone explain to me what's happening here? I need to understand this; if I can't trust the transactions in the database, I can't make my application work...
Python 2.7, python-sqlite3 2.6.0, sqlite3 3.7.13, Debian.
For anyone who'd like to work with the sqlite3 lib regardless of its shortcomings, I found that you can keep some control of transactions if you do these two things:
- set Connection.isolation_level = None (as per the docs, this means autocommit mode)
- avoid using executescript at all, because according to the docs it "issues a COMMIT statement first" - i.e., trouble. Indeed, I found it interferes with any manually set transactions.
So then, the following adaptation of your test works for me:
import sqlite3

sql = sqlite3.connect("/tmp/test.db")
sql.isolation_level = None

c = sql.cursor()
c.execute("begin")
try:
    c.execute("update test set i = 1")
    c.execute("fnord")
    c.execute("update test set i = 0")
    c.execute("commit")
except sql.Error:
    print("failed!")
    c.execute("rollback")
Per the docs,
Connection objects can be used as context managers that automatically
commit or rollback transactions. In the event of an exception, the
transaction is rolled back; otherwise, the transaction is committed:
Therefore, if you let Python exit the with-statement when an exception occurs, the transaction will be rolled back.
import sqlite3

filename = '/tmp/test.db'

with sqlite3.connect(filename) as conn:
    cursor = conn.cursor()
    sqls = [
        'DROP TABLE IF EXISTS test',
        'CREATE TABLE test (i integer)',
        'INSERT INTO "test" VALUES(99)',
    ]
    for sql in sqls:
        cursor.execute(sql)

try:
    with sqlite3.connect(filename) as conn:
        cursor = conn.cursor()
        sqls = [
            'update test set i = 1',
            'fnord',  # <-- trigger error
            'update test set i = 0',
        ]
        for sql in sqls:
            cursor.execute(sql)
except sqlite3.OperationalError as err:
    print(err)
    # near "fnord": syntax error

with sqlite3.connect(filename) as conn:
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM test')
    for row in cursor:
        print(row)
        # (99,)
yields
(99,)
as expected.
Python's DB API tries to be smart, and begins and commits transactions automatically.
I would recommend using a DB driver that does not use the Python DB API, like apsw.
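For reference, a rough sketch of the failing script from the question written with apsw. This assumes the apsw package is installed and that apsw.SQLError is the exception raised for the syntax error; apsw does not open implicit transactions behind your back, and its connection context manager wraps the block in a transaction:
import apsw

conn = apsw.Connection("test.db")
try:
    with conn:                             # begins a transaction, commits or rolls back on exit
        cursor = conn.cursor()
        cursor.execute("update test set i = 1")
        cursor.execute("fnord")            # syntax error -> the whole block is rolled back
        cursor.execute("update test set i = 0")
except apsw.SQLError:
    print("failed!")                       # i stays at 99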
Here's what I think is happening based on my reading of Python's sqlite3 bindings as well as official Sqlite3 docs. The short answer is that if you want a proper transaction, you should stick to this idiom:
with connection:
    connection.execute("BEGIN")
    # do other things, but do NOT use 'executescript'
Contrary to my intuition, with connection does not call BEGIN upon entering the scope. In fact it doesn't do anything at all in __enter__. It only has an effect when you __exit__ the scope, choosing either COMMIT or ROLLBACK depending on whether the scope is exiting normally or with an exception.
Therefore, the right thing to do is to always explicitly mark the beginning of your transactional with connection blocks using BEGIN. This renders isolation_level irrelevant within the block, because thankfully it only has an effect while autocommit mode is enabled, and autocommit mode is always suppressed within transaction blocks.
Another quirk is executescript, which always issues a COMMIT before running your script. This can easily mess up the transactional with connection block, so your choice is to either
use exactly one executescript within the with block and nothing else, or
avoid executescript entirely; you can call execute as many times as you want, subject to the one-statement-per-execute limitation.
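A minimal sketch of that idiom applied to the test table from the question, using only the standard sqlite3 module; under these assumptions the failed statement causes __exit__ to issue ROLLBACK, so i stays at 99:
import sqlite3

conn = sqlite3.connect("test.db")
try:
    with conn:
        conn.execute("BEGIN")                   # mark the transaction explicitly
        conn.execute("update test set i = 1")
        conn.execute("fnord")                   # error -> __exit__ issues ROLLBACK
        conn.execute("update test set i = 0")
except sqlite3.OperationalError as err:
    print("failed: %s" % err)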
Normal .execute() calls work as expected with the comfortable default auto-commit mode and with the with conn: ... context manager doing auto-commit OR rollback - except for protected read-modify-write transactions, which are explained at the end of this answer.
The sqlite3 module's non-standard conn_or_cursor.executescript() doesn't take part in the (default) auto-commit mode (and so doesn't work normally with the with conn: ... context manager) but forwards the script more or less raw. Therefore it just commits any potentially pending auto-commit transaction at the start, before "going raw".
This also means that without a "BEGIN" inside the script, executescript() works without a transaction, and thus there is no rollback option upon error or otherwise.
So with executescript() we had better use an explicit BEGIN (just as your initial schema creation script did for the "raw" sqlite3 command line tool). This interaction shows step by step what's going on:
>>> list(conn.execute('SELECT * FROM test'))
[(99,)]
>>> conn.executescript("BEGIN; UPDATE TEST SET i = 1; FNORD; COMMIT")
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
OperationalError: near "FNORD": syntax error
>>> list(conn.execute('SELECT * FROM test'))
[(1,)]
>>> conn.rollback()
>>> list(conn.execute('SELECT * FROM test'))
[(99,)]
>>>
The script didn't reach the "COMMIT", and thus we can view the current intermediate state and decide for rollback (or commit nevertheless).
Thus a working try-except-rollback via executescript() looks like this:
>>> list(conn.execute('SELECT * FROM test'))
[(99,)]
>>> try: conn.executescript("BEGIN; UPDATE TEST SET i = 1; FNORD; COMMIT")
... except Exception as ev:
...     print("Error in executescript (%s). Rolling back" % ev)
...     conn.executescript('ROLLBACK')
...
Error in executescript (near "FNORD": syntax error). Rolling back
<sqlite3.Cursor object at 0x011F56E0>
>>> list(conn.execute('SELECT * FROM test'))
[(99,)]
>>>
(Note the rollback via script here, because no .execute() took over commit control.)
And here is a note on auto-commit mode in combination with the more difficult issue of a protected read-modify-write transaction - the issue that made @Jeremie say "Out of all the many, many things written about transactions in sqlite/python, this is the only thing that let me do what I want (have an exclusive read lock on the database)." in a comment on an example which included a c.execute("begin"). Note that sqlite3 normally does not take a long blocking exclusive read lock except for the duration of the actual write-back; instead it uses a more clever 5-stage locking scheme to achieve enough protection against overlapping changes.
The with conn: auto-commit context does not by itself take or trigger a lock strong enough for a protected read-modify-write in sqlite3's 5-stage locking scheme. Such a lock is taken implicitly only when the first data-modifying command is issued - and thus too late.
Only an explicit BEGIN (DEFERRED) (TRANSACTION) triggers the wanted behavior:
The first read operation against a database creates a SHARED lock and the first write operation creates a RESERVED lock.
So a protected read-modify-write transaction which uses the programming language in a general way (and not a special atomic SQL UPDATE clause) looks like this:
with conn:
    conn.execute('BEGIN TRANSACTION')  # crucial!
    v = conn.execute('SELECT * FROM test').fetchone()[0]
    v = v + 1
    time.sleep(3)  # no read lock in effect, but only one concurrent modify succeeds
    conn.execute('UPDATE test SET i=?', (v,))
Upon failure, such a read-modify-write transaction can be retried a couple of times.
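For example, a rough retry sketch, assuming conn is the connection from above and that sqlite3.OperationalError is what signals the lock conflict here:
import sqlite3
import time

for attempt in range(3):
    try:
        with conn:
            conn.execute('BEGIN TRANSACTION')
            v = conn.execute('SELECT * FROM test').fetchone()[0]
            conn.execute('UPDATE test SET i=?', (v + 1,))
        break                                   # committed successfully
    except sqlite3.OperationalError:
        time.sleep(0.1 * (attempt + 1))         # back off, then retry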
You can use the connection as a context manager. It will then automatically roll back the transactions in the event of an exception, or commit them otherwise.
try:
    with con:
        con.execute("insert into person(firstname) values (?)", ("Joe",))
except sqlite3.IntegrityError:
    print("couldn't add Joe twice")
See https://docs.python.org/3/library/sqlite3.html#using-the-connection-as-a-context-manager
This is a bit of an old thread, but in case it helps: I've found that doing a rollback on the connection object does the trick.
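A minimal sketch of what that looks like against the test table from the question (just one way to arrange it):
import sqlite3

conn = sqlite3.connect("test.db")
try:
    conn.execute("update test set i = 1")
    conn.execute("fnord")          # raises OperationalError
    conn.commit()
except sqlite3.Error:
    conn.rollback()                # undoes the partial update on the connection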