Factory instance not creating a new deferred - python-2.7

I am pretty new to Twisted, so I am sure this is a rookie mistake. I have built a simple server which receives a message from the client and, upon receipt of the message, fires a callback that prints the message to the console.
For the first client connection, the server works as expected. Unfortunately, when I start up a second client I get the following error: "twisted.internet.defer.AlreadyCalledError". It was my understanding that the factory would make a new instance of the deferred, i.e. that the new deferred wouldn't have been called before?
Please see the code below. Any help would be very appreciated.
import sys
from twisted.internet.protocol import ServerFactory, Protocol
from twisted.internet import defer

class LockProtocol(Protocol):
    lockData = ''

    def dataReceived(self, data):
        self.lockData += data
        if self.lockData.endswith('??'):
            self.lockDataReceived(self.lockData)

    def lockDataReceived(self, lockData):
        self.factory.lockDataFinished(lockData)

class LockServerFactory(ServerFactory):
    protocol = LockProtocol

    def __init__(self):
        self.deferred = defer.Deferred()  # Initialise deferred

    def lockDataFinished(self, lockData):
        self.deferred.callback(lockData)

    def clientConnectionFailed(self, connector, reason):
        self.deferred.errback(reason)

def main():
    HOST = '127.0.0.1'  # localhost
    PORT = 10001

    def got_lockData(lockData):
        print "We have received lockData. It is as follows:", lockData

    def lockData_failed(err):
        print >> sys.stderr, 'The lockData download failed.'
        errors.append(err)

    factory = LockServerFactory()

    from twisted.internet import reactor
    # Listen for TCP connections on a port, and use our factory to make a
    # protocol instance for each new connection
    port = reactor.listenTCP(PORT, factory)
    print 'Serving on %s' % port.getHost()

    # Set up callbacks
    factory.deferred.addCallbacks(got_lockData, lockData_failed)

    reactor.run()  # Start the reactor

if __name__ == '__main__':
    main()

Notice that there is only one LockServerFactory ever created in your program:
factory = LockServerFactory()
However, as many LockProtocol instances are created as connections are accepted. If you have per-connection state, the place to put it is on LockProtocol.
It looks like your "lock data completed" event is not a one-off so a Deferred is probably not the right abstraction for this job.
Instead of a LockServerFactory with a Deferred that fires when that event happens, perhaps you want a multi-use event handler, perhaps custom built:
class LockServerFactory(ServerFactory):
    protocol = LockProtocol

    def __init__(self, lockDataFinished):
        self.lockDataFinished = lockDataFinished

factory = LockServerFactory(got_lockData)
(Incidentally, notice that I've dropped clientConnectionFailed from this implementation: that's a method of ClientFactory. It will never be called on a server factory.)
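Put together, a minimal sketch of the revised wiring might look like this (reusing the asker's LockProtocol and got_lockData unchanged; every connection now calls a plain handler, so a second client no longer triggers AlreadyCalledError):

class LockServerFactory(ServerFactory):
    protocol = LockProtocol

    def __init__(self, lockDataFinished):
        # A plain callable instead of a one-shot Deferred,
        # reusable for every accepted connection.
        self.lockDataFinished = lockDataFinished

def main():
    def got_lockData(lockData):
        print "We have received lockData. It is as follows:", lockData

    factory = LockServerFactory(got_lockData)

    from twisted.internet import reactor
    reactor.listenTCP(10001, factory)
    reactor.run()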

Related

Psycopg2 & Flask - tying connection to before_request & teardown_appcontext

Cheers guys,
While refactoring my Flask app I got stuck at tying the db connection to @app.before_request and closing it at @app.teardown_appcontext. I am using plain psycopg2 and the app factory pattern.
First I created a function to call within the app factory so I could use @app, as suggested by Miguel Grinberg here:
def create_app(test_config=None):
    app = Flask(__name__, instance_relative_config=True)
    --
    from shop.db import connect_and_close_db
    connect_and_close_db(app)
    --
    return app
Then I tried this pattern suggested on http://flask.pocoo.org/docs/1.0/appcontext/#storing-data:
def connect_and_close_db(app):
    @app.before_request
    def get_db_test():
        conn_string = "dbname=testdb user=testuser password=test host=localhost"
        if 'db' not in g:
            g.db = psycopg2.connect(conn_string)
        return g.db

    @app.teardown_appcontext
    def close_connection(exception):
        db = g.pop('db', None)
        if db is not None:
            db.close()
It resulted in:
TypeError: 'psycopg2.extensions.connection' object is not callable
Does anyone have an idea what happened and how to make it work?
Furthermore, I wonder how I would access the connection object to create a cursor once its creation is tied to before_request?
This solution is probably far from perfect, and it's not really DRY. I'd welcome comments, or other answers that build on this.
To implement this with raw psycopg2, you probably need to take a look at the connection pooler. There's also a good guide on how to implement this outwith Flask.
The basic idea is to create your connection pool first. You want this to be established when the Flask application initializes (this could be within the Python interpreter or via a gunicorn worker, of which there may be several, in which case each worker has its own connection pool). I chose to store the returned pool in the config:
from flask import Flask, g, jsonify
import psycopg2
from psycopg2 import pool

app = Flask(__name__)

app.config['postgreSQL_pool'] = psycopg2.pool.SimpleConnectionPool(
    1, 20,
    user="postgres",
    password="very_secret",
    host="127.0.0.1",
    port="5432",
    database="postgres")
Note that the first two arguments to SimpleConnectionPool are the min and max connections. That's the number of connections going to your database server, between 1 and 20 in this case.
Next define a get_db function:
def get_db():
    if 'db' not in g:
        g.db = app.config['postgreSQL_pool'].getconn()
    return g.db
The SimpleConnectionPool.getconn() method used here simply returns a connection from the pool, which we assign to g.db and return. This means that when we call get_db() anywhere in the code it returns the same connection, or creates a connection if one is not present. There's no need for a before_request decorator.
Next, define your teardown function:
@app.teardown_appcontext
def close_conn(e):
    db = g.pop('db', None)
    if db is not None:
        app.config['postgreSQL_pool'].putconn(db)
This runs when the application context is destroyed, and uses SimpleConnectionPool.putconn() to put away the connection.
Finally define a route:
@app.route('/')
def index():
    db = get_db()
    cursor = db.cursor()
    cursor.execute("select 1;")
    result = cursor.fetchall()
    print(result)
    cursor.close()
    return jsonify(result)
This code works for me, tested against postgres running in a docker container. A few things which probably should be improved:
This view isn't very DRY. Perhaps you could move some of this into the get_db function so it returns a cursor (see the sketch after this list).
When the Python interpreter exits, you should also find a way to close the connections with app.config['postgreSQL_pool'].closeall().
Although this is tested, some kind of way to monitor the pool would be good, so that you could watch pool/db connections under load and make sure the pooler behaves as expected.
In another land, the sqlalchemy.scoped_session documentation explains more things relating to this, with some theory on how its 'sessions' work in relation to requests. They have implemented it in such a way that you can call Session.query('SELECT 1') and it will create the session if it doesn't already exist.
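As a rough illustration of the first two points above, here is a sketch of my own (the get_cursor helper and the atexit hook are not part of the original answer) of how the cursor handling could be folded into a helper and the pool closed on interpreter exit:

import atexit
from contextlib import contextmanager

@contextmanager
def get_cursor():
    # Borrow the request's pooled connection and hand back a cursor;
    # the cursor is always closed, even if the view raises.
    db = get_db()
    cursor = db.cursor()
    try:
        yield cursor
    finally:
        cursor.close()

@app.route('/dry')
def dry_index():
    with get_cursor() as cursor:
        cursor.execute("select 1;")
        result = cursor.fetchall()
    return jsonify(result)

# Close every pooled connection when the interpreter shuts down.
atexit.register(app.config['postgreSQL_pool'].closeall)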
EDIT: Here's a gist with your app factory pattern, and sample usage in the comment.
Currently I am using this pattern:
(I'll edit this answer eventually if I come up with a better solution.)
This is the main script in which we use the database. It uses two functions from config: get_db() to get a connection from the pool and put_db() to return the connection to the pool:
from config import get_db, put_db
from threading import Thread
from time import sleep

def select():
    db = get_db()
    sleep(1)
    cursor = db.cursor()
    # Print select result and db connection address in memory
    # to see if it gets a connection from another address on the second thread
    cursor.execute("SELECT 'It works %s'", (id(db),))
    print(cursor.fetchone())
    cursor.close()
    put_db(db)

Thread(target=select).start()
Thread(target=select).start()
print('Main thread')
This is config.py:
import sys
import os
import psycopg2
from psycopg2 import pool
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

def get_db(key=None):
    return getattr(get_db, 'pool').getconn(key)

def put_db(conn, key=None):
    getattr(get_db, 'pool').putconn(conn, key=key)

# So we here need to init connection pool in main thread in order everything to work
# Pool is initialized under function object get_db
try:
    setattr(get_db, 'pool', psycopg2.pool.ThreadedConnectionPool(1, 20, os.getenv("DB")))
    print(color.red('Initialized db'))
except psycopg2.OperationalError as e:
    print(e)
    sys.exit(0)
And also if you are curious there is an .env file containing db connection string in DB env variable:
DB="dbname=postgres user=postgres password=1234 host=127.0.0.1 port=5433"
(.env file is loaded using dotenv module in config.py)
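One small refinement worth considering (my own suggestion, not part of the answer above): wrapping get_db()/put_db() in a context manager guarantees the connection goes back to the pool even if the query raises.

from contextlib import contextmanager
from config import get_db, put_db

@contextmanager
def pooled_connection(key=None):
    # Borrow a connection from the pool and always return it.
    conn = get_db(key)
    try:
        yield conn
    finally:
        put_db(conn, key=key)

def select():
    with pooled_connection() as db:
        cursor = db.cursor()
        cursor.execute("SELECT 'It works %s'", (id(db),))
        print(cursor.fetchone())
        cursor.close()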

Reference function of Twisted Connection Bot Class

I am currently working on developing a Twitch.tv chat and moderation bot (full code can be found on GitHub here: https://github.com/DarkElement75/tecsbot; it might not be fully updated to match the problem I describe), and in doing this I need to have many different Twisted TCP connections to various channels. Because of the way Twitch's Whisper system works, I have one connection for sending/receiving whispers, and I need any connection to any channel to be able to reference this whisper connection and send data on it via the TwitchWhisperBot's write() function. However, I have yet to find a method that allows my current global function to reference this write() function. Here is what I have right now:
# global functions
def send_whisper(self, user, msg):
    whisper_str = "/w %s %s" % (user, msg)
    print dir(whisper_bot)
    print dir(whisper_bot.transport)
    whisper_bot.write("asdf")

def whisper(self, user, msg):
    '''global whisper_user, whisper_msg
    if "/mods" in msg:
        thread.start_new_thread(get_whisper_mods_msg, (self, user, msg))
    else:
        whisper_user = user
        whisper_msg = msg'''
    if "/mods" in msg:
        thread.start_new_thread(get_whisper_mods_msg, (self, user, msg))
    else:
        send_whisper(self, user, msg)

# Example usage of these (inside a channel connection):
send_str = "Usage: !permit add <user> message/time/<time> <message count/time duration/time unit>/permanent"
whisper(self, user, send_str)
# Whisper classes
class TwitchWhisperBot(irc.IRCClient, object):
    def write(self, msg):
        self.msg(self.channel, msg.encode("utf-8"))
        logging.info("{}: {}".format(self.nickname, msg))

class WhisperBotFactory(protocol.ClientFactory, object):
    wait_time = 1

    def __init__(self, channel):
        global whisper_bot
        self.channel = channel
        whisper_bot = TwitchWhisperBot(self.channel)

    def buildProtocol(self, addr):
        return TwitchWhisperBot(self.channel)

    def clientConnectionLost(self, connector, reason):
        # Reconnect when disconnected
        logging.error("Lost connection, reconnecting")
        self.protocol = TwitchWhisperBot
        connector.connect()

    def clientConnectionFailed(self, connector, reason):
        # Keep retrying when connection fails
        msg = "Could not connect, retrying in {}s"
        logging.warning(msg.format(self.wait_time))
        time.sleep(self.wait_time)
        self.wait_time = min(512, self.wait_time * 2)
        connector.connect()
# Execution begins here:
# main whisper bot where other threads with processes will be started
# sets variables for connection to twitch chat
whisper_channel = '#_tecsbot_1444071429976'
whisper_channel_parsed = whisper_channel.replace("#", "")
server_json = get_json_servers()
server_arr = (server_json["servers"][0]).split(":")
server = server_arr[0]
port = int(server_arr[1])

# try:
# we are using this to make more connections, better than threading
# Make logging format prettier
logging.basicConfig(format="[%(asctime)s] %(message)s",
                    datefmt="%H:%M:%S",
                    level=logging.INFO)

# Connect to Twitch IRC server, make more instances for more connections
# Whisper connection
whisper_bot = ''
reactor.connectTCP(server, port, WhisperBotFactory(whisper_channel))
# Channel connections
reactor.connectTCP('irc.twitch.tv', 6667, BotFactory("#darkelement75"))
The simplest solution here would be using the Singleton pattern, since you're guaranteed to only have a single connection of each type at any given time. Personally, with Twisted, I think the simplest approach is to use the reactor to store your instance (since the reactor itself is a singleton).
So what you want to do is, inside TwitchWhisperBot, at sign in:
def signedOn(self):
    reactor.whisper_instance = self
And then, anywhere else in the code, you can access that instance:
whisper_bot = reactor.whisper_instance
Just for sanity's sake, you should also check if it's been set or not:
if getattr(reactor, 'whisper_instance', None):
    reactor.whisper_instance.write("test_user", "message")
else:
    logging.warning("whisper instance not set")

twisted: how to delete a static resource?

I have a basic TCP server implemented in Twisted, to which a client is connected. The client connects and sends the data necessary to start a websocket resource. Using these details sent by the TCP client, I want to add an Autobahn websocket resource as a child under a Twisted web resource, and when the client disconnects, I want to remove this child from the web resource. What would be the best way to implement this? Can I use resource.delEntity(child)?
So far, the code looks like this:
import json

from twisted.internet import reactor
from twisted.internet.protocol import Protocol, Factory
from twisted.web.server import Site
from twisted.web.static import Data
from autobahn.twisted.websocket import WebSocketServerFactory, WebSocketServerProtocol
from autobahn.twisted.resource import WebSocketResource

class ProtoPS(WebSocketServerProtocol):
    def onMessage(self, payload, isBinary):
        if not isBinary:
            print("Text message received: {}".format(payload.decode('utf8')))
        self.sendMessage(payload, isBinary)

class BaseResource:
    def __init__(self, proto, vid):
        self.vid = vid
        self.factory = WebSocketServerFactory()
        self.factory.protocol = proto
        self.resource = WebSocketResource(self.factory)
        wsroot.putChild(self.vid, self.resource)

class BackendProto(Protocol):
    def __init__(self):
        self.SERVICEMAP = {}

    def dataReceived(self, data):
        msg = json.loads(data)
        if ('cmd' in msg) and (msg['cmd'] == "create"):
            self.vid = msg['client']['id']
            self.SERVICEMAP[self.vid] = BaseResource(ProtoPS, self.vid)

    def connectionLost(self, reason):
        wsroot.delEntity(self.vid)
        del self.SERVICEMAP[self.vid]

class BackendFactory(Factory):
    protocol = BackendProto

if __name__ == '__main__':
    reactor.listenTCP(8081, BackendFactory())
    wsroot = Data("", "text/plain")
    wssite = Site(wsroot)
    reactor.listenTCP(9000, wssite)
    reactor.run()

Python - passing modified callback to dispatcher

Scrapy application, but the question is really about the Python language - experts can probably answer this immediately without knowing the framework at all.
I've got a class called CrawlWorker that knows how to talk to so-called "spiders" - schedule their crawls, and manage their lifecycle.
There's a TwistedRabbitClient that has-one CrawlWorker. The client only knows how to talk to the queue and hand off messages to the worker - it gets completed work back from the worker asynchronously by using the worker method connect_to_scrape below to connect to a signal emitted by a running spider:
def connect_to_scrape(self, callback):
    self._connect_to_signal(callback, signals.item_scraped)

def _connect_to_signal(self, callback, signal):
    if signal is signals.item_scraped:
        def _callback(item, response, sender, signal, spider):
            scrape_config = response.meta['scrape_config']
            delivery_tag = scrape_config.delivery_tag
            callback(item.to_dict(), delivery_tag)
    else:
        _callback = callback
    dispatcher.connect(_callback, signal=signal)
So the worker provides a layer of "work deserialization" for the Rabbit client, who doesn't know about spiders, responses, senders, signals, items (anything about the nature of the work itself) - only dicts that'll be published as JSON with their delivery tags.
So the callback below isn't registering properly (no errors either):
def publish(self, item, delivery_tag):
    self.log('item_scraped={0} {1}'.format(item, delivery_tag))
    publish_message = json.dumps(item)
    self._channel.basic_publish(exchange=self.publish_exchange,
                                routing_key=self.publish_key,
                                body=publish_message)
    self._channel.basic_ack(delivery_tag=delivery_tag)
But if I remove the if branch in _connect_to_signal and connect the callback directly (and modify publish to soak up all the unnecessary arguments), it works.
Anyone have any ideas why?
So, I figured out why this wasn't working, by re-stating it in a more general context:
import functools

from scrapy.signalmanager import SignalManager

SIGNAL = object()

class Sender(object):
    def __init__(self):
        self.signals = SignalManager(self)

    def wrap_receive(self, receive):
        @functools.wraps(receive)
        def wrapped_receive(message, data):
            message = message.replace('World', 'Victor')
            value = data['key']
            receive(message, value)
        return wrapped_receive

    def bind(self, receive):
        _receive = self.wrap_receive(receive)
        self.signals.connect(_receive, signal=SIGNAL,
                             sender=self, weak=False)

    def send(self):
        message = 'Hello, World!'
        data = {'key': 'value'}
        self.signals.send_catch_log(SIGNAL, message=message, data=data)

class Receiver(object):
    def __init__(self, sender):
        self.sender = sender
        self.sender.bind(self.receive)

    def receive(self, message, value):
        """Receive data from a Sender."""
        print 'Receiver received: {0} {1}.'.format(message, value)

if __name__ == '__main__':
    sender = Sender()
    receiver = Receiver(sender)
    sender.send()
This works if and only if weak=False.
The basic problem is that when connecting to the signal, weak=False needs to be specified. Hopefully someone smarter than me can expound on why that's needed; my understanding is that the dispatcher keeps only a weak reference to the receiver by default, and wrapped_receive is a locally created closure with no other strong references, so it gets garbage collected as soon as bind() returns and the signal never reaches it.
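If that explanation is right, an alternative to weak=False is simply to keep a strong reference to the wrapper yourself, for example on the Sender (a sketch of mine, not from the original answer):

def bind(self, receive):
    # Keep the wrapper alive ourselves so the dispatcher's
    # default weak reference still has something to point at.
    self._receive = self.wrap_receive(receive)
    self.signals.connect(self._receive, signal=SIGNAL, sender=self)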

Django celery task keep global state

I am currently developing a Django application based on django-tenants-schema. You don't need to look into the actual code of the module, but the idea is that it has a global setting for the current database connection defining which schema to use for the application tenant, e.g.
tenant = tenants_schema.get_tenant()
And for setting
tenants_schema.set_tenant(xxx)
I would like some of the tasks to remember the current global tenant selected at the time of instantiation, e.g. in theory:
class AbstractTask(Task):
    '''
    Run this method before returning the task future
    '''
    def before_submit(self):
        self.run_args['tenant'] = tenants_schema.get_tenant()

    '''
    This method is run before related .run() task method
    '''
    def before_run(self):
        tenants_schema.set_tenant(self.run_args['tenant'])
Is there an elegant way of doing it in celery?
Celery (as of 3.1) has signals you can hook into to do this. You can alter the kwargs that were passed in, and on the other side, undo your alterations before they're given to the actual task:
from celery import shared_task
from celery.signals import before_task_publish, task_prerun, task_postrun
from threading import local

current_tenant = local()

@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
    body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)
    print 'sending tenant: {t}'.format(t=current_tenant.id)

@task_prerun.connect
def extract_tenant_from_task(kwargs=None, **unused):
    tenant_id = kwargs.pop('tenant_middleware.tenant', None)
    current_tenant.id = tenant_id
    print 'current_tenant.id set to {t}'.format(t=tenant_id)

@task_postrun.connect
def cleanup_tenant(**kwargs):
    current_tenant.id = None
    print 'cleaned current_tenant.id'

@shared_task
def get_current_tenant():
    # Here is where you would do work that relied on current_tenant.id being set.
    import time
    time.sleep(1)
    return current_tenant.id
And if you run the task (not showing logging from the worker):
In [1]: current_tenant.id = 1234; ct = get_current_tenant.delay(); current_tenant.id = 5678; ct.get()
sending tenant: 1234
Out[1]: 1234
In [2]: current_tenant.id
Out[2]: 5678
The signals are not called if no message is sent (when you call the task function directly, without delay() or apply_async()). If you want to filter on the task name, it is available as body['task'] in the before_task_publish signal handler, and the task object itself is available in the task_prerun and task_postrun handlers.
I am a Celery newbie, so I can't really tell if this is the "blessed" way of doing "middleware"-type stuff in Celery, but I think it will work for me.
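For example, filtering on the task name in the publish-side handler could look roughly like this (a sketch of mine, reusing current_tenant from the code above; the 'tenant_' name prefix is a made-up convention for illustration):

@before_task_publish.connect
def add_tenant_to_selected_tasks(body=None, **unused):
    # body['task'] holds the fully qualified task name.
    if body and body.get('task', '').startswith('tenant_'):
        body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)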
I'm not sure what you mean here: is before_submit executed before the task is called by a client?
In that case I would rather use a with statement here:
from contextlib import contextmanager

@contextmanager
def set_tenant_db(tenant):
    prev_tenant = tenants_schema.get_tenant()
    try:
        tenants_schema.set_tenant(tenant)
        yield
    finally:
        tenants_schema.set_tenant(prev_tenant)

@app.task
def tenant_task(tenant=None):
    with set_tenant_db(tenant):
        do_actions_here()

tenant_task.delay(tenant=tenants_schema.get_tenant())
You can of course create a base task that does this automatically,
you can apply the context in Task.__call__ for example, but I'm not sure
if that saves you much if you can just use the with statement explicitly.
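A rough sketch of that base-task idea (my own illustration of the suggestion above, assuming the same set_tenant_db context manager and that app is the Celery app from the snippet):

class TenantTask(app.Task):
    abstract = True

    def __call__(self, *args, **kwargs):
        # Pop the tenant the caller passed along and make it current
        # for the duration of the task body.
        tenant = kwargs.pop('tenant', None)
        with set_tenant_db(tenant):
            return super(TenantTask, self).__call__(*args, **kwargs)

@app.task(base=TenantTask)
def tenant_task():
    do_actions_here()

tenant_task.delay(tenant=tenants_schema.get_tenant())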