Psycopg2 & Flask - tying connection to before_request & teardown_appcontext - flask

Cheers guys,
While refactoring my Flask app I got stuck tying the db connection to @app.before_request and closing it at @app.teardown_appcontext. I am using plain Psycopg2 and the app factory pattern.
First I created a function to call within the app factory so I could use @app, as suggested by Miguel Grinberg here:
def create_app(test_config=None):
    app = Flask(__name__, instance_relative_config=True)
    # ...
    from shop.db import connect_and_close_db
    connect_and_close_db(app)
    # ...
    return app
Then I tried this pattern suggested on http://flask.pocoo.org/docs/1.0/appcontext/#storing-data:
def connect_and_close_db(app):
    @app.before_request
    def get_db_test():
        conn_string = "dbname=testdb user=testuser password=test host=localhost"
        if 'db' not in g:
            g.db = psycopg2.connect(conn_string)
        return g.db

    @app.teardown_appcontext
    def close_connection(exception):
        db = g.pop('db', None)
        if db is not None:
            db.close()
It resulted in:
TypeError: 'psycopg2.extensions.connection' object is not callable
Does anyone have an idea what happened and how to make it work?
Furthermore, I wonder how I would access the connection object to create a cursor once its creation is tied to before_request?

This solution is probably far from perfect, and it's not really DRY. I'd welcome comments, or other answers that build on this.
To implement this for raw psycopg2, you probably need to take a look at the connection pool classes. There's also a good guide on how to implement this outside of Flask.
The basic idea is to create your connection pool first. You want this to be established when the flask application initializes (this could be within the python interpreter or via a gunicorn worker, of which there may be several - in which case each worker has its own connection pool). I chose to store the returned pool in the config:
from flask import Flask, g, jsonify
import psycopg2
from psycopg2 import pool

app = Flask(__name__)
app.config['postgreSQL_pool'] = psycopg2.pool.SimpleConnectionPool(
    1, 20,
    user="postgres",
    password="very_secret",
    host="127.0.0.1",
    port="5432",
    database="postgres")
Note the first two arguments to SimpleConnectionPool are the min & max connections. That's the number of connections going to your database server, between 1 & 20 in this case.
Next define a get_db function:
def get_db():
    if 'db' not in g:
        g.db = app.config['postgreSQL_pool'].getconn()
    return g.db
The SimpleConnectionPool.getconn() method used here simply returns a connection from the pool, which we assign to g.db and return. This means that whenever we call get_db() anywhere in the code it returns the same connection for the current app context, or takes one from the pool if none is present. There's no need for a before_request decorator.
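As a quick illustration (a minimal sketch, assuming the app and get_db defined above), repeated calls inside a single application context hand back the same pooled connection:
# Sketch: within one app context, get_db() reuses the same connection.
with app.app_context():
    first = get_db()
    second = get_db()
    assert first is second  # both names point at the same pooled connection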
Next, define your teardown function:
@app.teardown_appcontext
def close_conn(e):
    db = g.pop('db', None)
    if db is not None:
        app.config['postgreSQL_pool'].putconn(db)
This runs when the application context is destroyed, and uses SimpleConnectionPool.putconn() to put away the connection.
Finally define a route:
@app.route('/')
def index():
    db = get_db()
    cursor = db.cursor()
    cursor.execute("select 1;")
    result = cursor.fetchall()
    print(result)
    cursor.close()
    return jsonify(result)
This code works for me, tested against postgres running in a docker container. A few things which probably should be improved:
This view isn't very DRY. Perhaps you could move some of this into the get_db function so it returns a cursor (see the sketch below).
When the python interpreter exits, you should also find a way to close the connections with app.config['postgreSQL_pool'].closeall()
Although tested, some kind of way to monitor the pool would be good, so that you could watch pool/db connections under load and make sure the pooler behaves as expected.
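For the first two points, a rough sketch of what that could look like (query_db and the atexit hook are illustrative additions, not code from above):
import atexit

def query_db(query, args=None):
    # Hypothetical helper: borrow the context connection, run one query,
    # and return the rows so views don't have to manage cursors themselves.
    db = get_db()
    cursor = db.cursor()
    try:
        cursor.execute(query, args)
        return cursor.fetchall()
    finally:
        cursor.close()

@atexit.register
def shutdown_pool():
    # Close every connection in the pool when the interpreter exits.
    app.config['postgreSQL_pool'].closeall()
The route above could then shrink to return jsonify(query_db("select 1;")).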
On a related note, the sqlalchemy.scoped_session documentation explains more things relating to this, with some theory on how its 'sessions' work in relation to requests. They have implemented it in such a way that you can call Session.query('SELECT 1') and it will create the session if it doesn't already exist.
EDIT: Here's a gist with your app factory pattern, and sample usage in the comment.
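Since the gist itself isn't reproduced here, below is a sketch of how the same pooled approach could be wired into your connect_and_close_db(app) / create_app() factory pattern (it reuses the connection string from your question; the actual gist may differ):
# shop/db.py -- sketch only
import psycopg2
from psycopg2 import pool
from flask import current_app, g

def connect_and_close_db(app):
    app.config['postgreSQL_pool'] = psycopg2.pool.SimpleConnectionPool(
        1, 20, "dbname=testdb user=testuser password=test host=localhost")

    @app.teardown_appcontext
    def close_conn(e):
        # Return the request's connection to the pool, if one was taken.
        db = g.pop('db', None)
        if db is not None:
            app.config['postgreSQL_pool'].putconn(db)

def get_db():
    # Borrow a connection for the current app context on first use.
    if 'db' not in g:
        g.db = current_app.config['postgreSQL_pool'].getconn()
    return g.db
Views then just do from shop.db import get_db and use it exactly like the route above.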

Currently I am using this pattern:
(I'll edit this answer eventually if I come up with a better solution.)
This is the main script in which we use the database. It uses two functions from config: get_db() to get a connection from the pool and put_db() to return the connection to the pool:
from config import get_db, put_db
from threading import Thread
from time import sleep

def select():
    db = get_db()
    sleep(1)
    cursor = db.cursor()
    # Print the select result and the db connection's address in memory,
    # to see if the second thread gets a connection at another address
    cursor.execute("SELECT 'It works %s'", (id(db),))
    print(cursor.fetchone())
    cursor.close()
    put_db(db)

Thread(target=select).start()
Thread(target=select).start()
print('Main thread')
This is config.py:
import sys
import os
import psycopg2
from psycopg2 import pool
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

def get_db(key=None):
    return getattr(get_db, 'pool').getconn(key)

def put_db(conn, key=None):
    getattr(get_db, 'pool').putconn(conn, key=key)

# We need to initialize the connection pool in the main thread for everything to work.
# The pool is stored as an attribute on the get_db function object.
try:
    setattr(get_db, 'pool', psycopg2.pool.ThreadedConnectionPool(1, 20, os.getenv("DB")))
    print('Initialized db')
except psycopg2.OperationalError as e:
    print(e)
    sys.exit(0)
And if you are curious, there is an .env file containing the db connection string in the DB environment variable:
DB="dbname=postgres user=postgres password=1234 host=127.0.0.1 port=5433"
(The .env file is loaded using the dotenv module in config.py.)
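A small convenience I'd add on top of this pattern (a sketch, not part of config.py above) is a context manager so put_db() can't be forgotten:
from contextlib import contextmanager
from config import get_db, put_db

@contextmanager
def db_connection(key=None):
    # Borrow a connection from the pool and always hand it back.
    conn = get_db(key)
    try:
        yield conn
    finally:
        put_db(conn, key=key)

# Usage inside select():
#     with db_connection() as db:
#         cursor = db.cursor()
#         ...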

Related

xlwings + Django: how to not lose a connection

I am trying to deploy a spreadsheet model with a web page front end using Django. The web "app" flow is simple:
User enters data in a web form
Send form data to a Django backend view function "run_model(request)"
Parse request object to get user inputs and then populate named ranges in the excel model's input sheet using xlwings to interact with a spreadsheet model (sheet.range function is used)
Run "calculate()" on the spreadsheet
Read outputs from another tab in the spreadsheet using xlwings and named ranges (again, using sheet.range function).
The problem is that the connection to the Excel process keeps getting killed by Django (I believe it handles each request as a separate process), so I can get at most one request to work (by importing xlwings inside the view function) but when I send a second request, the connection is dead and it won't reactivate.
Basically, how can I keep the connection to the workbook alive between requests or at least re-open a connection for each request?
Ok, ended up implementing a simple "spreadsheet server" to address the issue with Django killing the connection.
First wrote code for the server (server.py), then some code to start it up from command line args (start_server.py), then had my view open a connection to this model when it needs to use it (in views.py).
So I had to separate my Excel + xlwings process and Django into independent processes to keep the interfaces clean and control how much access Django has to my spreadsheet model. It works fine now.
start_server.py
"""
Starts spreadsheet server on specified port
Usage: python start_server.py port_number logging_filepath
port_number: sets server listening to localhost:<port_number>
logging_filepath: full path to logging file (all messages directed to this file)
"""
import argparse
import os
import subprocess
_file_path = os.path.dirname(os.path.abspath(__file__))
#command line interface
parser = argparse.ArgumentParser()
parser.add_argument('port_number',
help='sets server listening to localhost:<port_number>')
parser.add_argument('logging_filepath',help='full path to logging file (all messages directed to this file)')
args = parser.parse_args()
#set up logging
_logging_path = args.logging_filepath
print("logging output to " + _logging_path)
_log = open(_logging_path,'wb')
#set up and start server
_port = args.port_number
print('starting Excel server...')
subprocess.Popen(['python',_file_path +
'\\server.py',str(_port)],stdin=_log, stdout=_log, stderr=_log)
print("".join(['server listening on localhost:',str(_port)]))
server.py
"""
Imports package that starts Excel process (using xlwings), gets interface
to the object wrapper for the Excel model, and then serves requests to that model.
"""
import os
import sys
from multiprocessing.connection import Listener
_file_path = os.path.dirname(os.path.abspath(__file__))
sys.path.append(_file_path)
import excel_object_wrapper
_MODEL_FILENAME = 'excel_model.xls'
model_interface = excel_object_wrapper.get_model_interface(_file_path+"\\"+_MODEL_FILENAME)
model_connection = model_interface['run_location']
close_model = model_interface['close_model']
_port = sys.argv[1]
address = ('localhost', int(_port))
listener = Listener(address)
_alive = True
print('starting server on ' + str(address))
while _alive:
print("listening for connections")
conn = listener.accept()
print 'connection accepted from', listener.last_accepted
while True:
try:
input = conn.recv()
print(input)
if not input or input=='close':
print('closing connection to ' + str(conn))
conn.close()
break
if input == 'kill':
_alive = False
print('stopping server')
close_model()
conn.send('model closed')
conn.close()
listener.close()
break
except EOFError:
print('closing connection to ' + str(conn))
conn.close()
break
conn.send(model_connection(*input))
views.py (from within Django)
from __future__ import unicode_literals
import os
from multiprocessing.connection import Client
from django.shortcuts import render
from django.http import HttpResponse

def run(request):
    # we started the excel server on localhost:6000 before starting Django
    model_connection = Client(('localhost', 6000))
    params = request.POST
    param_a = float(params['a'])
    param_b = float(params['b'])
    model_connection.send((param_a, param_b))
    results = model_connection.recv()
    return render(request, 'model_app/show_results.html', context={'results': results})
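One detail worth noting: server.py above understands the 'close' and 'kill' strings, so the Django side can release its connection, or shut the spreadsheet server down, explicitly. A sketch of that, reusing the same Client setup (the function name is illustrative):
from multiprocessing.connection import Client

def shutdown_excel_server(kill=False):
    conn = Client(('localhost', 6000))
    if kill:
        conn.send('kill')       # server closes the Excel model and stops listening
        print(conn.recv())      # prints 'model closed'
    else:
        conn.send('close')      # server just drops this connection
    conn.close()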

celery, flask sqlalchemy: DatabaseError: (DatabaseError) SSL error: decryption failed or bad record mac

Hi, I have a setup where I'm using Celery, Flask and SQLAlchemy, and I am intermittently getting this error:
(psycopg2.DatabaseError) SSL error: decryption failed or bad record mac
I followed this post:
Celery + SQLAlchemy : DatabaseError: (DatabaseError) SSL error: decryption failed or bad record mac
and also a few more, and added prerun and postrun handlers:
@task_postrun.connect
def close_session(*args, **kwargs):
    # Flask SQLAlchemy will automatically create new sessions for you from
    # a scoped session factory, given that we are maintaining the same app
    # context, this ensures tasks have a fresh session (e.g. session errors
    # won't propagate across tasks)
    d.session.remove()

@task_prerun.connect
def on_task_init(*args, **kwargs):
    d.engine.dispose()
But I'm still seeing this error. Anyone solved this?
Note that I'm running this on AWS (with two servers accessing the same database). The database itself is hosted on its own server (not RDS). I believe the total number of celery background tasks running is 6 (2+4). The Flask frontend is running using gunicorn.
My related thread:
https://github.com/celery/celery/issues/3238#issuecomment-225975220
Here is my comment along with additional information:
I use Celery, SQLAlchemy and PostgreSQL on AWS and there is no such problem.
The only difference I can think of is that I have the database on RDS.
I think you can try switching to RDS temporarily, just to test if the
issue will still be present or not. If it disappears with RDS then
you'll need to look into your PostgreSQL settings.
According to the RDS parameters, I have SSL enabled:
ssl = 1, Enables SSL connections.
ssl_ca_file = /rdsdbdata/rds-metadata/ca-cert.pem
ssl_cert_file = /rdsdbdata/rds-metadata/server-cert.pem
ssl_ciphers = false, Sets the list of allowed SSL ciphers.
ssl_key_file = /rdsdbdata/rds-metadata/server-key.pem
ssl_renegotiation_limit = 0, integer, (kB) Set the amount of traffic to send and receive before renegotiating the encryption keys.
As for the Celery initialization code, it is roughly this:
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker
import sqldb

engine = sqldb.get_engine()
cached_data = None

def do_the_work():
    global engine, cached_data
    if cached_data is not None:
        return cached_data
    db_session = None
    try:
        db_session = scoped_session(sessionmaker(
            autocommit=False, autoflush=False, bind=engine))
        data = sqldb.get_session().query(
            sqldb.system.MyModel).filter_by(
            my_type=sqldb.system.MyModel.TYPEA).all()
        cached_data = {}
        for row in data:
            ...  # put row into cached_data
    finally:
        if db_session is not None:
            db_session.remove()
    return cached_data
This do_the_work function is then called from the celery task.
The sqldb.get_engine looks like this:
from sqlalchemy import create_engine

_engine = None

def get_engine():
    global _engine
    if _engine:
        return _engine
    _engine = create_engine(config.SQL_DB_URL, echo=config.SQL_DB_ECHO)
    return _engine
Finally, the SQL_DB_URL and SQL_DB_ECHO values in the config module are these:
SQL_DB_URL = 'postgresql+psycopg2://%s:%s@%s/%s' % (
    POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_HOST, POSTGRES_DB_NAME)
SQL_DB_ECHO = False
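One more thing worth checking, offered as a hedged sketch rather than a confirmed fix: if the engine is created before Celery forks its worker processes, the children inherit the parent's open SSL connections, which can produce exactly this "decryption failed or bad record mac" symptom. Disposing the engine as each worker process starts avoids that sharing (worker_process_init is a standard Celery signal; the sqldb import mirrors the code above):
from celery.signals import worker_process_init
import sqldb  # the module that owns the engine singleton, as above

@worker_process_init.connect
def reset_engine_connections(**kwargs):
    # Drop any connections inherited from the parent process so each
    # forked worker opens its own instead of sharing SSL sessions.
    sqldb.get_engine().dispose()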

Django testing of neo4j database

I'm using django with neo4j as the database and neomodel as the OGM. How do I test it?
When I run python3 manage.py test, all the changes my tests make are left in the database.
Also, how do I set up two databases, one for testing and one for production, and specify which one to use when?
I assume the reason all of your changes are being retained is due to using the same neo4j database for testing as you are using in development. Since neomodel isn't integrated tightly with Django it doesn't act the same way Django's ORM does when testing. Django will do some helpful things when you run tests using its ORM, such as creating a test database that will be destroyed upon completion.
With neo4j and neomodel I'd recommend doing the following:
Create a Custom Test Runner
Django enables you to define a custom test runner by setting the TEST_RUNNER settings variable. An extremely simple version of this to get you going would be:
from time import sleep
from subprocess import call
from django.test.runner import DiscoverRunner

class MyTestRunner(DiscoverRunner):
    def setup_databases(self, *args, **kwargs):
        # Stop your development instance
        call("sudo service neo4j-service stop", shell=True)
        # Sleep to ensure the service has completely stopped
        sleep(1)
        # Start your test instance (see section below for more details)
        success = call("/path/to/test/db/neo4j-community-2.2.2/bin/neo4j"
                       " start-no-wait", shell=True)
        # Need to sleep to wait for the test instance to completely come up
        sleep(10)
        if success != 0:
            return False
        try:
            # For neo4j 2.2.x you'll need to set a password or deactivate auth
            # Nigel Small's py2neo gives us an easy way to accomplish this
            call("source /path/to/virtualenv/bin/activate && "
                 "/path/to/virtualenv/bin/neoauth "
                 "neo4j neo4j my-p4ssword")
        except OSError:
            pass
        # Don't import neomodel until we get here because we need to wait
        # for the new db to be spawned
        from neomodel import db
        # Delete all previous entries in the db prior to running tests
        query = "match (n)-[r]-() delete n,r"
        db.cypher_query(query)
        super(MyTestRunner, self).__init__(*args, **kwargs)

    def teardown_databases(self, old_config, **kwargs):
        from neomodel import db
        # Delete all previous entries in the db after running tests
        query = "match (n)-[r]-() delete n,r"
        db.cypher_query(query)
        sleep(1)
        # Shut down test neo4j instance
        success = call("/path/to/test/db/neo4j-community-2.2.2/bin/neo4j"
                       " stop", shell=True)
        if success != 0:
            return False
        sleep(1)
        # start back up development instance
        call("sudo service neo4j-service start", shell=True)
Add a secondary neo4j database
This can be done in a couple of ways, but to follow along with the test runner above you can download a community distribution from neo4j's website. With this secondary instance you can now swap between databases using the command line statements in the calls within the test runner.
Wrap Up
This solution assumes you're on a linux box but should be portable to a different OS with minor modifications. Also, I'd recommend checking out Django's Test Runner docs to expand upon what the test runner can do.
There currently isn't a mechanism for working with test databases in neomodel, as neo4j only has one schema per instance.
However, you can override the NEO4J_REST_URL environment variable when running the tests, like so:
NEO4J_REST_URL=http://localhost:7473/db/data python3 manage.py test
The way I went about this was to give in and use the existing database, but mark all test-related nodes and detach/delete them when finished. It's obviously not ideal; all your node classes must inherit from NodeBase or risk polluting the db with test data, and if you have unique constraints, those will still be enforced across both live/test data. But it works for my purposes, and I thought I'd share in case it helps someone else.
in myproject/base.py:
from neomodel import StructuredNode
from neomodel.properties import Property, validator
from django.conf import settings

class TestModeProperty(Property):
    """
    Boolean property that is only set during unit testing.
    """
    @validator
    def inflate(self, value):
        return bool(value)

    @validator
    def deflate(self, value):
        return bool(value)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default = True
        self.has_default = settings.UNIT_TESTING

class NodeBase(StructuredNode):
    __abstract_node__ = True
    test_mode = TestModeProperty()
in myproject/test_runner.py:
from django.test.runner import DiscoverRunner
from neomodel import db

class NeoDiscoverRunner(DiscoverRunner):
    def teardown_databases(self, old_config, **kwargs):
        db.cypher_query(
            """
            MATCH (node {test_mode: true})
            DETACH DELETE node
            """
        )
        return super().teardown_databases(old_config, **kwargs)
in settings.py:
UNIT_TESTING = sys.argv[1:2] == ["test"]
TEST_RUNNER = "myproject.test_runner.NeoDiscoverRunner"
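For completeness, any model that should be cleaned up this way simply inherits from NodeBase (a hypothetical example model, not taken from the answer above):
from neomodel import StringProperty
from myproject.base import NodeBase

class Person(NodeBase):
    # test_mode is inherited from NodeBase and only gets a default while testing
    name = StringProperty()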

Django celery task keep global state

I am currently developing a Django application based on django-tenants-schema. You don't need to look into the actual code of the module, but the idea is that it has a global setting for the current database connection defining which schema to use for the application tenant, e.g.
tenant = tenants_schema.get_tenant()
And for setting
tenants_schema.set_tenant(xxx)
For some of the tasks I would like them to remember the current global tenant selected during the instantiation, e.g. in theory:
class AbstractTask(Task):
    '''
    Run this method before returning the task future
    '''
    def before_submit(self):
        self.run_args['tenant'] = tenants_schema.get_tenant()

    '''
    This method is run before the related .run() task method
    '''
    def before_run(self):
        tenants_schema.set_tenant(self.run_args['tenant'])
Is there an elegant way of doing it in celery?
Celery (as of 3.1) has signals you can hook into to do this. You can alter the kwargs that were passed in, and on the other side, undo your alterations before they're given to the actual task:
from celery import shared_task
from celery.signals import before_task_publish, task_prerun, task_postrun
from threading import local

current_tenant = local()

@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
    body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)
    print('sending tenant: {t}'.format(t=current_tenant.id))

@task_prerun.connect
def extract_tenant_from_task(kwargs=None, **unused):
    tenant_id = kwargs.pop('tenant_middleware.tenant', None)
    current_tenant.id = tenant_id
    print('current_tenant.id set to {t}'.format(t=tenant_id))

@task_postrun.connect
def cleanup_tenant(**kwargs):
    current_tenant.id = None
    print('cleaned current_tenant.id')

@shared_task
def get_current_tenant():
    # Here is where you would do work that relied on current_tenant.id being set.
    import time
    time.sleep(1)
    return current_tenant.id
And if you run the task (not showing logging from the worker):
In [1]: current_tenant.id = 1234; ct = get_current_tenant.delay(); current_tenant.id = 5678; ct.get()
sending tenant: 1234
Out[1]: 1234
In [2]: current_tenant.id
Out[2]: 5678
The signals are not called if no message is sent (when you call the task function directly, without delay() or apply_async()). If you want to filter on the task name, it is available as body['task'] in the before_task_publish signal handler, and the task object itself is available in the task_prerun and task_postrun handlers.
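As a concrete illustration of that last point (a sketch; the task name string is hypothetical and would be whatever your task's full name is), the publish-side handler above could be narrowed like this:
@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
    # body['task'] carries the task name, so unrelated tasks can be skipped.
    if body.get('task') != 'myapp.tasks.get_current_tenant':
        return
    body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)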
I am a Celery newbie, so I can't really tell if this is the "blessed" way of doing "middleware"-type stuff in Celery, but I think it will work for me.
I'm not sure what you mean here; is before_submit executed before the task is called by a client?
In that case I would rather use a with statement here:
from contextlib import contextmanager

@contextmanager
def set_tenant_db(tenant):
    prev_tenant = tenants_schema.get_tenant()
    try:
        tenants_schema.set_tenant(tenant)
        yield
    finally:
        tenants_schema.set_tenant(prev_tenant)

@app.task
def tenant_task(tenant=None):
    with set_tenant_db(tenant):
        do_actions_here()

tenant_task.delay(tenant=tenants_schema.get_tenant())
You can of course create a base task that does this automatically,
you can apply the context in Task.__call__ for example, but I'm not sure
if that saves you much if you can just use the with statement explicitly.
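For reference, a minimal sketch of that base-task idea, assuming the set_tenant_db context manager above and a tenant keyword on every task (not a tested implementation):
from celery import Task

class TenantTask(Task):
    def __call__(self, *args, **kwargs):
        # Wrap every task invocation in the tenant context manager above.
        with set_tenant_db(kwargs.get('tenant')):
            return self.run(*args, **kwargs)

@app.task(base=TenantTask)
def tenant_task(tenant=None):
    do_actions_here()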

Scale Gevent Socketio

I currently have a site set up using Django. I have added Gevent Socketio to add a chat function. I need to scale it, as there are quite a few users already on the site, and I can't find a way to do so.
I tried https://github.com/abourget/gevent-socketio/tree/master/examples/django_chat/chat
I am using Gunicorn & the socketio.sgunicorn.GeventSocketIOWorker worker class, so at first I thought of increasing the worker count. Unfortunately this seems to fail intermittently. I have started rewriting it to use redis, based on a few sources I found, and have one worker on each server, which is now being load balanced. However this seems to have the same problem. I am wondering if there is some issue in the gevent-socketio code itself which does not allow it to scale.
Here is how I have started, which is just the submit-message code.
def redis_client():
    """Get a redis client."""
    return Redis(settings.REDIS_HOST, settings.REDIS_PORT, settings.REDIS_DB)

class PubSub(object):
    """
    Very simple Pub/Sub pattern wrapper
    using simplified Redis Pub/Sub functionality.

    Usage (publisher)::
        import redis
        r = redis.Redis()
        q = PubSub(r, "channel")
        q.publish("test data")

    Usage (listener)::
        import redis
        r = redis.Redis()
        q = PubSub(r, "channel")
        def handler(data):
            print "Data received: %r" % data
        q.subscribe(handler)
    """
    def __init__(self, redis, channel="default"):
        self.redis = redis
        self.channel = channel

    def publish(self, data):
        self.redis.publish(self.channel, simplejson.dumps(data))

    def subscribe(self, handler):
        redis = self.redis.pubsub()
        redis.subscribe(self.channel)
        for data_raw in redis.listen():
            if data_raw['type'] != "message":
                continue
            data = simplejson.loads(data_raw["data"])
            handler(data)
from socketio.namespace import BaseNamespace
from socketio.sdjango import namespace
from supremo.utils import redis_client, PubSub
from gevent import Greenlet

@namespace('/chat')
class ChatNamespace(BaseNamespace):
    nicknames = []
    r = redis_client()
    q = PubSub(r, "channel")

    def initialize(self):
        # Set up redis listener
        def handler(data):
            self.emit('receive_message', data)
        greenlet = Greenlet.spawn(self.q.subscribe, handler)

    def on_submit_message(self, msg):
        self.q.publish(msg)
I used parts of code from https://github.com/fcurella/django-push-demo and gevent-socketio 0.3.5rc1 instead of rc2 and it is working now with multiple workers and load balancing.