xlwings + Django: how not to lose a connection

I am trying to deploy a spreadsheet model with a web page front end using Django. The web "app" flow is simple:
User enters data in a web form.
The form data is sent to a Django backend view function, run_model(request).
The view parses the request object to get the user inputs and then populates named ranges on the Excel model's input sheet, using xlwings (the sheet.range function) to interact with the spreadsheet model.
calculate() is run on the spreadsheet.
Outputs are read from another tab in the spreadsheet using xlwings and named ranges (again via the sheet.range function).
The problem is that the connection to the Excel process keeps getting killed by Django (I believe it handles each request as a separate process). I can get at most one request to work (by importing xlwings inside the view function), but when I send a second request the connection is dead and it won't reactivate.
Basically, how can I keep the connection to the workbook alive between requests, or at least re-open a connection for each request?

OK, I ended up implementing a simple "spreadsheet server" to address the issue of Django killing the connection.
I first wrote code for the server (server.py), then some code to start it up from command-line arguments (start_server.py), and then had my view open a connection to this model when it needs to use it (in views.py).
So I had to separate my (Excel + xlwings) process and Django into independent processes to keep the interfaces clean and control how much access Django has to my spreadsheet model. It works fine now.
start_server.py
"""
Starts spreadsheet server on specified port
Usage: python start_server.py port_number logging_filepath
port_number: sets server listening to localhost:<port_number>
logging_filepath: full path to logging file (all messages directed to this file)
"""
import argparse
import os
import subprocess
_file_path = os.path.dirname(os.path.abspath(__file__))
#command line interface
parser = argparse.ArgumentParser()
parser.add_argument('port_number',
help='sets server listening to localhost:<port_number>')
parser.add_argument('logging_filepath',help='full path to logging file (all messages directed to this file)')
args = parser.parse_args()
#set up logging
_logging_path = args.logging_filepath
print("logging output to " + _logging_path)
_log = open(_logging_path,'wb')
#set up and start server
_port = args.port_number
print('starting Excel server...')
subprocess.Popen(['python',_file_path +
'\\server.py',str(_port)],stdin=_log, stdout=_log, stderr=_log)
print("".join(['server listening on localhost:',str(_port)]))
server.py
"""
Imports package that starts Excel process (using xlwings), gets interface
to the object wrapper for the Excel model, and then serves requests to that model.
"""
import os
import sys
from multiprocessing.connection import Listener
_file_path = os.path.dirname(os.path.abspath(__file__))
sys.path.append(_file_path)
import excel_object_wrapper
_MODEL_FILENAME = 'excel_model.xls'
model_interface = excel_object_wrapper.get_model_interface(_file_path+"\\"+_MODEL_FILENAME)
model_connection = model_interface['run_location']
close_model = model_interface['close_model']
_port = sys.argv[1]
address = ('localhost', int(_port))
listener = Listener(address)
_alive = True
print('starting server on ' + str(address))
while _alive:
print("listening for connections")
conn = listener.accept()
print 'connection accepted from', listener.last_accepted
while True:
try:
input = conn.recv()
print(input)
if not input or input=='close':
print('closing connection to ' + str(conn))
conn.close()
break
if input == 'kill':
_alive = False
print('stopping server')
close_model()
conn.send('model closed')
conn.close()
listener.close()
break
except EOFError:
print('closing connection to ' + str(conn))
conn.close()
break
conn.send(model_connection(*input))
views.py (from within Django)
from __future__ import unicode_literals
import os
from multiprocessing.connection import Client
from django.shortcuts import render
from django.http import HttpResponse

def run(request):
    # we started the Excel server on localhost:6000 before starting Django
    model_connection = Client(('localhost', 6000))
    params = request.POST
    param_a = float(params['a'])
    param_b = float(params['b'])
    model_connection.send((param_a, param_b))
    results = model_connection.recv()
    return render(request, 'model_app/show_results.html', context={'results': results})
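Note that excel_object_wrapper, which server.py imports, is not shown here. A minimal sketch of what its get_model_interface function could look like with xlwings follows; the sheet names, range names, and wrapper layout are my own assumptions based on how server.py uses the returned dictionary, not the actual module:
# excel_object_wrapper.py -- hypothetical sketch, not the actual module
import xlwings as xw

def get_model_interface(workbook_path):
    """Open the workbook once and return the callables server.py expects."""
    book = xw.Book(workbook_path)           # keeps one Excel instance alive
    input_sheet = book.sheets['inputs']     # sheet/range names are assumptions
    output_sheet = book.sheets['outputs']

    def run_location(param_a, param_b):
        # write the inputs into named ranges, recalculate, read the result
        input_sheet.range('param_a').value = param_a
        input_sheet.range('param_b').value = param_b
        book.app.calculate()
        return output_sheet.range('results').value

    def close_model():
        book.app.quit()

    return {'run_location': run_location, 'close_model': close_model}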

Related

Psycopg2 & Flask - tying connection to before_request & teardown_appcontext

Cheers guys,
while refactoring my Flask app I got stuck tying the db connection to @app.before_request and closing it at @app.teardown_appcontext. I am using plain Psycopg2 and the app factory pattern.
First I created a function to call within the app factory so I could use @app, as suggested by Miguel Grinberg here:
def create_app(test_config=None):
    app = Flask(__name__, instance_relative_config=True)
    --
    from shop.db import connect_and_close_db
    connect_and_close_db(app)
    --
    return app
Then I tried this pattern suggested on http://flask.pocoo.org/docs/1.0/appcontext/#storing-data:
def connect_and_close_db(app):
    @app.before_request
    def get_db_test():
        conn_string = "dbname=testdb user=testuser password=test host=localhost"
        if 'db' not in g:
            g.db = psycopg2.connect(conn_string)
        return g.db

    @app.teardown_appcontext
    def close_connection(exception):
        db = g.pop('db', None)
        if db is not None:
            db.close()
It resulted in:
TypeError: 'psycopg2.extensions.connection' object is not callable
Does anyone have an idea what happened and how to make it work?
Furthermore, I wonder how I would access the connection object to create a cursor once its creation is tied to before_request.
This solution is probably far from perfect, and it's not really DRY. I'd welcome comments, or other answers that build on this.
To implement this for raw psycopg2 support, you probably need to take a look at the connection pooler. There's also a good guide on how to implement this outside of Flask.
The basic idea is to create your connection pool first. You want this to be established when the Flask application initializes (this could be within the Python interpreter or via a gunicorn worker, of which there may be several - in which case each worker has its own connection pool). I chose to store the returned pool in the config:
from flask import Flask, g, jsonify
import psycopg2
from psycopg2 import pool

app = Flask(__name__)
app.config['postgreSQL_pool'] = psycopg2.pool.SimpleConnectionPool(
    1, 20,
    user="postgres",
    password="very_secret",
    host="127.0.0.1",
    port="5432",
    database="postgres")
Note that the first two arguments to SimpleConnectionPool are the min and max connections. That's the number of connections going to your database server, between 1 and 20 in this case.
Next define a get_db function:
def get_db():
    if 'db' not in g:
        g.db = app.config['postgreSQL_pool'].getconn()
    return g.db
The SimpleConnectionPool.getconn() method used here simply returns a connection from the pool, which we assign to g.db and return. This means that whenever we call get_db() in the code it returns the same connection, or creates a connection if one isn't present. There's no need for a before_request decorator.
Next, define your teardown function:
@app.teardown_appcontext
def close_conn(e):
    db = g.pop('db', None)
    if db is not None:
        app.config['postgreSQL_pool'].putconn(db)
This runs when the application context is destroyed, and uses SimpleConnectionPool.putconn() to put away the connection.
Finally define a route:
@app.route('/')
def index():
    db = get_db()
    cursor = db.cursor()
    cursor.execute("select 1;")
    result = cursor.fetchall()
    print(result)
    cursor.close()
    return jsonify(result)
This code works for me, tested against postgres running in a docker container. A few things which probably should be improved:
This view isn't very DRY. Perhaps you could move some of this into the get_db function so it returns a cursor (see the sketch after this list).
When the Python interpreter exits, you should also find a way to close the connections with app.config['postgreSQL_pool'].closeall.
Although tested, some kind of way to monitor the pool would be good, so that you could watch pool/db connections under load and make sure the pooler behaves as expected.
On a related note, the sqlalchemy.scoped_session documentation explains more things relating to this, with some theory on how its 'sessions' work in relation to requests. They have implemented it in such a way that you can call Session.query('SELECT 1') and it will create the session if it doesn't already exist.
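As a rough illustration of the first point above, here is a sketch (my own, not part of this answer) of a small helper that checks out the pooled connection via get_db(), runs a query, and closes the cursor, so the view body shrinks; the helper name query_db is hypothetical:
# hypothetical helper built on the get_db() shown above
def query_db(sql, params=None):
    db = get_db()
    cursor = db.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        cursor.close()

@app.route('/')
def index():
    # the route body becomes a single call
    return jsonify(query_db("select 1;"))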
EDIT: Here's a gist with your app factory pattern, and sample usage in the comment.
Currently I am using this pattern:
(I'll edit this answer eventually if I come up with a better solution.)
This is the main script in which we use the database. It uses two functions from config: get_db() to get a connection from the pool and put_db() to return a connection to the pool:
from config import get_db, put_db
from threading import Thread
from time import sleep

def select():
    db = get_db()
    sleep(1)
    cursor = db.cursor()
    # Print the select result and the db connection's address in memory,
    # to see whether the second thread gets a connection at another address
    cursor.execute("SELECT 'It works %s'", (id(db),))
    print(cursor.fetchone())
    cursor.close()
    put_db(db)

Thread(target=select).start()
Thread(target=select).start()
print('Main thread')
This is config.py:
import sys
import os
import psycopg2
from psycopg2 import pool
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

def get_db(key=None):
    return getattr(get_db, 'pool').getconn(key)

def put_db(conn, key=None):
    getattr(get_db, 'pool').putconn(conn, key=key)

# We need to init the connection pool in the main thread for everything to work.
# The pool is stored under the function object get_db.
try:
    setattr(get_db, 'pool', psycopg2.pool.ThreadedConnectionPool(1, 20, os.getenv("DB")))
    print('Initialized db')
except psycopg2.OperationalError as e:
    print(e)
    sys.exit(0)
And if you are curious, there is also an .env file containing the db connection string in the DB env variable:
DB="dbname=postgres user=postgres password=1234 host=127.0.0.1 port=5433"
(.env file is loaded using dotenv module in config.py)
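One refinement of this pattern (my own sketch, not part of the answer above) is to wrap get_db()/put_db() in a context manager, so the connection always goes back to the pool even if a query raises; the wrapper name pooled_connection is hypothetical:
# hypothetical wrapper around the get_db()/put_db() helpers from config.py
from contextlib import contextmanager
from config import get_db, put_db

@contextmanager
def pooled_connection(key=None):
    conn = get_db(key)
    try:
        yield conn
    finally:
        put_db(conn, key=key)

# usage: the connection is returned to the pool even on an exception
with pooled_connection() as db:
    cursor = db.cursor()
    cursor.execute("SELECT 1")
    print(cursor.fetchone())
    cursor.close()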

Start and stop a periodic background task with Django

I would like to make a bitcoin notification with Django. I managed to get a working Telegram bot that sends the bitcoin stats when I ask it to. Now I would like it to send me a message if bitcoin reaches a specific value. There are some tutorials on running a Python script on a server, but not with Django. I read some answers and descriptions about Django Channels but couldn't adapt them to my project.
I would like to send, via Telegram, a command with an amount and a duration. Django would then start a background process with these values and the values of the channel I'm sending from. If the amount is reached within the duration, Django sends a message back to my channel. This should also be possible for more than one person.
Is this possible to do with Django out of the box, maybe with decorators, or do I need django-channels or something else?
Edit 2018-08-10:
Maybe my code explains a little bit better what I want to do.
import requests
import json
from datetime import datetime
from django.shortcuts import render
from django.http import HttpResponse
from django.conf import settings
from django.views.generic import TemplateView
from django.views.decorators.csrf import csrf_exempt

class AboutView(TemplateView):
    template_name = 'telapi/about.html'

bot_token = settings.BOT_TOKEN

def get_url(method):
    return 'https://api.telegram.org/bot{}/{}'.format(bot_token, method)

def process_message(update):
    data = {}
    data['chat_id'] = update['message']['from']['id']
    data['text'] = "I can hear you!"
    r = requests.post(get_url('sendMessage'), data=data)

@csrf_exempt
def process_update(request, r_bot_token):
    ''' Method that is called from telegram-bot'''
    if request.method == 'POST' and r_bot_token == bot_token:
        update = json.loads(request.body.decode('utf-8'))
        if 'message' in update:
            if update['message']['text'] == 'give me news':
                new_bitcoin_price(update)
            else:
                process_message(update)
    return HttpResponse(status=200)

bitconin_api_uri = 'https://api.coinmarketcap.com/v2/ticker/1/?convert=EUR'
# response = requests.get(bitconin_api_uri)

def get_latest_bitcoin_price():
    response = requests.get(bitconin_api_uri)
    response_json = response.json()
    euro_price = float(response_json['data']['quotes']['EUR']['price'])
    timestamp = int(response_json['metadata']['timestamp'])
    date = datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
    return euro_price, date

def new_bitcoin_price(update):
    data = {}
    data['chat_id'] = update['message']['from']['id']
    euro_price, date = get_latest_bitcoin_price()
    data['text'] = "Aktuel ({}) beträgt der Preis {:.2f}€".format(date, euro_price)
    r = requests.post(get_url('sendMessage'), data=data)
Edit 2018-08-13:
I think the solution would be celery-beat and channels. Does anyone know a good tutorial?
One of my teammates uses django-celery-beat, which is available at https://github.com/celery/django-celery-beat, to do this, and he gave me some excellent feedback on it. You can schedule the Celery tasks using crontab syntax.
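As a rough illustration (my own sketch, not my teammate's code), a periodic Celery task scheduled with crontab syntax could look like this; the task name, broker URL, and schedule are assumptions:
# tasks.py -- hypothetical example of a crontab-scheduled Celery task
from celery import Celery
from celery.schedules import crontab

app = Celery('bitcoin_watch', broker='redis://localhost:6379/0')

@app.task
def check_bitcoin_price():
    # fetch the current price and notify the Telegram channel if a threshold is hit
    pass

app.conf.beat_schedule = {
    'check-price-every-minute': {
        'task': 'tasks.check_bitcoin_price',
        'schedule': crontab(),  # crontab() with no arguments means every minute
    },
}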
I had the same issue; there are several typical approaches: Celery, Django Channels, etc.
But you can avoid them all with a simple approach: https://docs.djangoproject.com/en/2.1/howto/custom-management-commands/
I have used Django commands in my project to periodically run tasks that rebuild user statistics:
Implement your application command: for example, if your application name is myapp and you place my_periodic_task.py in the myapp/management/commands folder, you can run your task once by typing python manage.py my_periodic_task (a minimal command sketch follows after these steps).
Place a new file beside the manage.py file, for example background.py, with the following code:
import os
from time import sleep
from subprocess import call

BASE = os.path.dirname(__file__)
MANAGE_BASE = os.path.join(BASE, 'manage.py')

while True:
    sleep(YOUR_TIMEOUT)  # YOUR_TIMEOUT: seconds between runs
    call(['python', MANAGE_BASE, 'my_periodic_task'])
Run your server, for example: python background.py & python manage.py runserver 0.0.0.0:8000
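For reference, a minimal version of the management command from the first step could look like this (a sketch; the command body is an assumption):
# myapp/management/commands/my_periodic_task.py -- minimal custom command sketch
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = 'Rebuilds user statistics (runs once per invocation)'

    def handle(self, *args, **options):
        # put the periodic work here, e.g. rebuilding the statistics
        self.stdout.write('my_periodic_task ran')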

TypeError: cannot serialize '_io.TextIOWrapper' object - Flask

I am writing a Flask app that asks the user to upload an Excel spreadsheet and then calculates values and populates the database. I am trying to do the processing part in the background via Redis RQ, but I keep getting TypeError: cannot serialize '_io.TextIOWrapper' object.
My code looks like this:
from redis import Redis
from rq import Queue
from rq.job import Job
import xlrd as x

workbook = x.open_workbook('data.xls')
sheet = workbook.sheet_by_index(0)
q = Queue(connection=Redis())

def populate(sheet, row, column):
    # extract data and save it into the database
    pass

job = q.enqueue_call(func=populate, args=(sheet, 7, 5), result_ttl=5000)
print(job.get_id())
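No answer is recorded here, but the error typically means RQ tried to pickle the job arguments and hit an open file object, which the xlrd sheet (via its workbook) holds. A common workaround (a sketch, not code from this thread) is to enqueue only picklable values, such as the file path, and open the workbook inside the worker function:
# sketch: pass the filename instead of the sheet object and open it in the worker
from redis import Redis
from rq import Queue
import xlrd

q = Queue(connection=Redis())

def populate_from_file(path, row, column):
    workbook = xlrd.open_workbook(path)    # opened inside the worker process
    sheet = workbook.sheet_by_index(0)
    value = sheet.cell_value(row, column)
    # ... save value to the database ...

job = q.enqueue_call(func=populate_from_file, args=('data.xls', 7, 5), result_ttl=5000)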

Automating pulling CSV files off Google Trends

pyGTrends does not seem to work; it gives errors in Python.
pyGoogleTrendsCsvDownloader seems to work and logs in, but after 1-3 requests (per day!) it complains about an exhausted quota, even though manual download with the same login/IP works flawlessly.
Bottom line: neither works. Searching through Stack Overflow, there are many questions from people trying to pull CSVs from Google, but no workable solution I could find...
Thank you in advance to whoever is able to help. How should the code be changed? Do you know of another solution that works?
Here's the code of pyGoogleTrendsCsvDownloader.py
import httplib
import urllib
import urllib2
import re
import csv
import lxml.etree as etree
import lxml.html as html
import traceback
import gzip
import random
import time
import sys
from cookielib import Cookie, CookieJar
from StringIO import StringIO
class pyGoogleTrendsCsvDownloader(object):
    '''
    Google Trends Downloader
    Recommended usage:
    from pyGoogleTrendsCsvDownloader import pyGoogleTrendsCsvDownloader
    r = pyGoogleTrendsCsvDownloader(username, password)
    r.get_csv(cat='0-958', geo='US-ME-500')
    '''
    def __init__(self, username, password):
        '''
        Provide login and password to be used to connect to Google Trends
        All immutable system variables are also defined here
        '''
        # The amount of time (in secs) that the script should wait before making a request.
        # This can be used to throttle the downloading speed to avoid hitting servers too hard.
        # It is further randomized.
        self.download_delay = 0.25
        self.service = "trendspro"
        self.url_service = "http://www.google.com/trends/"
        self.url_download = self.url_service + "trendsReport?"
        self.login_params = {}
        # These headers are necessary, otherwise Google will flag the request at your account level
        self.headers = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0'),
                        ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
                        ("Accept-Language", "en-gb,en;q=0.5"),
                        ("Accept-Encoding", "gzip, deflate"),
                        ("Connection", "keep-alive")]
        self.url_login = 'https://accounts.google.com/ServiceLogin?service='+self.service+'&passive=1209600&continue='+self.url_service+'&followup='+self.url_service
        self.url_authenticate = 'https://accounts.google.com/accounts/ServiceLoginAuth'
        self.header_dictionary = {}
        self._authenticate(username, password)

    def _authenticate(self, username, password):
        '''
        Authenticate to Google:
        1 - make a GET request to the Login webpage so we can get the login form
        2 - make a POST request with email, password and login form input values
        '''
        # Make sure we get CSV results in English
        ck = Cookie(version=0, name='I4SUserLocale', value='en_US', port=None, port_specified=False, domain='www.google.com', domain_specified=False, domain_initial_dot=False, path='/trends', path_specified=True, secure=False, expires=None, discard=False, comment=None, comment_url=None, rest=None)
        self.cj = CookieJar()
        self.cj.set_cookie(ck)
        self.opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(self.cj))
        self.opener.addheaders = self.headers
        # Get all of the login form input values
        find_inputs = etree.XPath("//form[@id='gaia_loginform']//input")
        try:
            resp = self.opener.open(self.url_login)
            if resp.info().get('Content-Encoding') == 'gzip':
                buf = StringIO(resp.read())
                f = gzip.GzipFile(fileobj=buf)
                data = f.read()
            else:
                data = resp.read()
            xmlTree = etree.fromstring(data, parser=html.HTMLParser(recover=True, remove_comments=True))
            for input in find_inputs(xmlTree):
                name = input.get('name')
                if name:
                    name = name.encode('utf8')
                    value = input.get('value', '').encode('utf8')
                    self.login_params[name] = value
        except:
            print("Exception while parsing: %s\n" % traceback.format_exc())
        self.login_params["Email"] = username
        self.login_params["Passwd"] = password
        params = urllib.urlencode(self.login_params)
        self.opener.open(self.url_authenticate, params)

    def get_csv(self, throttle=False, **kwargs):
        '''
        Download CSV reports
        '''
        # Randomized download delay
        if throttle:
            r = random.uniform(0.5 * self.download_delay, 1.5 * self.download_delay)
            time.sleep(r)
        params = {
            'export': 1
        }
        params.update(kwargs)
        params = urllib.urlencode(params)
        r = self.opener.open(self.url_download + params)
        # Make sure everything is working ;)
        if not r.info().has_key('Content-Disposition'):
            print "You've exceeded your quota. Continue tomorrow..."
            sys.exit(0)
        if r.info().get('Content-Encoding') == 'gzip':
            buf = StringIO(r.read())
            f = gzip.GzipFile(fileobj=buf)
            data = f.read()
        else:
            data = r.read()
        myFile = open('trends_%s.csv' % '_'.join(['%s-%s' % (key, value) for (key, value) in kwargs.items()]), 'w')
        myFile.write(data)
        myFile.close()
Although I don't know Python, I may have a solution. I am currently doing the same thing in C#, and though I didn't get the .csv file, I created a custom URL through code, then downloaded that HTML and saved it to a text file (also through code). In this HTML (at line 12) is all the information needed to create the graph that is used on Google Trends. However, it contains a lot of unnecessary text that needs to be cut down, but either way you end up with the same result: the Google Trends data. I posted a more detailed answer to my question here:
Downloading .csv file from Google Trends
There is an alternative module named pytrends (https://pypi.org/project/pytrends/). It is really cool; I would recommend it.
Example usage:
import numpy as np
import pandas as pd
from pytrends.request import TrendReq
pytrend = TrendReq()
#It is the term that you want to search
pytrend.build_payload(kw_list=["Eminem is the Rap God"])
# Find which region has searched the term
df = pytrend.interest_by_region()
df.to_csv("path\Eminem_InterestbyRegion.csv")
Potentially, if you have a list of terms to search, you could use a for loop to automate the insights as you wish.
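For instance, a rough sketch of that for-loop idea (the term list and file names here are my own placeholders):
# hypothetical loop over several search terms using pytrends
from pytrends.request import TrendReq

pytrend = TrendReq()
terms = ["Eminem", "Bitcoin", "Django"]  # placeholder list of terms
for term in terms:
    pytrend.build_payload(kw_list=[term])
    df = pytrend.interest_by_region()
    df.to_csv("{}_InterestbyRegion.csv".format(term))  # one CSV per term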

unique session id for different dash sessions

I would like to create unique session ids for each time a user opens the dash app in browser.
I have been following the tutorial here:
https://dash.plot.ly/sharing-data-between-callbacks
This is my code:
import dash
import dash_html_components as html
import dash_core_components as dcc
import flask
import datetime
import uuid

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = flask.Flask(__name__)
dash_app = dash.Dash(__name__, server=app, url_base_pathname="/", external_stylesheets=external_stylesheets)

def serve_layout():
    session_id = str(uuid.uuid4())
    return html.Div([
        html.Div(session_id, id='session-id', style={'display': 'none'}),
        html.Div(dcc.Input(id="input_session_id", type="text", value=session_id))
    ])

dash_app.layout = serve_layout()

if __name__ == '__main__':
    app.run(host='0.0.0.0', debug=True, port=80)
It seems like the session ids are different if I use different computers, but if I use the same computer it stays the same.
Is there a way to generate a unique session id each time a user opens the URL for the dash app?
Obviously the problem was in this line: dash_app.layout = serve_layout()
You have to use it without parentheses:
dash_app.layout = serve_layout
In fact, you were assigning not the function but the result of the function, called once at the first page load. Assigning the function itself means Dash calls serve_layout on every page load, so each visit gets a fresh session id.