Running a Python subprocess in Flask

I have a Flask web app running a crawling process in this fashion:
on terminal tab 1:
$ cd /path/to/scraping
$ scrapyrt
http://scrapyrt.readthedocs.io/en/latest/index.html
on terminal tab 2:
$ python app.py
and in app.py:
params = {
    'spider_name': spider,
    'start_requests': True
}
response = requests.get('http://localhost:9080/crawl.json', params)
print('RESPONSE', response)
data = json.loads(response.text)
which works.
Now I'd like to move everything into app.py, and for that I've tried:
import subprocess
from time import sleep

try:
    subprocess.check_output("scrapyrt", shell=True, cwd='path/to/scraping')
except subprocess.CalledProcessError as e:
    raise RuntimeError("command '{}' returned with error (code {}): {}".format(e.cmd, e.returncode, e.output))

sleep(3)

params = {
    'spider_name': spider,
    'start_requests': True
}
response = requests.get('http://localhost:9080/crawl.json', params)
print('RESPONSE', response)
data = json.loads(response.text)
This starts Twisted, like so:
2018-02-03 17:29:35-0200 [-] Log opened.
2018-02-03 17:29:35-0200 [-] Site starting on 9080
2018-02-03 17:29:35-0200 [-] Starting factory <twisted.web.server.Site instance at 0x104effa70>
but the crawling process hangs and does not go through.
What am I missing here?

Have you considered using a scheduler, such as APScheduler or similar?
You can run code on a cron schedule or at fixed intervals, and it integrates well with Flask.
Take a look here:
http://apscheduler.readthedocs.io/en/stable/userguide.html
Here is an example:
from flask import Flask, render_template
from apscheduler.schedulers.background import BackgroundScheduler

app = Flask(__name__)
cron = BackgroundScheduler()
cron.start()

@app.route('/')
def index():
    return render_template("index.html")

@cron.scheduled_job('interval', minutes=3)
def my_scrapper():
    # Do the stuff you need
    return True
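As an aside, the likely reason the snippet in the question hangs: subprocess.check_output blocks until the child process exits, and scrapyrt runs a server that never exits, so execution never reaches the requests.get call. If you do want to launch scrapyrt from app.py, here is a minimal sketch using the non-blocking subprocess.Popen (it assumes scrapyrt is on your PATH and uses a placeholder spider name):
import subprocess
from time import sleep
import requests

spider = 'my_spider'  # placeholder spider name

# Popen returns immediately instead of waiting for the child to exit
scrapyrt_proc = subprocess.Popen("scrapyrt", shell=True, cwd='/path/to/scraping')

# crude wait for the server to come up; polling the port would be more robust
sleep(3)

params = {'spider_name': spider, 'start_requests': True}
response = requests.get('http://localhost:9080/crawl.json', params)
data = response.json()

# remember to clean up on shutdown, e.g. scrapyrt_proc.terminate()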

Related

Scheduled Tasks - Runs without Error but does not produce any output - Django PythonAnywhere

I have set up a scheduled task to run daily on PythonAnywhere.
The task uses Django Commands, as I found this was the preferred method with PythonAnywhere.
The task produces no errors, but I don't get any output: 2022-06-16 22:56:13 -- Completed task, took 9.13 seconds, return code was 0.
I have tried using print() to debug areas of the code, but I cannot produce any output in either the error or server logs, even after trying print(date_today, file=sys.stderr).
I have set the path on the Scheduled Task as (not sure if this is correct, but it seems to be the only way I can get it to run without errors):
workon advancementvenv && python3.8 /home/vicinstofsport/advancement_series/manage.py shell < /home/vicinstofsport/advancement_series/advancement/management/commands/schedule_task.py
I have tried setting the path as below, but then I get an error when I try to import from the models.py file (I know this is related to a relative import but cannot manage to resolve it):
Traceback (most recent call last):
  File "/home/vicinstofsport/advancement_series/advancement/management/commands/schedule_task.py", line 3, in <module>
    from advancement.models import Bookings
ModuleNotFoundError: No module named 'advancement'
2022-06-17 03:41:22 -- Completed task, took 14.76 seconds, return code was 1.
Any ideas on how I can get this working? It all works fine locally if I use the command py manage.py scheduled_task; it just fails on PythonAnywhere.
Below is the task code and structure of the app.
from django.core.management.base import BaseCommand
import requests
from advancement.models import Bookings
from datetime import datetime, timedelta, date
import datetime
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail
from django.core.mail import send_mail
import os
from decouple import config


class Command(BaseCommand):
    help = 'Sends Program Survey'

    def handle(self, *args, **kwargs):
        # Get today's date
        date_today = datetime.datetime.now().date()

        # Get booking data
        bookings = Bookings.objects.all()

        # For each booking today, send survey email
        for booking in bookings:
            if booking.booking_date == date_today:
                if booking.program_type == "Sport Science":
                    booking_template_id = 'd-bbc79704a31a4a62a5bfea90f6342b7a'
                    email = booking.email
                    booking_message = Mail(from_email=config('FROM_EMAIL'),
                                           to_emails=[email],
                                           )
                    booking_message.template_id = booking_template_id
                    try:
                        sg = SendGridAPIClient(config('SG_API'))
                        response = sg.send(booking_message)
                    except Exception as e:
                        print(e)
                else:
                    booking_template_id = 'd-3167927b3e2146519ff6d9035ab59256'
                    email = booking.email
                    booking_message = Mail(from_email=config('FROM_EMAIL'),
                                           to_emails=[email],
                                           )
                    booking_message.template_id = booking_template_id
                    try:
                        sg = SendGridAPIClient(config('SG_API'))
                        response = sg.send(booking_message)
                    except Exception as e:
                        print(e)
            else:
                print('No')
Thanks in advance for any help.
Thanks Filip and Glenn. Testing within the bash console and changing the directory in the task helped to fix the issue: adding cd /home/vicinstofsport/advancement_series && to my task allowed the function to run.
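For anyone landing here later, the full working task line would then be something like this (a sketch that simply combines the pieces quoted above):
cd /home/vicinstofsport/advancement_series && workon advancementvenv && python3.8 /home/vicinstofsport/advancement_series/manage.py shell < /home/vicinstofsport/advancement_series/advancement/management/commands/schedule_task.py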

How to run MQTT in a Flask application simultaneously over WSGI

I created a Flask application with an MQTT client that simultaneously subscribes to data from an ESP32 and stores it in a database.
from flask import Flask, redirect, render_template, request, session, abort, make_response, jsonify, flash, url_for
import paho.mqtt.client as mqtt
import json
import time
import config
import db_access

# mqtt code
def on_message(client, userdata, message):
    topic = message.topic
    print("line 12 - topic checkpoint - ", topic)
    msgDecode = str(message.payload.decode("utf-8", "ignore"))
    msgJson = json.loads(msgDecode)  # decode json data
    print("line 15 - json checkpoint - ", type(msgJson))
    # deviceID = msgJson["DeviceID"]
    # currentCounter = msgJson["Counter"]
    # status = msgJson["Status"]
    db_access.updateStatus(msgJson["DeviceID"], msgJson["Status"])

app = Flask(__name__, template_folder='templates')

'''Web portal routes'''
@app.route('/device/switch', methods=['POST'])
def switch():
    # parameter parsing
    deviceID = request.args.get('deviceID')
    status = request.args.get('status')
    statusMap = {"on": 1, "off": 0}

    # MQTT publish
    mqtt_msg = json.dumps({"deviceID": int(deviceID), "status": statusMap[status]})
    client.publish(config.MQTT_STATUS_CHANGE_TOPIC, mqtt_msg)

    time_over_flag = 0
    loop_Counter = 0
    while status != db_access.getDeviceStatus(deviceID):
        time.sleep(2)
        loop_Counter += 1
        if loop_Counter == 2:
            time_over_flag = 1
            break
    if time_over_flag:
        return make_response(jsonify({"statusChange": False}))
    else:
        return make_response(jsonify({"statusChange": True}))

if __name__ == "__main__":
    db_access.createUserTable()
    db_access.insertUserData()
    db_access.createDeviceTable()
    db_access.insertDeviceData()
    print("creating new instance")
    client = mqtt.Client("server")  # create new instance
    client.on_message = on_message  # attach function to callback
    print("connecting to broker")
    client.connect(config.MQTT_BROKER_ADDRESS)
    client.loop_start()
    print("Subscribing to topic", "esp/#")
    client.subscribe("esp/#")
    app.run(debug=True, use_reloader=False)
This is the code in __init__.py.
db_access.py contains the database operations and config.py the configuration.
Will this work in Apache?
Also, I have no previous experience with WSGI.
The problem with launching your code (as included) with a WSGI server is that the part in that last if block only gets run when you execute the file directly with the python command.
To make this work, I'd try moving that block of code up to module level, around here:
app = Flask(__name__, template_folder='templates')

db_access.createUserTable()
db_access.insertUserData()
db_access.createDeviceTable()
db_access.insertDeviceData()

print("creating new instance")
client = mqtt.Client("server")  # create new instance
client.on_message = on_message  # attach function to callback

print("connecting to broker")
client.connect(config.MQTT_BROKER_ADDRESS)
client.loop_start()

print("Subscribing to topic", "esp/#")
client.subscribe("esp/#")
I'd also rename that __init__.py file to something like server.py, as the init file isn't meant to be heavy.
Once you've done that, run it with the development server again and test that it works as expected.
Then install a WSGI server like gunicorn into your virtual environment:
pip install gunicorn
And launch the app with gunicorn (this command should work, assuming you renamed your file to server.py):
gunicorn --bind '0.0.0.0:5000' server:app
Then test again.
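A caveat worth noting: gunicorn can run multiple worker processes, and each forked worker would create its own MQTT client and its own subscription, so messages could be handled more than once. While testing, it is safest to pin it to a single worker:
gunicorn --workers 1 --bind '0.0.0.0:5000' server:app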

Concurrent Celery tasks execute and store results, but .get() is not working

I have written a Celery Task class like this:
myapp.tasks.py
from __future__ import absolute_import, unicode_literals
from .services.celery import app
from .services.command_service import CommandService
from exceptions.exceptions import *
from .models import Command


class CustomTask(app.Task):

    def run(self, json_string, method_name, cmd_id: int):
        command_obj = Command.objects.get(id=cmd_id)  # type: Command
        try:
            val = eval('CommandService.{}(json_string={})'.format(method_name, json_string))
            status, error = 200, None
        except Exception as e:
            auto_retry = command_obj.auto_retry
            if auto_retry and isinstance(e, CustomError):
                command_obj.retry_count += 1
                command_obj.save()
                return self.retry(countdown=CustomTask._backoff(command_obj.retry_count), exc=e)
            elif auto_retry and isinstance(e, AnotherCustomError) and command_obj.retry_count == 0:
                command_obj.retry_count += 1
                command_obj.save()
                print("RETRYING NOW FOR DEVICE CONNECTION ERROR. TRANSACTION: {} || IP: {}".format(command_obj.transaction_id,
                                                                                                   command_obj.device_ip))
                return self.retry(countdown=command_obj.retry_count * 2, exc=e)
            val = None
            status, error = self._find_status_code(e)
        return_dict = {"error": error, "status_code": status, "result": val}
        return return_dict

    @staticmethod
    def _backoff(attempts):
        return 2 ** attempts

    @staticmethod
    def _find_status_code(exception):
        if isinstance(exception, APIException):
            detail = exception.default_detail if exception.detail is None else exception.detail
            return exception.status_code, detail
        return 500, CustomTask._get_generic_exc_msg(exception)

    @staticmethod
    def _get_generic_exc_msg(exc: Exception):
        s = ""
        try:
            for msg in exc.args:
                s += msg + ". "
        except Exception:
            s = str(exc)
        return s


CustomTask = app.register_task(CustomTask())
The Celery App definition:
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery, Task
from django.conf import settings

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myapp.settings')

_celery_broker = settings.CELERY_BROKER  # my broker is amqp://username:password@localhost:5672/myhost

app = Celery('myapp', broker=_celery_broker, backend='rpc://', include=['myapp.tasks', 'myapp.controllers'])
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks(['myapp'])

app.conf.update(
    result_expires=4800,
    task_acks_late=True
)
My __init__.py, as the tutorial recommended:
from .celery import app as celery_app
__all__ = ['celery_app']
The controller that is running the task:
from __future__ import absolute_import, unicode_literals
from .services.log_service import LogRunner
from myapp.services.command_service import CommandService
from exceptions.exceptions import *
from myapp.services.celery import app
from myapp.services.tasks import MyTask
from .models import Command


class MyController:

    def my_method(self, json_string):
        # <non-async set up stuff here>
        cmd_obj = Command.objects.create(<stuff>)  # type: Command
        task_exec = MyTask.delay(json_string, MyController._method_name, cmd_obj.id)
        cmd_obj.task_id = task_exec
        try:
            return_dict = task_exec.get()
        except Exception as e:
            self._logger.error("ERROR: IP: {} and transaction: {}. Error Type: {}, "
                               "Celery Error: {}".format(ip_addr, transaction_id, type(e), e))
            status_code, error = self._find_status_code(e)
            return_dict = {"error": error, "status_code": status_code, "result": None}
        return return_dict
So here is my issue:
When I run this Django controller by hitting the view with one request at a time, one after the other, it works perfectly fine.
However, the external service I am hitting will throw an error for 2 concurrent requests (and that is expected; that is OK). Upon getting the error, I retry my task automatically.
Here is the weird part:
Upon retry, the .get() in my controller stops working for all concurrent requests. My controller just hangs there! And I know that Celery is actually executing the task! Here are the logs from the Celery run:
[2018-09-25 19:10:24,932: INFO/MainProcess] Received task: myapp.tasks.MyTask[bafd62b6-7e29-4c39-86ff-fe903d864c4f]
[2018-09-25 19:10:25,710: INFO/MainProcess] Received task: myapp.tasks.MyTask[8d3b4279-0b7e-48cf-b45d-0f1f89e213d4] <-- THIS WILL FAIL BUT THAT IS OK
[2018-09-25 19:10:25,794: ERROR/ForkPoolWorker-1] Could not connect to device with IP <some ip> at all. Retry Later plase
[2018-09-25 19:10:25,798: WARNING/ForkPoolWorker-1] RETRYING NOW FOR DEVICE CONNECTION ERROR. TRANSACTION: b_txn || IP: <some ip>
[2018-09-25 19:10:25,821: INFO/MainProcess] Received task: myapp.tasks.MyTask[8d3b4279-0b7e-48cf-b45d-0f1f89e213d4] ETA:[2018-09-25 19:10:27.799473+00:00]
[2018-09-25 19:10:25,823: INFO/ForkPoolWorker-1] Task myapp.tasks.MyTask[8d3b4279-0b7e-48cf-b45d-0f1f89e213d4] retry: Retry in 2s: AnotherCustomError('Could not connect to IP <some ip> at all.',)
[2018-09-25 19:10:27,400: INFO/ForkPoolWorker-2] executed command some command at IP <some ip>
[2018-09-25 19:10:27,418: INFO/ForkPoolWorker-2] Task myapp.tasks.MyTask[bafd62b6-7e29-4c39-86ff-fe903d864c4f] succeeded in 2.4829552830196917s: {'error': None, 'status_code': 200, 'result': True}
<some command output here from a successful run>  <-- belongs to task bafd62b6-7e29-4c39-86ff-fe903d864c4f
[2018-09-25 19:10:31,058: INFO/ForkPoolWorker-2] executed some command at IP <some ip>
[2018-09-25 19:10:31,059: INFO/ForkPoolWorker-2] Task command_runner.tasks.MyTask[8d3b4279-0b7e-48cf-b45d-0f1f89e213d4] succeeded in 2.404364461021032s: {'error': None, 'status_code': 200, 'result': True}
<some command output here from a successful run>  <-- belongs to task 8d3b4279-0b7e-48cf-b45d-0f1f89e213d4, which errored and retried itself
So as you can see, the task does run on Celery! It's just that the .get() in my controller is unable to pick these results back up, regardless of whether the task succeeded or errored.
Often times, the error I get when running concurrent requests is "Received 0x50 while expecting 0xce". What is that? What does it mean? Again, weirdly enough, all this works when doing one request after another, without Django handling multiple incoming requests. Although, I haven't been able to retry for single requests.
The RPC backend (which is what .get() is waiting on) is designed to fail if it is used more than once or after a Celery restart:
a result can only be retrieved once, and only by the client that initiated the task. Two different processes can’t wait for the same result.
The messages are transient (non-persistent) by default, so the results will disappear if the broker restarts. You can configure the result backend to send persistent messages using the result_persistent setting.
So what appears to be happening is that the exception causes Celery to break its RPC connection with the calling controller. Given your use case, it may make more sense to use a permanent result backend such as Redis or a database.
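For illustration, switching the app definition above from the RPC backend to Redis would look something like this (a sketch; it assumes a Redis instance on localhost and the redis extra installed via pip install 'celery[redis]'):
from celery import Celery
from django.conf import settings

_celery_broker = settings.CELERY_BROKER

# services/celery.py -- same app, but with a persistent result backend
app = Celery(
    'myapp',
    broker=_celery_broker,
    backend='redis://localhost:6379/0',  # results persist and can be fetched reliably
    include=['myapp.tasks', 'myapp.controllers'],
)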

Python: Erratic joblib behaviour on Flask

I am trying to deploy a machine learning model on an AWS EC2 instance using Flask. These are scikit-learn fitted Random Forest models that are pickled using joblib. When I host Flask on localhost and load them into memory, everything runs smoothly. However, when I deploy it on the apache2 server using mod_wsgi, joblib works sometimes (i.e. the models are loaded) and the other times the server just hangs. There is no error in the logs. Any ideas would be appreciated.
Here is the relevant code that I am using:
from flask import Flask, jsonify, request, render_template
from datetime import datetime
from sklearn.externals import joblib
import pickle as pkl
import os

app = Flask(__name__, template_folder="/home/ubuntu/flaskapp/")

log = lambda msg: app.logger.info(msg, extra={'worker_id': "request.uuid"})

# Logger
import logging
handler = logging.FileHandler('/home/ubuntu/app.log')
handler.setLevel(logging.ERROR)
app.logger.addHandler(handler)

@app.route('/')
def host_template():
    return render_template('Static_GUI.html')

def load_models(path):
    model_arr = [0] * len(os.listdir(path))
    for filename in os.listdir(path):
        f = open(path + "/" + filename, 'rb')
        model_arr[int(filename[2:])] = joblib.load(f)
        print("Classifier ", filename[2:], " added.")
        f.close()
    return model_arr

partition_limit = 30

print("Dictionaries being loaded.")
dict_file_path = "/home/ubuntu/Dictionaries/VARR"
dictionaries = pkl.load(open(dict_file_path, "rb"))
print("Dictionaries Loaded.")

print("Begin loading classifiers.")
model_path = "/home/ubuntu/RF_Models/"
classifier_arr = load_models(model_path)
print("Classifiers Loaded.")

if __name__ == '__main__':
    log("/home/ubuntu/print.log")
    print("Starting API")
    app.run(debug=True)
I was stuck with this for quite some time. Posting the answer in case someone runs into this problem. Using print statements and looking at the logs, I narrowed the problem down to the joblib.load statement. I found this awesome blog: http://blog.rtwilson.com/how-to-fix-flask-wsgi-webapp-hanging-when-importing-a-module-such-as-numpy-or-matplotlib
The idea of using a global application group fixed the problem. That forced the use of the main interpreter, just as the top comment on that blog page mentions.
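For reference, the fix that blog describes is a single mod_wsgi directive in the Apache site config. A sketch (the paths and names below are illustrative, not from the original post):
# /etc/apache2/sites-available/flaskapp.conf (illustrative)
<VirtualHost *:80>
    WSGIDaemonProcess flaskapp user=ubuntu group=ubuntu threads=5
    WSGIScriptAlias / /home/ubuntu/flaskapp/flaskapp.wsgi

    # Run the app in the main Python interpreter rather than a sub-interpreter;
    # C-extension-heavy libraries (numpy, scipy, joblib) can hang in sub-interpreters.
    WSGIApplicationGroup %{GLOBAL}
</VirtualHost>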

Django Tornado getting 403 error on start

Following a tutorial, I have written a management command to start Tornado, and it looks like:
import signal
import time

import tornado.httpserver
import tornado.ioloop

from django.core.management.base import BaseCommand, CommandError
from privatemessages.tornadoapp import application


class Command(BaseCommand):
    args = '[port_number]'
    help = 'Starts the Tornado application for message handling.'

    def sig_handler(self, sig, frame):
        """Catch signal and init callback"""
        tornado.ioloop.IOLoop.instance().add_callback(self.shutdown)

    def shutdown(self):
        """Stop server and add callback to stop i/o loop"""
        self.http_server.stop()
        io_loop = tornado.ioloop.IOLoop.instance()
        io_loop.add_timeout(time.time() + 2, io_loop.stop)

    def handle(self, *args, **options):
        if len(args) == 1:
            try:
                port = int(args[0])
            except ValueError:
                raise CommandError('Invalid port number specified')
        else:
            port = 8888
        self.http_server = tornado.httpserver.HTTPServer(application)
        self.http_server.listen(port, address="127.0.0.1")
        # Init signals handler
        signal.signal(signal.SIGTERM, self.sig_handler)
        # This will also catch KeyboardInterrupt exception
        signal.signal(signal.SIGINT, self.sig_handler)
        print "start"
        tornado.ioloop.IOLoop.instance().start()
        print "end"
When I run this management command, I get a tornado.access:403 GET /2/ (127.0.0.1) 1.92ms error.
For test purposes I have printed "start" and "end". I guess that when this command executes successfully, "end" should be printed.
Here only "start" is printed, not "end". I guess something goes wrong in tornado.ioloop.IOLoop.instance().start(), but I don't know what it is.
Can anyone guide me on what is wrong here?
You forgot to put self.http_server.start() before starting the ioloop.
...
self.http_server = tornado.httpserver.HTTPServer(application)
self.http_server.listen(port, address="127.0.0.1")
self.http_server.start()
...
Update
Along with the missing httpserver start, are you using this library?
Your management command is exactly this one.
Their tutorial is in Russian and, honestly, I don't read Russian.
Update 2:
What I see in their code is that the URL /(?P<thread_id>\d+)/ is a websocket handler:
application = tornado.web.Application([
    (r"/", MainHandler),
    (r'/(?P<thread_id>\d+)/', MessagesHandler),
])
...
class MessagesHandler(tornado.websocket.WebSocketHandler):
    ...
But the error you posted suggests that something tried to access it over plain HTTP via GET.
Honestly, without a debugger and the same environment, I can't figure out the issue.
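To illustrate the websocket point: a plain HTTP GET to /2/ (which is what the 403 log line shows) is not a valid way to reach a WebSocketHandler; the client has to perform a websocket handshake. A minimal sketch of such a check, assuming the Tornado 4.x-era / Python 2 API used in the question and the server listening on 127.0.0.1:8888:
# ws_check.py -- hypothetical handshake test, Python 2 / Tornado 4.x era
import tornado.ioloop
from tornado.websocket import websocket_connect

def on_connect(future):
    try:
        conn = future.result()
        print "websocket handshake succeeded:", conn
    except Exception as e:
        print "websocket handshake failed:", e
    tornado.ioloop.IOLoop.instance().stop()

if __name__ == "__main__":
    # curl http://127.0.0.1:8888/2/ would reproduce the 403 from the logs;
    # a proper ws:// handshake to the same path should be accepted instead.
    websocket_connect("ws://127.0.0.1:8888/2/", callback=on_connect)
    tornado.ioloop.IOLoop.instance().start()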