How to serve the same endpoint via HTTP and WebSocket in Flask?

I have an endpoint I'd like to make available on both HTTP (for the API) and on the websocket.
For instance, adding a new message could be done via a Socket.IO "send" event that is handled on the server to process the request (check the rights, create the necessary elements, etc.).
The same actions should be possible via a POST request to /api/messages/ and should behave the same way.
Since the purpose and result are the same, is there an efficient way to make the two work the same using Flask and Flask-SocketIO?
Thank you in advance.

The Socket.IO events don't have a request and a response like the HTTP methods have, so the inputs and outputs are different, making it impossible to use the same function for both.
But you could abstract the actual logic of your actions into auxiliary functions that you call from both your HTTP route and your Socket.IO event handlers. That actually makes perfect sense if you want to offer your API over both HTTP and Socket.IO.
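For example, a minimal sketch of that pattern (the create_message helper, its arguments, and the route/event payloads are assumptions made here for illustration) could look like this:

from flask import Flask, request, jsonify
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app)

# Hypothetical shared helper: all the real work (rights checks,
# creating the necessary elements, etc.) lives here, independent of transport.
def create_message(user, text):
    # ... check rights, store the message, return its representation ...
    return {'user': user, 'text': text}

@app.route('/api/messages/', methods=['POST'])
def post_message():
    data = request.get_json()
    message = create_message(data['user'], data['text'])
    return jsonify(message), 201

@socketio.on('send')
def handle_send(data):
    message = create_message(data['user'], data['text'])
    emit('message created', message)

Both entry points only translate their transport-specific input and output; everything else stays in one place.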

I'm not entirely sure whether this is the same as what you're trying to do, but here's what I did (trying to mock a connection to Graphite), which you could modify for your case:
from flask import Flask, jsonify
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app)

# You probably need to define more handlers here (for connect, disconnect, etc.)
@socketio.on('my meaningful name', namespace='/endpoint')
def endpoint_socket():
    emit('my meaningful name', [1, 2, 3])

@app.route("/endpoint/", methods=["GET", "POST"])
def endpoint_http():
    return jsonify([1, 2, 3])

socketio.run(app, host="127.0.0.1", port=8000, debug=True)

Related

Flask-SocketIO access session from background task

I have a Flask app for HTTP and WebSocket (Flask-SocketIO) communication between client and server, using gevent. I also use server-side sessions with the Flask-Session extension for the app. I run a background task using SocketIO.start_background_task, and from this task I need to access session information, which will be used to emit a message using socketio. I get an error when accessing the session from the task: "RuntimeError: Working outside of request context. This typically means that you attempted to use functionality that needed an active HTTP request."
The SocketIO instance is created as below:
socket_io = SocketIO(app, async_mode='gevent', manage_session=False)
Is there any issue with this usage? How can this issue be addressed?
Thanks
This is not related to Flask-SocketIO, but to Flask. The background task that you started does not have request and application contexts, so it does not have access to the session (it doesn't even know who the client is).
Flask provides the copy_current_request_context decorator to duplicate the request/app contexts from the request handler into a background task.
The example from the Flask documentation uses gevent.spawn() to start the task, but this would be the same for a task started with start_background_task().
import gevent
from flask import copy_current_request_context

@app.route('/')
def index():
    @copy_current_request_context
    def do_some_work():
        # do some work here; it can access flask.request or
        # flask.session like you would otherwise in the view function
        ...

    gevent.spawn(do_some_work)
    return 'Regular response'
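For a task started through Flask-SocketIO, a rough equivalent might look like this (a sketch only; the event names and the work done inside the task are made up, and app is assumed to be your existing Flask application):

from flask import copy_current_request_context, session
from flask_socketio import SocketIO

# assuming the existing app and the SocketIO setup from the question
socket_io = SocketIO(app, async_mode='gevent', manage_session=False)

@socket_io.on('start work')
def start_work():
    @copy_current_request_context
    def do_some_work():
        # the copied request context makes flask.session available here
        user = session.get('user')
        socket_io.emit('work done', {'user': user})

    socket_io.start_background_task(do_some_work)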

File upload to third party API with HTTP 307 Temporary Redirect in Flask

I have a scenario where I have to upload a file from a Flask app to a third-party API. I have wrapped all API requests in Flask to control API usage. For this, I redirect traffic from the main route to the API wrapper route with an HTTP 307 status to preserve the request body, and in the API wrapper I use requests to post to the third-party API endpoint.
The problem is that only files < 100 KB get sent through the redirection request; a file larger than 100 KB somehow gets terminated in the sending phase.
Is there any limit on 307 redirection and payload size?
I tried debugging by watching the network timing stack trace; from there it seems the request is dropped in the sending phase.
Main Blueprint
@main.route('/upload/', methods=['POST', 'GET'])
def upload():
    # for ajax call
    if request.method == 'POST':
        return redirect(url_for('api.file_push'), code=307)
    else:
        return render_template('file-upload.html')
API Blueprint
@api.route('/upload/', methods=['POST'])
def file_push():
    upload_file = request.files['file']
    filename = urllib.parse.quote(upload_file.filename)
    toUpload = upload_file.read()
    result = requests.post(apiInterfaces.FILE_UPLOAD_INTERFACE + '/' + filename,
                           files={'file': toUpload})
    return result
Yes, I could send the POST request to the API endpoint directly from the main route, but I don't want to; it would break my system design and architecture.
I assume you're using Python, and possibly requests so this answer will be based on what I've learned figuring this out (debugging with a colleague). I filed a bug report with psf/requests. There is a related answer here which confirms my suspicions.
It seems that when you initiate a PUT request using requests (and urllib3), the entire request is sent before a response from the server is looked at, but some servers can send an HTTP 307 during this time. One of two things happens:
the server closes the connection by sending the response, even if the client has not finished sending the entire file. In this case, the client might see a closed connection and you won't have a response you can use for the redirect (this happens with urllib3 > 1.26.10, roughly), but requests does not handle this situation correctly;
the server sends the response and you re-upload the file to the second location (urllib3==1.26.3 has this behavior when using requests). Technically, there is a bug in urllib3 and it should have failed, but it silently lets you upload...
However, it seems that if you are expecting a redirect, the most obvious solution might be to send a null byte via PUT first, get a valid response back for the new URL [don't follow redirects], and then use that response to do the PUT of the full file. With requests, it's probably something like:
>>> import requests
>>> from io import BytesIO
>>> data = BytesIO(b'\x00')
>>> response = requests.put(url, data=data, allow_redirects=False)
>>> requests.put(response.headers['Location'], data=fp, allow_redirects=False)
and at that point, you'll be ok (assuming you only expect a single redirect here).

Get HTTP request as it is transferred over the wire (Django)

Is it possible to get the HTTP request as a bytestring, the way it was transferred over the wire, if you have a Django request object?
Of course as plain text (not encrypted, if HTTPS is used).
I would like to store the bytestring to analyze it later.
At best I would like to access the real bytestring. Creating a bytestring from request.META, request.GET and friends will likely not be the same as the original.
Update: it seems that it is impossible to get to the original bytes. Then the question is: how do I construct a bytestring that roughly looks like the original?
As others pointed out it is not possible because Django doesn't interact with raw requests.
You could just try reconstructing the request like this.
def reconstruct_request(request):
    headers = ''
    for header, value in request.META.items():
        if not header.startswith('HTTP_'):
            continue
        # HTTP_ACCEPT_ENCODING -> Accept-Encoding
        header = '-'.join(h.capitalize() for h in header[5:].lower().split('_'))
        headers += '{}: {}\n'.format(header, value)
    return (
        '{method} {path} HTTP/1.1\n'
        'Content-Length: {content_length}\n'
        'Content-Type: {content_type}\n'
        '{headers}\n\n'
        '{body}'
    ).format(
        method=request.method,
        path=request.get_full_path(),
        content_length=request.META['CONTENT_LENGTH'],
        content_type=request.META['CONTENT_TYPE'],
        headers=headers,
        body=request.body,
    )
NOTE: this is not a complete example, only a proof of concept.
The basic answer is no, Django doesn't have access to the raw request; in fact it doesn't even have code to parse a raw HTTP request.
This is because Django's (like many other Python web frameworks') HTTP request/response handling is, at its core, a WSGI application (WSGI specification).
It's the job of the frontend/proxy server (like Apache or nginx) and application server (like uWSGI or gunicorn) to "massage" the request (like transforming and stripping headers) and convert it into an object that can be handled by Django.
As an experiment you can actually wrap Django's WSGI application yourself and see what Django gets to work with when a request comes in.
Edit your project's wsgi.py and add some extremely basic WSGI "middleware":
import os
from django.core.wsgi import get_wsgi_application

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'project.settings')

class MyMiddleware:
    def __init__(self, app):
        self._app = app

    def __call__(self, environ, start_response):
        import pdb; pdb.set_trace()
        return self._app(environ, start_response)

# Wrap Django's WSGI application
application = MyMiddleware(get_wsgi_application())
Now if you start your devserver (./manage.py runserver) and send a request to your Django application, you'll drop into the debugger.
The only thing of interest here is the environ dict. Poke around it and you'll see that it's pretty much the same as what you'll find in Django's request.META. (The contents of the environ dict are detailed in this section of the WSGI spec.)
Knowing this, the best thing you can get is piecing together items from the environ dict into something that remotely resembles an HTTP request.
But why? If you have an environ dict, you have all the information you need to replicate a Django request. There's no actual need to translate this back into an HTTP request.
In fact, as you now know, you don't need an HTTP request at all to call Django's WSGI application. All you need is an environ dict with the required keys and a callable so that Django can relay the response.
So, to analyze requests (and even be able to replay them) you only need to be able to recreate a valid environ dict.
To do so in Django the easiest option would be to serialize request.META and request.body to a JSON dict.
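A minimal sketch of that idea (the log_request helper, the storage file, and the latin-1 body decoding are assumptions made here for illustration):

import json

def log_request(request, path='request_log.jsonl'):
    # keep only JSON-serializable META entries (file handles etc. are skipped)
    meta = {k: v for k, v in request.META.items()
            if isinstance(v, (str, int, float, bool))}
    # latin-1 maps each byte to one code point, so the body round-trips losslessly
    record = {'meta': meta, 'body': request.body.decode('latin-1')}
    with open(path, 'a') as f:
        f.write(json.dumps(record) + '\n')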
If you really need something that resembles an HTTP request (and you are unable to go a level up, e.g. to the webserver, to log this information), you'll just have to piece it together from the information available in request.META and request.body, with the caveat that this is not a realistic representation of the original HTTP request.

Scrapy: how to return request results to the calling method? Can I use the Python requests library inside Scrapy?

I have a Scrapy spider that runs well.
What I need to do is make an API call from inside a parse method and use the results from the response in the same method, with the same items. How do I do this? The only simple thing that comes to mind is to use the Python requests library, but I am not sure whether this works in Scrapy, and moreover on Scrapinghub. Is there any built-in solution?
Here is an example.
def agency(self, response):
    # inspect_response(response, self)
    agents = response.xpath('//a[contains(@class, "agency-carousel__item")]')
    Agencie_Name = response.xpath('//h1[@class = "agency-header__name"]/text()').extract_first()
    Business_Adress = response.xpath('//div[@class = "agency-header__address"]//text()').extract()
    Phone = response.xpath('//span[@class = "modal-item__text"]/text()').extract_first()
    Website = response.xpath('//span[@class = "modal-item__text"][contains(text(), "Website")]/../@href').extract_first()
    if Website:
        pass
        # 1. send request to hunter.io and get pattern. Apply to entire team. Pass as meta
        # do smth with this pattern in here using info from this page.
So here I normally extract all the info from the Scrapy response, and if the Website variable is populated I need to send an API call to hunter.io to get the email pattern for this domain and use it to generate emails in the same method.
Hope that makes sense.
As for vanilla Scrapy on your own PC / server, there is no problem accessing third-party libraries inside a scraper. You can just do whatever you want, so something like the following is no problem at all (it fetches a mail address from an API using requests and then sends out a mail using smtplib).
import requests
import smtplib
from email.mime.text import MIMEText

[...]

if Website:
    r = requests.get('https://example.com/mail_for_site?url=%s' % Website, auth=('user', 'pass'))
    mail = r.json()['Mail']
    msg = MIMEText('This will be the perfect job offer for you. ......')
    msg['Subject'] = 'Perfect job for you!'
    msg['From'] = 'sender@example.com'
    msg['To'] = mail
    s = smtplib.SMTP('example.com')
    s.sendmail('sender@example.com', [mail], msg.as_string())
However, as for Scrapinghub I do not know. For this, I can only give you a developer's point of view, because I also develop a managed scraping platform.
I assume that sending an HTTP(S) request using requests would not be any problem at all. They do not gain security by blocking it, because HTTP(S) traffic is allowed for scrapy anyway. So if somebody wanted to carry out harmful attacks with requests over HTTP(S), they could just make the same requests with scrapy.
However, SMTP might be a different matter; you'd have to try. It's possible that they do not allow SMTP traffic from their servers, because it is not required for scraping tasks and can be abused for sending spam. However, since there are legitimate uses for sending mails during a scraping process (e.g. on errors), it might as well be that SMTP is perfectly fine on Scrapinghub, too (and they employ rate limiting or something else against spam).

Given a list of URLs, how can I return the content of those URLs from a generator asynchronously using Twisted (Python)?

I want to know how to return the content of a list of URLs asynchronously using Twisted in Python. I know I can use getPage() to get the URL content asynchronously, but how can I yield that result from the generator function?
The synchronous code looks like this:
import requests

def gen(urls):
    for url in urls:
        yield requests.get(url)
Edit 1:
My specific requirement is to provide a service through Flask (Python): given a keyword, my Flask application should return the content of all URLs related to that keyword. I can get the list of URLs for a keyword using a search engine API; all I have to do is return the content using server-sent events (EventSource) as a streaming service.
def handle_request():
    urls = search_engine.search(request.args.get('query'))
    def content_gen():
        for url in urls:
            yield requests.get(url)
    return Response(content_gen(), mimetype="text/event-stream")
The requests.get call is synchronous; all I want is to make the code asynchronous by using Twisted's getPage() in my Flask application.
EDIT 2:
By using Twisted's getPage on all URLs I am going to get a list of Deferreds. Flask is a synchronous framework, so I can't directly use Deferreds to return data through Flask. With the crochet library I can wait on Deferreds synchronously, but since waiting on a Deferred with crochet's @wait_for decorator is blocking, the results are returned sequentially. But I want the generator function to yield the data of whichever URL is available first, rather than following the URL order.
Honestly I don't know much about crochet or Twisted, so if I am asking a trivial question, please excuse me.
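For reference, a rough sketch of the crochet approach described in EDIT 2 (assuming Twisted's getPage from twisted.web.client and crochet's wait_for; the timeout and the URL encoding are arbitrary choices) could look like this; note that it still yields results in URL order, which is exactly the limitation being described:

from crochet import setup, wait_for
setup()  # starts the Twisted reactor in a background thread

from twisted.web.client import getPage

@wait_for(timeout=30.0)
def fetch(url):
    # returns a Deferred; crochet blocks the calling thread until it fires
    return getPage(url.encode('ascii'))

def content_gen(urls):
    # results come back strictly in URL order, not "whichever is ready first"
    for url in urls:
        yield fetch(url)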