Flask: Dumping "request.data" in middleware? [duplicate] - flask

I can't seem to figure out how to access POST data using WSGI. I tried the example on the wsgi.org website and it didn't work. I'm using Python 3.0 right now. Please don't recommend a WSGI framework as that is not what I'm looking for.
I would like to figure out how to get it into a FieldStorage object.

Assuming you are trying to get just the POST data into a FieldStorage object:
import cgi

# env is the environment handed to you by the WSGI server.
# I am removing the query string from the env before passing it to the
# FieldStorage so we only have POST data in there.
post_env = env.copy()
post_env['QUERY_STRING'] = ''
post = cgi.FieldStorage(
    fp=env['wsgi.input'],
    environ=post_env,
    keep_blank_values=True
)
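Once parsed, individual fields can be read off the FieldStorage object; for example (the field name 'username' here is purely hypothetical, standing in for a field from your own form):

# 'username' is a hypothetical field name; getvalue returns the
# submitted value, or the given default if the field is absent.
name = post.getvalue('username', '')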

body = b''  # bytes, for consistency on Python 3.0
try:
    length = int(environ.get('CONTENT_LENGTH', '0'))
except ValueError:
    length = 0
if length != 0:
    body = environ['wsgi.input'].read(length)
Note that WSGI is not yet fully specified for Python 3.0, and much of the popular WSGI infrastructure has not been converted (or has been run through 2to3, but not properly tested). (Even wsgiref.simple_server won't run.) You're in for a rough time doing WSGI on 3.0 today.
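For context, here is a minimal sketch of a complete WSGI app built around that snippet (my own example, assuming a server that follows the spec for wsgi.input):

def application(environ, start_response):
    # read the request body, guarding against a missing or invalid length
    try:
        length = int(environ.get('CONTENT_LENGTH', '0'))
    except ValueError:
        length = 0
    body = environ['wsgi.input'].read(length) if length else b''
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [('Received %d bytes' % len(body)).encode()]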

This worked for me (in Python 3.0):
import urllib.parse
post_input = urllib.parse.parse_qs(environ['wsgi.input'].readline().decode(), True)
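Note that parse_qs maps each field name to a list of values, so a single value has to be pulled out of the list; a sketch (the field name 'user' is hypothetical):

# e.g. 'user=alice&age=30' parses to {'user': ['alice'], 'age': ['30']}
user = post_input.get('user', [''])[0]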

I had the same issue and invested some time researching a solution.
Here is the complete answer with details and resources (since the accepted answer here didn't work for me on Python 3; there were many errors to correct in the env handling, libraries, etc.):
# the code below is taken from and explained officially here:
# https://wsgi.readthedocs.io/en/latest/specifications/handling_post_forms.html
import cgi

def is_post_request(environ):
    if environ['REQUEST_METHOD'].upper() != 'POST':
        return False
    content_type = environ.get('CONTENT_TYPE', 'application/x-www-form-urlencoded')
    return (content_type.startswith('application/x-www-form-urlencoded')
            or content_type.startswith('multipart/form-data'))

def get_post_form(environ):
    assert is_post_request(environ)
    input = environ['wsgi.input']
    post_form = environ.get('wsgi.post_form')
    if (post_form is not None
            and post_form[0] is input):
        return post_form[2]
    # This must be done to avoid a bug in cgi.FieldStorage
    environ.setdefault('QUERY_STRING', '')
    fs = cgi.FieldStorage(fp=input,
                          environ=environ,
                          keep_blank_values=1)
    new_input = InputProcessed()
    post_form = (new_input, input, fs)
    environ['wsgi.post_form'] = post_form
    environ['wsgi.input'] = new_input
    return fs

class InputProcessed(object):
    def read(self, *args):
        raise EOFError('The wsgi.input stream has already been consumed')
    readline = readlines = __iter__ = read

# the basic and expected application function for WSGI.
# get_post_form(environ) returns a FieldStorage object;
# to access the values, use the method .getvalue('the_key_name')
# this is explained officially here:
# https://docs.python.org/3/library/cgi.html
# if you don't know what the keys are, use the .keys() method and loop through them
def application(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    form = get_post_form(environ)
    user = form.getvalue('user')
    password = form.getvalue('password')
    output = 'user is: ' + user + ' and password is: ' + password
    return [output.encode()]
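To try the application above locally, here is a minimal sketch using the standard library's wsgiref server (assuming it runs on your Python version, per the caveats earlier in this thread; the port is arbitrary):

from wsgiref.simple_server import make_server

if __name__ == '__main__':
    # serve the application defined above on port 8000
    httpd = make_server('', 8000, application)
    httpd.serve_forever()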

I would suggest you look at how some frameworks do it for an example. (I am not recommending any single one, just using them as an example.)
Here is the code from Werkzeug:
http://dev.pocoo.org/projects/werkzeug/browser/werkzeug/wrappers.py#L150
which calls
http://dev.pocoo.org/projects/werkzeug/browser/werkzeug/utils.py#L1420
It's a bit complicated to summarize here, so I won't.
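That said, for a rough idea of what a framework buys you, here is a sketch of the same task using Werkzeug's Request wrapper (my own minimal example, not code from the library's docs; the 'user' field is hypothetical):

from werkzeug.wrappers import Request

def application(environ, start_response):
    request = Request(environ)
    # request.form holds the parsed POST data as a MultiDict
    user = request.form.get('user', '')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [('Hello, %s' % user).encode()]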

Related

Django body encoding vs slack-api secret

I am following the instructions from this page. I am building a Slack slash-command handling server, and I can't rebuild the signature to validate slash request authenticity.
here is the code snippet from my django application (the view uses the django rest-framework APIView):
@property
def x_slack_req_ts(self):
    if self.xsrts is not None:
        return self.xsrts
    self.xsrts = str(self.request.META['HTTP_X_SLACK_REQUEST_TIMESTAMP'])
    return self.xsrts

@property
def x_slack_signature(self):
    if self.xss is not None:
        return self.xss
    self.xss = self.request.META['HTTP_X_SLACK_SIGNATURE']
    return self.xss

@property
def base_message(self):
    if self.bs is not None:
        return self.bs
    self.bs = ':'.join(["v0", self.x_slack_req_ts, self.raw.decode('utf-8')])
    return self.bs

@property
def encoded_secret(self):
    return self.app.signing_secret.encode('utf-8')

@property
def signed(self):
    if self.non_base is not None:
        return self.non_base
    hashed = hmac.new(self.encoded_secret, self.base_message.encode('utf-8'), hashlib.sha256)
    self.non_base = "v0=" + hashed.hexdigest()
    return self.non_base
This is within a class where self.raw = request.body (the Django request body) and self.app.signing_secret is a string holding the appropriate Slack signing secret. It doesn't work: self.non_base yields an inaccurate value.
Now if I open an interactive python repl and do the following:
>>> import hmac
>>> import hashlib
>>> secret = "8f742231b10e8888abcd99yyyzzz85a5"
>>> ts = "1531420618"
>>> msg = "token=xyzz0WbapA4vBCDEFasx0q6G&team_id=T1DC2JH3J&team_domain=testteamnow&channel_id=G8PSS9T3V&channel_name=foobar&user_id=U2CERLKJA&user_name=roadrunner&command=%2Fwebhook-collect&text=&response_url=https%3A%2F%2Fhooks.slack.com%2Fcommands%2FT1DC2JH3J%2F397700885554%2F96rGlfmibIGlgcZRskXaIFfN&trigger_id=398738663015.47445629121.803a0bc887a14d10d2c447fce8b6703c"
>>> ref_signature = "v0=a2114d57b48eac39b9ad189dd8316235a7b4a8d21a10bd27519666489c69b503"
>>> base = ":".join(["v0", ts, msg])
>>> hashed = hmac.new(secret.encode(), base.encode(), hashlib.sha256)
>>> hashed.hexdigest()
'a2114d57b48eac39b9ad189dd8316235a7b4a8d21a10bd27519666489c69b503'
You will recognise the example from the referenced link. If I use the values from my Django app with one of MY examples, it works within the repl but not within the Django app.
MY QUESTION: I believe this is caused by the self.raw.decode() encoding not being consistent with the printout I extracted to copy/paste into the repl. Has anyone encountered this issue, and what is the fix? I tried a few random things with the urllib.parse library... How can I make sure that the request.body encoding is consistent with the Flask get_data() example (as suggested by the docs in the link)?
UPDATE: I defined a custom parser:
from django.http import QueryDict
from rest_framework.parsers import BaseParser

class SlashParser(BaseParser):
    """
    Parser for form data.
    """
    media_type = 'application/x-www-form-urlencoded'

    def parse(self, stream, media_type=None, parser_context=None):
        """
        Parses the incoming bytestream as a URL encoded form,
        and returns the resulting QueryDict.
        """
        parser_context = parser_context or {}
        request = parser_context.get('request')
        raw_data = stream.read()
        data = QueryDict(raw_data, encoding='utf-8')
        # keep a 'body'-like custom attr with the raw POST content
        setattr(data, 'raw_body', raw_data)
        return data
I tested based on this question: raw_body in the custom parser generates exactly the same hashed signature as the normal body, but again, copy/pasting into the repl to test outside DRF works. I'm pretty sure it's an encoding problem, but I'm completely at a loss...
I found the problem, which is very frustrating.
It turns out that the signing secret was stored in too short a str array and was missing its trailing characters, which, obviously, resulted in bad hashing of the message.
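For anyone landing here later, a minimal sketch of the verification once the full secret is in place (this assumes raw_body is the unmodified request body as bytes and timestamp is the header value as a str):

import hashlib
import hmac

def verify_slack_signature(signing_secret, timestamp, raw_body, slack_signature):
    base = b"v0:" + timestamp.encode() + b":" + raw_body
    digest = "v0=" + hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information during the comparison
    return hmac.compare_digest(digest, slack_signature)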

Python scrapy working (only half of the time)

I created a python scrapy project to extract the prices of some google flights.
I configured the middleware to use PhantomJS instead of a normal browser.
class JSMiddleware(object):
    def process_request(self, request, spider):
        driver = webdriver.PhantomJS()
        try:
            driver.get(request.url)
            time.sleep(1.5)
        except Exception as e:
            raise ValueError("request url failed - \n url: {},\n error: {}"
                             .format(request.url, e))
        body = driver.page_source
        # encoding='utf-8' - add to the html response if necessary
        return HtmlResponse(driver.current_url, body=body, encoding='utf-8',
                            request=request)
In settings.py I added:
DOWNLOADER_MIDDLEWARES = {
    # key: import path of the middleware class, value: order of the middleware
    'scraper_module.middlewares.middleware.JSMiddleware': 543,
    # disable the built-in user agent middleware
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}
I also created the following spider class:
import scrapy
from scrapy import Selector

class Gspider(scrapy.Spider):
    name = "google_spider"

    def __init__(self):
        self.start_urls = ["https://www.google.pt/flights/#search;f=LIS;t=POR;d=2017-06-18;r=2017-06-22"]
        self.prices = []
        self.links = []

    def clean_price(self, part):
        # part is received as a list; the encoding is utf-8
        part = part[0]
        part = part.encode('utf-8')
        part = filter(str.isdigit, part)
        return part

    def clean_link(self, part):
        part = part[0]
        part = part.encode('utf-8')
        return part

    def get_part(self, var_holder, response, marker, inner_marker, amount=1):
        selector = Selector(response)
        divs = selector.css(marker)
        for n, div in enumerate(divs):
            if n < amount:
                part = div.css(inner_marker).extract()
                if inner_marker == '::text':
                    part = self.clean_price(part)
                else:
                    part = self.clean_link(part)
                var_holder.append(part)
            else:
                break
        return var_holder

    def parse(self, response):
        prices, links = [], []
        prices = self.get_part(prices, response, 'div.OMOBOQD-d-Ab', '::text')
        print prices
        links = self.get_part(links, response, 'a.OMOBOQD-d-X', 'a::attr(href)')
        print links
The problem is, when I run the code in the shell, around half of the time I successfully get the prices and links requested, but the other half of the time the final vectors, which should contain the extracted data, are empty.
I do not get any errors during execution.
Does anyone have any idea why this is happening?
Here are the logs from the command line:
Google has a very strict policy in terms of crawling. (Pretty hypocritical, when you know that they constantly crawl the whole web...)
You should either find an API, as said previously in the comments, or maybe use proxies. An easy way is to use Crawlera. It manages thousands of proxies so you don't have to bother. I personally use it to crawl Google and it works perfectly. The downside is that it is not free.
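If you go the Crawlera route, enabling it in Scrapy is roughly this (a sketch assuming the scrapy-crawlera middleware package; the API key is a placeholder):

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawlera.CrawleraMiddleware': 610,
}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = '<your-crawlera-api-key>'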

passing commandline arguments to a selenium python webdriver test case

The following code is written using the Selenium Python WebDriver and runs on Sauce Labs. I am providing the browser name, version, and platform in a list; how do I do the same by providing the browser details through command-line arguments? I am using py.test to execute the test cases.
import os
import sys
import httplib
import base64
import json
import new
import unittest
import sauceclient
from selenium import webdriver
from sauceclient import SauceClient

# it's best to remove the hardcoded defaults and always get these values
# from environment variables
USERNAME = os.environ.get('SAUCE_USERNAME', "ranjanprabhub")
ACCESS_KEY = os.environ.get('SAUCE_ACCESS_KEY', "ecec4dd0-d8da-49b9-b719-17e2c43d0165")
sauce = SauceClient(USERNAME, ACCESS_KEY)

browsers = [{"platform": "Mac OS X 10.9",
             "browserName": "chrome",
             "version": ""},
            ]

def on_platforms(platforms):
    def decorator(base_class):
        module = sys.modules[base_class.__module__].__dict__
        for i, platform in enumerate(platforms):
            d = dict(base_class.__dict__)
            d['desired_capabilities'] = platform
            name = "%s_%s" % (base_class.__name__, i + 1)
            module[name] = new.classobj(name, (base_class,), d)
    return decorator

@on_platforms(browsers)
class SauceSampleTest(unittest.TestCase):
    def setUp(self):
        self.desired_capabilities['name'] = self.id()
        sauce_url = "http://%s:%s@ondemand.saucelabs.com:80/wd/hub"
        self.driver = webdriver.Remote(
            desired_capabilities=self.desired_capabilities,
            command_executor=sauce_url % (USERNAME, ACCESS_KEY)
        )
        self.driver.implicitly_wait(30)

    def test_sauce(self):
        self.driver.get('http://saucelabs.com/test/guinea-pig')
        assert "I am a page title - Sauce Labs" in self.driver.title
        comments = self.driver.find_element_by_id('comments')
        comments.send_keys('Hello! I am some example comments.'
                           ' I should be in the page after submitting the form')
        self.driver.find_element_by_id('submit').click()
        commented = self.driver.find_element_by_id('your_comments')
        assert ('Your comments: Hello! I am some example comments.'
                ' I should be in the page after submitting the form'
                in commented.text)
        body = self.driver.find_element_by_xpath('//body')
        assert 'I am some other page content' not in body.text
        self.driver.find_elements_by_link_text('i am a link')[0].click()
        body = self.driver.find_element_by_xpath('//body')
        assert 'I am some other page content' in body.text

    def tearDown(self):
        print("Link to your job: https://saucelabs.com/jobs/%s" % self.driver.session_id)
        try:
            if sys.exc_info() == (None, None, None):
                sauce.jobs.update_job(self.driver.session_id, passed=True)
            else:
                sauce.jobs.update_job(self.driver.session_id, passed=False)
        finally:
            self.driver.quit()
So this is a bit complicated, because you can pass an array of browsers into the @on_platforms decorator. My solution will only work for a single browser, as it looks like that's what you're doing right now.
For the current, single-browser situation, you're looking for argparse. Here's my suggested fix:
import argparse

def setup_parser():
    parser = argparse.ArgumentParser(description='Automation Testing!')
    parser.add_argument('-p', '--platform', help='Platform for desired_caps', default='Mac OS X 10.9')
    parser.add_argument('-b', '--browser-name', help='Browser Name for desired_caps', default='chrome')
    parser.add_argument('-v', '--version', default='')
    args = vars(parser.parse_args())
    return args

desired_caps = setup_parser()
browsers = [desired_caps]
print browsers
But if you're looking to test multiple browsers (which I suggest you do!), you should not try and use command line arguments for the desired_caps of each individual browser. You should instead load a json config file for the browsers and the desired_caps for each one that you want Sauce to run.
Maybe have a different config file for each set of browsers, and then use command line arguments to pass in the config files you want to load.
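A sketch of that config-file approach (the file name and JSON layout are assumptions; each entry is one desired_caps dict):

import argparse
import json

def load_browsers():
    parser = argparse.ArgumentParser(description='Automation Testing!')
    parser.add_argument('-c', '--config', default='browsers.json',
                        help='Path to a JSON file containing a list of desired_caps dicts')
    args = parser.parse_args()
    with open(args.config) as f:
        # e.g. [{"platform": "Mac OS X 10.9", "browserName": "chrome", "version": ""}]
        return json.load(f)

browsers = load_browsers()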

How can I write tests that populate raw_post_data and request.FILES['myfile']

I have something like this:
def upload_something(request):
    data = {}
    if request.FILES:
        raw_file = request.FILES['myfile'].read()
    else:
        raw_file = request.raw_post_data
I can't seem to write a unit test that populates raw_post_data; how would I go about doing that? I basically just want to send an image file. I'm trying to create a test case for when I read raw_post_data, and it errors with:
You cannot access raw_post_data after reading from request's data stream
I'm assuming you have figured this out by now, but as the answers are almost out of date with the deprecation of raw_post_data, I thought I'd post.
def test_xml_payload(self):
    data = '<?xml version="1.0" encoding="UTF-8"?><blah></blah>'
    response = self.client.post(reverse('my_url'),
                                data=data,
                                content_type='application/xml')

def my_view(request):
    xml = request.body
You can use mocking. Some examples are available here and in the docs here.
Updated
Kit, I think it depends very much on your test case. But in general you shouldn't use raw_post_data directly. Instead it has to be patched, like in the example below:
from mock import Mock, MagicMock

class SomeTestCase(TestCase):
    def testRawPostData(self):
        ...
        request = Mock(spec=request)
        request.raw_post_data = 'myrawdata'
        print request.raw_post_data  # prints 'myrawdata'
        file_mock = MagicMock(spec=file)
        file_mock.read.return_value = 'myfiledata'
        request.FILES = {'myfile': file_mock}
        print request.FILES['myfile'].read()  # prints 'myfiledata'
The error message the interpreter is giving is correct. After you access the POST data via if request.FILES, you can no longer access raw_post_data. If your actual code (not the tests) hit that line, it would error with the same message. Basically, you need two separate views for form-based POSTs and direct file POSTs.
I took this listing from here:
c = Client()
f = open('wishlist.doc')
c.post('/customers/wishes/', {'name': 'fred', 'attachment': f})
f.close()
Client is a special class for testing your views; this is an example of posting files to a view. It's part of the Django testing framework.
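Putting the two answers together, a sketch that exercises both branches of upload_something (the URL and field names come from the snippets above; the raw-body content type is an assumption):

from django.test import Client

c = Client()

# Populates request.FILES['myfile'] (multipart encoding):
with open('wishlist.doc', 'rb') as f:
    c.post('/customers/wishes/', {'name': 'fred', 'myfile': f})

# Populates the raw request body instead (no multipart encoding),
# so the view falls through to the raw_post_data branch:
c.post('/customers/wishes/', data=b'raw image bytes',
       content_type='application/octet-stream')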

Syncing data between devel/live databases in Django

With Django's new multi-db functionality in the development version, I've been trying to create a management command that lets me synchronize the data from the live site down to a developer machine for extended testing. (Having actual data, particularly user-entered data, lets me test a broader range of inputs.)
Right now I've got a "mostly" working command. It can sync "simple" model data, but the problem is that it ignores ManyToMany fields, and I don't see any reason for it to do so. Does anyone have any ideas of either how to fix that or a better way to handle this? Should I export that first query to a fixture and then re-import it?
from django.core.management.base import LabelCommand
from django.db.utils import IntegrityError
from django.db import models
from django.conf import settings

LIVE_DATABASE_KEY = 'live'

class Command(LabelCommand):
    help = ("Synchronizes the data between the local machine and the live server")
    args = "APP_NAME"
    label = 'application name'
    requires_model_validation = False
    can_import_settings = True

    def handle_label(self, label, **options):
        # Make sure we're running the command on a developer machine and that we've got the right settings
        db_settings = getattr(settings, 'DATABASES', {})
        if not LIVE_DATABASE_KEY in db_settings:
            print 'Could not find "%s" in database settings.' % LIVE_DATABASE_KEY
            return
        if db_settings.get('default') == db_settings.get(LIVE_DATABASE_KEY):
            print 'Data cannot synchronize with self. This command must be run on a non-production server.'
            return
        # Fetch all models for the given app
        try:
            app = models.get_app(label)
            app_models = models.get_models(app)
        except:
            print "The app '%s' could not be found or models could not be loaded for it." % label
            return
        for model in app_models:
            print 'Syncing %s.%s ...' % (model._meta.app_label, model._meta.object_name)
            # Query each model from the live site
            qs = model.objects.all().using(LIVE_DATABASE_KEY)
            # ...and save it to the local database
            for record in qs:
                try:
                    record.save(using='default')
                except IntegrityError:
                    # Skip as the record probably already exists
                    pass
Django command extensions' dumpscript command should help a lot.
This doesn't answer your question exactly, but why not just do a db dump and a db restore?
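Following the fixture idea from the question, here is a sketch using call_command (the app label 'myapp' is a placeholder; this assumes a Django version whose dumpdata/loaddata accept the database option). Fixtures serialize ManyToMany relations, which would sidestep the original problem:

from django.core.management import call_command

# dump the app's data from the live database to a fixture file...
with open('live_data.json', 'w') as fixture:
    call_command('dumpdata', 'myapp', database='live', stdout=fixture)

# ...then load it into the local default database
call_command('loaddata', 'live_data.json', database='default')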