I'm writing a web crawler with aiohttp and have run into a problem with cookies. The server I'm trying to crawl requires authentication, and to fetch pages available to authenticated users I need to set a cookie whose key contains brackets. This is a problem because aiohttp.ClientSession.cookie_jar.update_cookies either ignores illegal cookies:
session = ClientSession()
cookie = SimpleCookie("a[b]=1234;")
session.cookie_jar.update_cookies(cookie)
print([f for f in session.cookie_jar]) # empty list, cookie not set
or raises a CookieError:
session = ClientSession()
cookie = SimpleCookie()
cookie["a[b]"] = "1234" # http.cookies.CookieError: Illegal key 'a[b]'
session.cookie_jar.update_cookies(cookie)
print([f for f in session.cookie_jar])
session = ClientSession()
session.cookie_jar.update_cookies([("a[b]", "1234")]) # http.cookies.CookieError: Illegal key 'a[b]'
print([f for f in session.cookie_jar])
It is possible to force setting the cookie by accessing http.cookies.Morsel's protected member _key, i.e.
session = ClientSession()
session.cookie_jar.update_cookies([("__tmp", "1234")])
for cookie in session.cookie_jar:
    if cookie.key == "__tmp":
        cookie._key = "a[b]"
print([f for f in session.cookie_jar]) # invalid cookie is set correctly
but this only pushes the problem one step back, as any session request, e.g. session.get(url), starts raising http.cookies.CookieError.
I cannot get around sending this cookie. Am I stuck using non async libraries like requests or is there a way to ignore this issue?
I found a workaround, and while I dislike using it, it was preferable to rewriting all of aiohttp:
import sys
if "http" in sys.modules:
    raise ImportError("Crawler must be imported before http module")
import http.cookies
http.cookies._is_legal_key = lambda _: True
aiohttp.CookieJar is modeled on the corresponding RFC specifications. Why should it process illegal cookie names?
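For reference, the CookieError in the tracebacks above is raised by the standard library's http.cookies module. A minimal sketch of which cookie names it accepts follows; the helper name `name_is_accepted` is made up for illustration and is not part of aiohttp or the standard library.

```python
from http.cookies import CookieError, SimpleCookie


def name_is_accepted(name, value="1234"):
    """Return True if http.cookies lets us store a cookie under this name."""
    cookie = SimpleCookie()
    try:
        cookie[name] = value  # the key is validated when the morsel is set
    except CookieError:
        return False
    return True


print(name_is_accepted("session"))  # plain token characters
print(name_is_accepted("a[b]"))     # brackets are not legal key characters
```

This only shows where the rejection happens; it does not make the bracketed name acceptable to the cookie jar.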
My site is shofitv.com.
My backend sends cookies so that users can access a protected CloudFront distribution.
The cookies are generated fine.
They are being set on the response, but when I check the cookies tab in the browser's inspector I see none of them.
Here is my code:
def generate_signed_cookies(resource, expire_minutes, payload):
    """
    @resource path to s3 object inside bucket (or a wildcard path, e.g. '/blah/*' or '*')
    @expire_minutes how many minutes before we expire these access credentials (within cookie)
    return tuple of domain used in resource URL & dict of name=>value cookies
    """
    if not resource:
        resource = '*'
    dist_id = DOWNLOAD_DIST_ID
    conn = CloudFrontConnection(AWS_ACCESS_KEY, AWS_SECRET_KEY)
    dist = SignedCookiedCloudfrontDistribution(conn, dist_id)
    cookies = dist.create_signed_cookies(resource, expire_minutes=expire_minutes)
    taco = HttpResponse(json.dumps(payload), content_type="application/json")
    taco.set_cookie('CloudFront-Policy', cookies[1]['CloudFront-Policy'],
                    httponly=False, domain="shofitv.com")
    taco.set_cookie('CloudFront-Signature', cookies[1]['CloudFront-Signature'],
                    httponly=False, domain="shofitv.com")
    taco.set_cookie('CloudFront-Key-Pair-Id', cookies[1]['CloudFront-Key-Pair-Id'],
                    httponly=False, domain="shofitv.com")
    print('here is the taco')
    print(taco)
    return taco
Again, you won't see CloudFront-Policy, CloudFront-Signature or CloudFront-Key-Pair-Id among my cookies, and the functionality they are supposed to enable isn't working. Both observations suggest the cookies aren't arriving. What is going on?
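One way to narrow this down is to compare what the browser actually receives with what the response is expected to send. The sketch below uses only the standard library (not Django) to render Set-Cookie headers of the shape the view above should produce. The cookie values are placeholders and the helper name `expected_set_cookie_headers` is hypothetical.

```python
from http.cookies import SimpleCookie


def expected_set_cookie_headers(values, domain):
    """Render one Set-Cookie header line per cookie for the given domain."""
    jar = SimpleCookie()
    for name, value in values.items():
        jar[name] = value
        jar[name]["domain"] = domain
        jar[name]["path"] = "/"
    return [morsel.output() for morsel in jar.values()]


# Placeholder values; the real ones come from create_signed_cookies().
headers = expected_set_cookie_headers(
    {
        "CloudFront-Policy": "policy-placeholder",
        "CloudFront-Signature": "signature-placeholder",
        "CloudFront-Key-Pair-Id": "key-pair-placeholder",
    },
    domain="shofitv.com",
)
for line in headers:
    print(line)
```

If the Set-Cookie lines in the browser's network tab are missing, or the page making the request is served from a host that does not match the Domain attribute, the browser will not store the cookies.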
As per my understanding I am doing everything correctly
I used requests to send a request.
import requests

def pretty_print(req):
    """Print a prepared request: method, URL, headers and body."""
    print('{}\n{}\n{}\n\n{}'.format(
        '-----------START-----------',
        req.method + ' ' + req.url,
        '\n'.join('{}: {}'.format(k, v) for k, v in req.headers.items()),
        req.body,
    ))
    print("----------END------------")

try:
    req = requests.Request('GET',
                           'https://myip/myproject/upload/token',
                           headers={'Authorization': 'Token 401f7ac837a'})
    prepared = req.prepare()
    pretty_print(prepared)
except Exception as e:
    print("Exception:", e)
Output:
-----------START-----------
GET https://myip/myproject/upload/token
Authorization: Token 401f7ac837a
None
----------END------------
But when I printed request.META, it contained:
META:{u'CSRF_COOKIE': u'YGzoMaNEQJz1Kg8yXAwjJt6yNuT9L'
What set the CSRF_COOKIE?
Any comments welcomed. Thanks
UPDATE
(1)
The docs say "This cookie is set by CsrfViewMiddleware", which means the CSRF cookie is set by the back end and sent to the front end in the response (CSRF cookie: server -> browser). Why do they also say "For all incoming requests that are not using HTTP GET, HEAD, OPTIONS or TRACE, a CSRF cookie must be present"? And why does it appear in my request.META? (CSRF cookie: browser -> server ???)
(2)
The docs say: "A hidden form field with the name ‘csrfmiddlewaretoken’ present in all outgoing POST forms. The value of this field is the value of the CSRF cookie. This part is done by the template tag."
When and how does the template tag do it?
This is a standard cookie that Django applications set for each new user to prevent Cross Site Request Forgery.
A CSRF cookie that is set to a random value (a session independent
nonce, as it is called), which other sites will not have access to.
This cookie is set by CsrfViewMiddleware. It is meant to be permanent,
but since there is no way to set a cookie that never expires, it is
sent with every response that has called
django.middleware.csrf.get_token() (the function used internally to
retrieve the CSRF token).
For security reasons, the value of the CSRF cookie is changed each
time a user logs in.
for more reading
https://docs.djangoproject.com/en/1.9/ref/csrf/#how-it-works
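The mechanism quoted above (a random cookie value that must be echoed back in the form for non-safe methods) can be sketched with the standard library. This is a simplified illustration of the idea only: Django's actual middleware performs additional checks, and the function names here are hypothetical.

```python
import hmac
import secrets


def new_csrf_token():
    """Generate a random, session-independent nonce for the CSRF cookie."""
    return secrets.token_urlsafe(32)


def is_request_allowed(method, cookie_token, form_token):
    """Safe methods pass; others need the form field to match the cookie."""
    if method in ("GET", "HEAD", "OPTIONS", "TRACE"):
        return True
    if not cookie_token or not form_token:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(cookie_token, form_token)


token = new_csrf_token()  # server -> browser, as a cookie
print(is_request_allowed("GET", None, None))        # safe method, no token needed
print(is_request_allowed("POST", token, token))     # form echoes the cookie value
print(is_request_allowed("POST", token, "forged"))  # another site cannot read the cookie
```

The cookie therefore travels in both directions: the server sets it in a response, and the browser sends it back on every later request, which is why it shows up on the server side of an incoming request.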
Can I restrict actions of my API to specific users if I generate a token like this:
from itsdangerous import TimedJSONWebSignatureSerializer as Serializer
expiration = 600
s = Serializer(current_app.config['SECRET_KEY'], expires_in = expiration)
return s.dumps({ 'id': kwargs.get('user_id') })
And the verification
@staticmethod
def verify_auth_token(token):
    s = Serializer(app.config['SECRET_KEY'])
    try:
        data = s.loads(token)
    except SignatureExpired:
        return None  # valid token, but expired
    except BadSignature:
        return None  # invalid token
    user = User.query.get(data['id'])
    return user
I don't understand how this works and achieves security. The way I'm used to securing an API is this: when a user wants to do an HTTP PUT to /posts/10, I get the post's author (i.e. user_id), query the database for that user's token, and allow the PUT only if the request token matches the stored token. I've read this article and don't fully understand how it achieves security without storing anything in a database. Could someone explain how it works?
By signing a token and sending it to the client upon login, the server basically gives the front end an all-access ticket to the data the user has access to, and the front end presents that token (golden ticket) on all future requests for as long as it has not expired (tokens can be made with or without an expiration). The server in turn knows the token has not been tampered with, because the signature is basically a keyed hash of the user's identifying data (user_id, username, etc.) computed with the server's secret key. So, if you change the token information from something like:
{"user_id": 1}
to something like:
{"user_id": 2}
then the signature would be different and the server immediately knows this token is invalid.
This provides an authentication method that exempts the server from having to have a session, because it validates the token every time.
Here is an example of what a token could look like (itsdangerous can use this format of JSON web tokens)
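As a rough illustration of the same idea using only the standard library, the sketch below signs a payload with HMAC-SHA256 and rejects a tampered copy. It is not the itsdangerous wire format and has no expiry handling; `SECRET_KEY`, `sign` and `verify` are hypothetical names standing in for the application's own.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"hypothetical-secret"  # assumption: stands in for app.config['SECRET_KEY']


def sign(payload):
    """Return 'body.signature' where signature = HMAC-SHA256(secret, body)."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    sig = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def verify(token):
    """Return the payload if the signature matches, else None."""
    body, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))


token = sign({"user_id": 1})
print(verify(token))  # the original payload comes back

# Swap in a different payload but keep the old signature.
forged_body = base64.urlsafe_b64encode(json.dumps({"user_id": 2}).encode()).decode()
forged = forged_body + "." + token.rpartition(".")[2]
print(verify(forged))  # None: the signature no longer matches
```

Nothing is looked up in a database during verification; the server only needs its secret key to recompute the signature.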
I'm working on a Django web application which (amongst other things) needs to handle transaction status info sent using a POST request.
In addition to the HTTP security supported by the payment gateway, my view checks request.META['HTTP_REFERER'] against an entry in settings.py to try to prevent funny business:
if request.META.get('HTTP_REFERER', '') != settings.PAYMENT_URL and not settings.DEBUG:
    return HttpResponseForbidden('Incorrect source URL for updating payment status')
Now I'd like to work out how to test this behaviour.
I can generate a failure easily enough; HTTP_REFERER is (predictably) None with a normal page load:
def test_transaction_status_succeeds(self):
    response = self.client.post(reverse('transaction_status'), { ... })
    self.assertEqual(response.status_code, 403)
How, though, can I fake a successful submission? I've tried setting HTTP_REFERER in extra, e.g. self.client.post(..., extra={'HTTP_REFERER': 'http://foo/bar'}), but this isn't working; the view is apparently still seeing a blank header.
Does the test client even support custom headers? Is there a work-around if not? I'm using Django 1.1, and would prefer not to upgrade just yet if at all possible.
Almost right. It's actually:
def test_transaction_status_succeeds(self):
    response = self.client.post(reverse('transaction_status'), {}, HTTP_REFERER='http://foo/bar')
I'd missed a ** (scatter operator / keyword argument unpacking operator / whatever) when reading the source of test/client.py; extra ends up being a dictionary of extra keyword arguments to the function itself.
You can pass HTTP headers to the constructor of Client:
from django.test import Client
from django.urls import reverse
client = Client(
    HTTP_USER_AGENT='Mozilla/5.0',
    HTTP_REFERER='http://www.google.com',
)
response1 = client.get(reverse('foo'))
response2 = client.get(reverse('bar'))
This way you don't need to pass headers every time you make a request.
First off, if there is a true, official way of having flash/flex's NetConnections usurp the session/cookie state of the surrounding web page, so that if the user has already logged in, they don't need to provide credentials again just to set up an AMF connection, please stop me now and post the official answer.
Barring that, I'm assuming there is not, as I have searched and it seems to not exist. I've concocted a means of doing this, but want some feedback as to whether it is secure.
Accessing a wrapper-page for a flash object will always go to secure https due to django middleware
When the page view is loaded in Django, it creates a "session alias" object with a unique key that points to the current session in play (in which someone ostensibly logged in)
That session alias model is saved, and that key is placed into a cookie whose key is another random string, call it randomcookie
That randomcookie key name is passed as a context variable and written into the html as a flashvar to the swf
The swf is also loaded only via https
The flash application uses ExternalInterface to call JavaScript to grab the value at that randomcookie location, and also deletes the cookie
It then creates a NetConnection to a secure server https location, passing that randomcookie as an argument (data, not in the url) to a login-using-cookie rpc
At the gateway side, pyamf looks up the session alias and gets the session it points to, and logs in the user based on that (and deletes the alias, so it can't be reused)
(And the gateway request could also set the session cookie and session.session_key to the known session ID, but I could let it make a whole new session key... I'm assuming that doing so should affect the response properly so that it contains the correct session key)
At this point, the returned cookie values on the flash side should stick to the NetConnection so that further calls are authenticated (if a connection is authenticated using username and password the normal way, this definitely works, so I think this is a safe bet, testing will soon prove or disprove this)
So, is this unsafe, or will this work properly? As far as I know, since the html page is guaranteed to be over ssl, the key and cookie data should be encrypted and not steal-able. Then, the info therein should be safe to use one-time as basically a temporary password, sent again over ssl because the gateway is also https. After that, it's using the normal pyAMF system over https and not doing anything out of the ordinary.
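The one-time alias at the heart of the scheme above can be sketched independently of Django and pyamf. This is an in-memory stand-in under stated assumptions: a single process, a hypothetical `OneTimeAliasStore` class, and an expiry window that the description above does not mention.

```python
import secrets
import time


class OneTimeAliasStore:
    """In-memory stand-in for a session alias table (assumption: single process)."""

    def __init__(self, ttl_seconds=60):
        self.ttl_seconds = ttl_seconds
        self._aliases = {}  # alias -> (session_key, created_at)

    def issue(self, session_key):
        """Create an unguessable alias that points at an existing session."""
        alias = secrets.token_hex(20)  # 40 hex characters
        self._aliases[alias] = (session_key, time.monotonic())
        return alias

    def redeem(self, alias):
        """Return the session key once, then forget the alias."""
        entry = self._aliases.pop(alias, None)  # pop makes reuse impossible
        if entry is None:
            return None
        session_key, created_at = entry
        if time.monotonic() - created_at > self.ttl_seconds:
            return None  # alias expired before it was used
        return session_key


store = OneTimeAliasStore(ttl_seconds=60)
alias = store.issue("ghi789")
first = store.redeem(alias)   # first use returns the session key
second = store.redeem(alias)  # second use finds nothing
print(first, second)
```

Deleting the alias on first use and bounding its lifetime are what make it behave like a temporary password; transport security still depends on every hop being https, as described above.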
No responses on this so far, so the best I can do is confirm that it does in fact physically work. For details on how to set up Flex Builder to write html-wrappers that communicate with Django pages templates, see my other post. The above was accomplished using a combination of the aforementioned, plus:
Made a SessionAlias model:
class SessionAlias(models.Model):
    alias = models.CharField( max_length=40, primary_key=True )
    session = models.ForeignKey( Session )
    created = models.DateTimeField( auto_now_add=True )
Flex points to a Django page that loads via a view containing:
s = SessionAlias()
s.alias = SessionStore().session_key  # generates new 40-char random
s.session = Session.objects.get( session_key=request.session.session_key )
s.save()
randomcookie = SessionStore().session_key  # generates new 40-char random
kwargs['extra_context']['randomcookie'] = randomcookie
response = direct_to_template( request, **kwargs )
response.set_cookie( randomcookie, value=s.alias )
In the flex html-wrapper, where randomcookie is the location to look for the alias:
<param name="flashVars" value="randomcookie={{randomcookie}}" />
In applicationComplete, where we get randomcookie and find the alias, and log on using that:
var randomcookie:String = this.parameters["randomcookie"];
// randomcookie is something like "abc123"
var js:String = "function get_cookie(){return document.cookie;}";
var cookies:String = ExternalInterface.call(js).toString();
// cookies looks like "abc123=def456; sessionid=ghi789; ..."
var alias:String = // strip out the "def456"
mynetconnection.call( "loginByAlias", alias, successFunc, failureFunc );
Which in turn calls this pyamf gateway rpc:
from django.contrib.auth import SESSION_KEY, load_backend
from django.contrib.auth.models import User
from django.contrib import auth
from django.conf import settings
def loginByAlias( request, alias ):
    a = SessionAlias.objects.get( alias=alias )
    session_engine = __import__( settings.SESSION_ENGINE, {}, {}, [''] )
    session_wrapper = session_engine.SessionStore( a.session.session_key )
    user_id = session_wrapper.get( SESSION_KEY )
    user = User.objects.get( id=user_id )
    user.backend='django.contrib.auth.backends.ModelBackend'
    auth.login( request, user )
    a.delete()
    return whateverToFlash
And at that point, on the flash/flex side, that particular mynetconnection retains the session cookie state that can make future calls such that, inside the gateway, request.user is the properly-authenticated user that logged onto the webpage in the first place.
Note again that the run/debug settings for flex must use https, as well as the gateway settings for NetConnection. And when releasing this, I have to make sure that authenticated users stay on https.
Any further info from people would be appreciated, especially if there's real feedback on the security aspects of this...
IE doesn't give access to cookies in local development, but if you publish the SWF and put it on a domain, it should pick up the session just like every other browser. Use Firefox 3.6 to build your flex apps locally.
Tested in IE8 and Firefox using a pyamf gateway on Flex 3 with NetConnection. The gateway function was decorated with @login_required.