Forbidden (CSRF cookie not set.) when csrf is in header [duplicate] - django

This question already has an answer here:
Django bug on CRSF token
(1 answer)
Closed 5 years ago.
The request header is as below.
Accept:application/json, text/plain, */*
Accept-Encoding:gzip, deflate, br
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Content-Length:129
Content-Type:text/plain
Host:localhost:9000
Origin:http://localhost:8000
Referer:http://localhost:8000/
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
X-CSRFTOKEN:t5Nx0SW9haZTeOcErcBDtaq6psqBfeyuX4LRQ1WOOXq5g93tQkvcUZDGoWz8wSeD
The X-CSRFTOKEN header is there, but Django still complains that the CSRF cookie is not set. What is happening in Django?
In settings.py, the header name is configured correctly.
CSRF_HEADER_NAME = "HTTP_X_CSRFTOKEN"

Check whether CSRF_COOKIE_SECURE is set to True.
You will get this error message if CSRF_COOKIE_SECURE is True and you access the site over http instead of https.
Alternatively, you can use csrf_exempt (for testing only).
For example, curtisp mentions in the comments:
I had conditional dev vs prod settings and accidentally put dev settings to CSRF_COOKIE_SECURE = True and SESSION_COOKIE_SECURE = True.
My dev site is localhost on my laptop, and it does not have SSL.
So changing dev settings to False fixed it for me.
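For illustration, a minimal sketch of the conditional setup curtisp describes, assuming a single settings.py switched on DEBUG (the structure here is illustrative, not taken from the original post):

# settings.py (sketch): keep secure-cookie flags off for plain-http dev
DEBUG = True  # True on the dev laptop, False in production

CSRF_HEADER_NAME = "HTTP_X_CSRFTOKEN"  # custom header name from the question

# With these True over plain http, the browser never stores the cookies,
# so Django reports "CSRF cookie not set" even though the header arrives.
CSRF_COOKIE_SECURE = not DEBUG
SESSION_COOKIE_SECURE = not DEBUG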

Related

How to enable CORS in python

Let me start with this: I do not know Python; I've had maybe one day going through the Python tutorials. The situation is this: I have an Angular app that embeds, in an iframe, a Python app hosted with Apache on a VM. I didn't write the Python app, but another developer wrote me an endpoint that I am supposed to be able to post to from my Angular app.
The developer who made the Python endpoint is saying that there is something wrong with my request, but I am fairly certain there isn't anything wrong. I am almost 100% certain that the problem is that there are no CORS headers in the response and/or the response is not set up to respond to the OPTIONS method. Below is the entirety of the Python endpoint:
import os, site, inspect
site.addsitedir(os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe()))) + "/../")
import json
from datetime import datetime
import pymongo
from Config import Config

def application(environ, start_response):
    response = environ['wsgi.input'].read(int(environ['CONTENT_LENGTH']))
    if response:
        json_response = json.loads(response)
        document = {
            'payment_id': json_response['payment_id'],
            'log': json_response['log'],
            'login_id': json_response['login_id'],
            'browser': environ.get('HTTP_USER_AGENT', None),
            'ip_address': environ.get('REMOTE_ADDR', None),
            'created_at': datetime.utcnow(),
        }
        client = pymongo.MongoClient(Config.getValue('MongoServer'))
        db = client.updatepromise
        db.PaymentLogs.insert(document)
        start_response('200 OK', [('Content-Type', 'application/json')])
        return '{"success": true}'
    start_response('400 Bad Request', [('Content-Type', 'application/json')])
    return '{"success": false}'
I have attempted the following to make this work: I added more headers to both start_response calls, so the code looks like this now:
start_response('201 OK', [('Content-Type', 'application/json'),
    ('Access-Control-Allow-Headers', 'authorization'),
    ('Access-Control-Allow-Methods', 'HEAD, GET, POST, PUT, PATCH, DELETE'),
    ('Access-Control-Allow-Origin', '*'),
    ('Access-Control-Max-Age', '600')])
Note: I did this with both the 200 and the 400 responses at first and saw no change at all in the response. Then, just for the heck of it, I changed the 200 to a 201; this also did not come through in the response, so I suspect this code isn't even getting run for some reason.
Please help, python newb here.
Addendum: I figured this would help; here is what the headers look like:
General:
Request URL: http://rpc.local/api/payment_log_api.py
Request Method: OPTIONS
Status Code: 200 OK
Remote Address: 10.1.20.233:80
Referrer Policy: no-referrer-when-downgrade
Response Headers:
Allow: GET,HEAD,POST,OPTIONS
Connection: Keep-Alive
Content-Length: 0
Content-Type: text/x-python
Date: Fri, 27 Apr 2018 15:18:55 GMT
Keep-Alive: timeout=5, max=100
Server: Apache/2.4.18 (Ubuntu)
Request Headers:
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Access-Control-Request-Headers: authorization,content-type
Access-Control-Request-Method: POST
Connection: keep-alive
Host: rpc.local
Origin: http://10.1.20.61:4200
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
Here it is. Just add this to the application right at the beginning:
def application(environ, start_response):
    if environ['REQUEST_METHOD'] == 'OPTIONS':
        start_response(
            '200 OK',
            [
                ('Content-Type', 'application/json'),
                ('Access-Control-Allow-Origin', '*'),
                ('Access-Control-Allow-Headers', 'Authorization, Content-Type'),
                ('Access-Control-Allow-Methods', 'POST'),
            ]
        )
        return ''
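A hedged follow-up note (not part of the original answer): browsers also require Access-Control-Allow-Origin on the actual POST response, not only on the preflight, so the success branch likely needs the header as well. A minimal sketch combining both branches:

def application(environ, start_response):
    # Sketch only: the preflight branch from above plus a CORS-enabled
    # success response; '*' is a wide-open origin suitable for testing only.
    if environ['REQUEST_METHOD'] == 'OPTIONS':
        start_response('200 OK', [
            ('Content-Type', 'application/json'),
            ('Access-Control-Allow-Origin', '*'),
            ('Access-Control-Allow-Headers', 'Authorization, Content-Type'),
            ('Access-Control-Allow-Methods', 'POST'),
        ])
        return ''
    start_response('200 OK', [
        ('Content-Type', 'application/json'),
        ('Access-Control-Allow-Origin', '*'),  # needed on the real response too
    ])
    return '{"success": true}'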
For Python with CGI, I found this to work:
print '''Access-Control-Allow-Origin: *\r\n''',
print '''Content-Type: text/html\r\n'''
Don't forget to enable CORS on the other side as well, e.g., JavaScript jQuery:
$.ajax({ url: URL,
type: "GET",
crossDomain: true,
dataType: "text", etc, etc

Internal Server Error when I try to use HTTPS protocol for traefik backend

My setup is ELB --https--> traefik --https--> service
I get back a 500 Internal Server Error from traefik on every request, and it doesn't appear the request ever makes it to the service: the service is running Apache with access logging and I see no incoming requests logged. I am able to curl the service directly and receive the expected response. Both traefik and the service are running in Docker containers. I am also able to use port 80 all the way through with success, and I can use https to traefik and port 80 to the service; I get an error from Apache, but the request does go all the way through.
traefik.toml
logLevel = "DEBUG"
RootCAs = [ "/etc/certs/ca.pem" ]
#InsecureSkipVerify = true
defaultEntryPoints = ["https"]
[entryPoints]
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[[entryPoints.https.tls.certificates]]
certFile = "/etc/certs/cert.pem"
keyFile = "/etc/certs/key.pem"
[entryPoints.http]
address = ":80"
[web]
address = ":8080"
[traefikLog]
[accessLog]
[consulCatalog]
endpoint = "127.0.0.1:8500"
domain = "consul.localhost"
exposedByDefault = false
prefix = "traefik"
The tags used for the consul service:
"traefik.enable=true",
"traefik.protocol=https",
"traefik.frontend.passHostHeader=true",
"traefik.frontend.redirect.entryPoint=https",
"traefik.frontend.entryPoints=https",
"traefik.frontend.rule=Host:hostname"
The debug output from traefik for each request:
time="2018-04-08T02:46:36Z"
level=debug
msg="vulcand/oxy/roundrobin/rr: begin ServeHttp on request"
Request="{"Method":"GET","URL":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":""},"Proto":"HTTP/1.1","ProtoMajor":1,"ProtoMinor":1,"Header":{"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"],"Accept-Encoding":["gzip, deflate, br"],"Accept-Language":["en-US,en;q=0.9"],"Cache-Control":["max-age=0"],"Cookie":["__utmc=80117009; PHPSESSID=64c928bgf265fgqdqqbgdbuqso; _ga=GA1.2.573328135.1514428072; messagesUtk=d353002175524322ac26ff221d1e80a6; __hstc=27968611.cbdd9ce39324304b461d515d0a8f4cb0.1523037648547.1523037648547.1523037648547.1; __hssrc=1; hubspotutk=cbdd9ce39324304b461d515d0a8f4cb0; __utmz=80117009.1523037658.5.2.utmcsr=|utmccn=(referral)|utmcmd=referral|utmcct=/; __utma=80117009.573328135.1514428072.1523037658.1523128344.6"],"Upgrade-Insecure-Requests":["1"],"User-Agent":["Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.81 Safari/537.36"],"X-Amzn-Trace-Id":["Root=1-5ac982a8-b9615451a35258e3fd2a825d"],"X-Forwarded-For":["76.105.255.147"],"X-Forwarded-Port":["443"],"X-Forwarded-Proto":["https"]},"ContentLength":0,"TransferEncoding":null,"Host”:”hostname”,”Form":null,"PostForm":null,"MultipartForm":null,"Trailer":null,"RemoteAddr":"10.200.20.130:4880","RequestURI":"/","TLS":null}"
time="2018-04-08T02:46:36Z" level=debug
msg="vulcand/oxy/roundrobin/rr: Forwarding this request to URL"
Request="{"Method":"GET","URL":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":""},"Proto":"HTTP/1.1","ProtoMajor":1,"ProtoMinor":1,"Header":{"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"],"Accept-Encoding":["gzip, deflate, br"],"Accept-Language":["en-US,en;q=0.9"],"Cache-Control":["max-age=0"],"Cookie":["__utmc=80117009; PHPSESSID=64c928bgf265fgqdqqbgdbuqso; _ga=GA1.2.573328135.1514428072; messagesUtk=d353002175524322ac26ff221d1e80a6; __hstc=27968611.cbdd9ce39324304b461d515d0a8f4cb0.1523037648547.1523037648547.1523037648547.1; __hssrc=1; hubspotutk=cbdd9ce39324304b461d515d0a8f4cb0; __utmz=80117009.1523037658.5.2.utmcsr=|utmccn=(referral)|utmcmd=referral|utmcct=/; __utma=80117009.573328135.1514428072.1523037658.1523128344.6"],"Upgrade-Insecure-Requests":["1"],"User-Agent":["Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.81 Safari/537.36"],"X-Amzn-Trace-Id":["Root=1-5ac982a8-b9615451a35258e3fd2a825d"],"X-Forwarded-For":["76.105.255.147"],"X-Forwarded-Port":["443"],"X-Forwarded-Proto":["https"]},"ContentLength":0,"TransferEncoding":null,"Host”:”hostname”,”Form":null,"PostForm":null,"MultipartForm":null,"Trailer":null,"RemoteAddr":"10.200.20.130:4880","RequestURI":"/","TLS":null}" ForwardURL="https://10.200.115.53:443"
assume "hostname" is the correct host name. Any assistance is appreciated.
I think your problem comes from "traefik.protocol=https"; remove this tag.
You can also remove "traefik.frontend.redirect.entryPoint=https" because it's redundant: that tag creates a redirection to the https entrypoint, but your frontend is already on the https entry point ("traefik.frontend.entryPoints=https").
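For illustration, the consul tags with those two changes applied might look like this (a sketch; it assumes the backend is also reachable over plain http inside the network - if it truly serves only https, keep traefik.protocol=https and make sure its certificate is trusted via RootCAs):

"traefik.enable=true",
"traefik.frontend.passHostHeader=true",
"traefik.frontend.entryPoints=https",
"traefik.frontend.rule=Host:hostname"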

Python-Requests Post request failed with 403 Forbidden

I was trying to use python-requests to log in to https://www.custommade.com/, but it keeps giving me a 403 Forbidden error. I got the post_url and the content of the payload from HttpFox.
import requests
post_url = 'https://www.custommade.com/secure/login/api/'
client = requests.session()
csrftoken = client.get('https://www.custommade.com/').cookies['csrftoken']
header_info = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
    'Content-type': 'application/json'
}
payload = {'_method':'login','csrftoken': csrftoken,'email': MYEMAIL,'password':MYPWS}
r = client.post(post_url, json=payload, headers = header_info)
print r.status_code
Could someone help? I tried to log in to other websites this way and it works fine.
If you print the response text, you will see that the site returns an error saying you are not accepting cookies.
When you are doing something like this, always try to simulate the browser as closely as possible: that means you have to set up all the headers and also follow the same steps the browser does.
So first open the web page in your browser. Open the dev tools and the Network tab.
Now click on the login -> you will see that the browser makes a request to /secure/proxy.
So your program has to do that too. Then on to the actual request: make sure your request looks as much like the browser's request as possible - check the headers. You can see that they send the token there (by the way, they do not send it in the POST data as you did in your script). They are probably also checking some other headers, because when you remove them it doesn't work. So the easiest way is to send all the same headers as the browser.
Don't forget about the cookies, but this is handled automatically because you are using a requests session.
Anyway this is working code:
import requests
post_url = 'https://www.custommade.com/secure/login/api/'
client = requests.session()
client.get('https://www.custommade.com/')
r = client.get('https://www.custommade.com/secure/proxy/')
csrftoken = r.cookies['csrftoken']
header_info = {
    "Host": "www.custommade.com",
    "Connection": "keep-alive",
    "Origin": "https://www.custommade.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36",
    "Content-Type": "application/x-www-form-urlencoded",
    "Accept": "*/*",
    "X-Requested-With": "XMLHttpRequest",
    "X-CSRFToken": csrftoken,
    "DNT": "1",
    "Referer": "https://www.custommade.com/secure/proxy/",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.8,cs-CZ;q=0.6,cs;q=0.4,sk;q=0.2,ru;q=0.2",
}
payload = {'_method': 'login', 'email': 'sdfasdf@safs.com', 'password': 'asfdasf', 'remember': True}
r = client.post(post_url, data=payload, headers=header_info)
print r.text
print r.status_code
Print:
{"errors": "Oops! Something went wrong. Please ensure you are sending JSON data."}
400
^^ This means the password is wrong.
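If valid credentials still produce the "sending JSON data" error, one hedged variation (not verified against this site) is to send the body as JSON while keeping the same session, token, and headers:

# Sketch: same session and headers as above, but a JSON-encoded body;
# the Content-Type override is an assumption about what the API expects.
header_info['Content-Type'] = 'application/json'
r = client.post(post_url, json=payload, headers=header_info)
print r.status_code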

How to solve 403 error in scrapy

I'm new to Scrapy and I made a Scrapy project to scrape data.
I'm trying to scrape the data from the website, but I'm getting the following error logs:
2016-08-29 14:07:57 [scrapy] INFO: Enabled item pipelines:
[]
2016-08-29 13:55:03 [scrapy] INFO: Spider opened
2016-08-29 13:55:03 [scrapy] INFO: Crawled 0 pages (at 0 pages/min),scraped 0 items (at 0 items/min)
2016-08-29 13:55:04 [scrapy] DEBUG: Crawled (403) <GET http://www.justdial.com/robots.txt> (referer: None)
2016-08-29 13:55:04 [scrapy] DEBUG: Crawled (403) <GET http://www.justdial.com/Mumbai/small-business> (referer: None)
2016-08-29 13:55:04 [scrapy] DEBUG: Ignoring response <403 http://www.justdial.com/Mumbai/small-business>: HTTP status code is not handled or not allowed
2016-08-29 13:55:04 [scrapy] INFO: Closing spider (finished)
When I try the following commands in the website console I get the response, but when I use the same XPaths inside the Python script I get the error described above.
Commands on web console:
$x('//div[@class="col-sm-5 col-xs-8 store-details sp-detail paddingR0"]/h4/span/a/text()')
$x('//div[@class="col-sm-5 col-xs-8 store-details sp-detail paddingR0"]/p[@class="contact-info"]/span/a/text()')
Please help me.
Thanks
Like Avihoo Mamka mentioned in the comment, you need to provide some extra request headers to avoid being rejected by this website.
In this case it seems to just be the User-Agent header. By default Scrapy identifies itself with the user agent "Scrapy/{version} (+http://scrapy.org)". Some websites might reject this for one reason or another.
To avoid this, just set the headers parameter of your Request to a common user agent string:
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
yield Request(url, headers=headers)
You can find a huge list of user agents at https://www.useragentstring.com/pages/useragentstring.php, though you should stick with popular web-browser ones like Firefox, Chrome, etc. for the best results.
You can implement it to work with your spider's start_urls too:
import scrapy
from scrapy import Request

class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = (
        'http://scrapy.org',
    )

    def start_requests(self):
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
        for url in self.start_urls:
            yield Request(url, headers=headers)
Add the following to your settings.py file. This works well if you are combining Selenium with Scrapy:
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'}
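For Scrapy to pick a header up from settings.py on its own, the built-in settings are USER_AGENT (shown in the next answer) or DEFAULT_REQUEST_HEADERS; a bare headers dict in settings.py only has an effect if your own code imports it and passes it to each Request. A sketch of the built-in route:

# settings.py - Scrapy's built-in default-headers mechanism
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0',
}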
I just needed to get my shell to work and run some quick tests, so Granitosaurus's solution was a bit overkill for me.
I literally just went to settings.py, where you'll find that almost everything is commented out. Around lines 16-17 you'll find something like this...
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'exercise01part01 (+http://www.yourdomain.com)'
You just need to uncomment it and replace the value with any user agent, like 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'.
You can find a list of them here: https://www.useragentstring.com/pages/useragentstring.php
So it'll look something like this...
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0'
You'll definitely want to rotate user agents if you want to build a large-scale crawler. But I just needed to get my scrapy shell to work and run some quick tests without getting that pesky 403 error, so this one-liner sufficed. It was nice because I did not need to make a fancy function or anything.
Happy scrapy-ing
Note: PLEASE make sure you are in the same directory as settings.py when you run scrapy shell in order to utilize the changes you just made. It does not work if you are in a parent directory.
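Since rotating user agents comes up above, here is a minimal sketch of a random-User-Agent downloader middleware (the module path, class name, and agent list are illustrative, not from any answer here):

# middlewares.py (sketch)
import random

USER_AGENTS = [
    'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
]

class RandomUserAgentMiddleware(object):
    def process_request(self, request, spider):
        # Overwrite the User-Agent header of every outgoing request
        request.headers['User-Agent'] = random.choice(USER_AGENTS)

It would then be enabled in settings.py, with 'myproject' standing in for your project package:

DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.RandomUserAgentMiddleware': 400,
}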
Here is what the whole process of solving the error could look like:
You can find a huge list of user-agents at https://www.useragentstring.com/pages/useragentstring.php, though you should stick with popular web-browser ones like Firefox, Chrome etc. for the best results (find more at How to solve 403 error in scrapy).
An example of the steps that worked for me on Windows 10 in scrapy shell:
Go to https://www.useragentstring.com/pages/useragentstring.php -> choose one link from BROWSERS (you can also try a link from CRAWLERS, ...) ->
e.g. Chrome = https://www.useragentstring.com/pages/Chrome/ -> choose one of the lines, e.g.:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36 -> take one part (text that belongs together) of that line, e.g.: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) ->
open a Command Prompt -> go into the project folder -> scrapy shell
from scrapy import Request
req = Request('https://www.whiskyshop.com/scotch-whisky?item_availability=In+Stock', headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6)'})
fetch(req)
Now, the result should be 200.
You see that it works even though I am on Windows 10 and there is Macintosh in Request().
You can also use the previous steps to add a chosen header to settings.py.
Note 1: Comments in the following Stack Overflow pages are also more or less related (and I used them for this example):
https://stackoverflow.com/questions/52196040/scrapy-shell-and-scrapyrt-got-403-but-scrapy-crawl-works,
https://stackoverflow.com/questions/16627227/problem-http-error-403-in-python-3-web-scraping,
https://stackoverflow.com/questions/37010524/set-headers-for-scrapy-shell-request
Note 2: I also recommend reading, e.g.:
https://scrapeops.io/web-scraping-playbook/403-forbidden-error-web-scraping/
https://scrapeops.io/python-scrapy-playbook/scrapy-managing-user-agents/
https://www.simplified.guide/scrapy/change-user-agent

Web crawler script will not run when accessing Internet from another source - Python

I have run into an issue where my web crawler will only run correctly when I am connected to my home Internet.
Using Python 2.7 with the Mechanize module on Windows 7.
Here are a few details about the code (snippet below): this web crawler logs into a website, navigates through a series of links, locates a link to download a file, downloads the file, saves the file to a preset folder, then repeats the process several thousand times.
I am able to run the code successfully at home on both my wired and wireless internet. When I connect to the Internet via a different source (e.g. work, Starbucks, a neighbor's house, a mobile hotspot), the script runs but returns an error when trying to access the link to download a file:
httperror_seek_wrapper: HTTP ERROR 404: Not Found
This is what prints in the IDE when I access this site:
send: 'GET /download/8635/CLPOINT.E00.GZ HTTP/1.1\r\nHost: dl1.geocomm.com\r\nUser-Agent: Mozilla/5.0 (x11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1\r\nCookie: MSGPOPUP=1391465678; TBMSESSION=5dee7266e3dcfa0193972102c73a2543\r\nConnection: close\r\nAccept-Encoding: gzip\r\n\r\n'
reply: 'HTTP/1.1 404 Not Found\r\n'
header: Content-Type: text/html
header: Content-Length: 345
header: Connection: close
header: Date: Mon, 03 Feb 2014 22:14:44 GMT
header: Server: lighttpd/1.4.32
Simply changing back to my home internet makes it work again. What confuses me is that I am not changing anything but the source of the internet - I simply disconnect from one router, connect to another, and rerun the code.
I have tried to change the browser headers using these three options:
br.addheaders = [('User-agent', 'Mozilla/5.0 (x11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11')]
br.addheaders = [('User-agent', 'Firefox')]
I am using the Mechanize module to access the Internet and create a browser session. Here is the login code snippet and download file code snippet (where I am getting the 404 error).
def websiteLogin():
    ## Logs into GeoComm website using predefined credentials (username/password hardcoded in definition)
    br = mechanize.Browser()
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
    br.set_debug_http(True)
    br.set_debug_redirects(True)
    br.set_debug_responses(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (x11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    ## NOTE: the full script presumably opens the login page (br.open(...)) here;
    ## that line is not included in this snippet
    br.select_form(nr=0)
    br.form['username'] = '**********'  ## stars replace my actual un and pw
    br.form['password'] = '**********'
    br.submit()
    return br
def downloadData(br, url, outws):
    br.open(url)
    for l in br.links(url_regex='download/[0-9]{4}'):
        fname = l.text
        outfile = os.path.join(outws, fname)
        if not os.path.exists(outfile):
            f = br.retrieve(l.absolute_url)[0]
            time.sleep(7.5)
            shutil.copy2(f, outfile)
This code does run as expected (i.e. downloads files without a 404 error) on my home internet, but that is a satellite internet service and my daily download and monthly data allotments are limited - that is why I need to run this using another source of internet. I am looking for some help better understanding why the code runs in one place but not another. Let me know if you require more information to help troubleshoot this.
As you can see from your GET request, your mechanize browser object is trying to get the resource /download/8635/CLPOINT.E00.GZ from the host dl1.geocomm.com.
When you recheck this, you will get the 404 because the resource is simply not available:
dl1.geocomm.com is redirected to another target.
What I'd recommend is to start debugging your application in an appropriate way.
You could start by adding at least some debugging print statements.
def downloadData(br, url, outws):
    br.open(url)
    for l in br.links(url_regex='download/[0-9]{4}'):
        print(l.url)
After that you'll see how the output differs between networks. Make sure to pass the url in the same way every time.
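A slightly fuller hedged sketch of that idea, also printing the final URL after redirects (geturl() is available on the urllib2-style response object mechanize returns), so the output on the two networks can be diffed:

def downloadData(br, url, outws):
    resp = br.open(url)
    # geturl() shows where the browser actually landed after redirects;
    # compare this value between home internet and the other networks
    print(resp.geturl())
    for l in br.links(url_regex='download/[0-9]{4}'):
        # Log both the raw href and the resolved absolute target
        print(l.url)
        print(l.absolute_url)
        print(l.text)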