I was trying to use python-requests to log in to https://www.custommade.com/, but it keeps giving me a "403 Forbidden" error. I got the post_url and the payload content from HttpFox.
import requests
post_url = 'https://www.custommade.com/secure/login/api/'
client = requests.session()
csrftoken = client.get('https://www.custommade.com/').cookies['csrftoken']
header_info = {
'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
'Content-type': 'application/json'
}
payload = {'_method':'login','csrftoken': csrftoken,'email': MYEMAIL,'password':MYPWS}
r = client.post(post_url, json=payload, headers = header_info)
print r.status_code
Could someone help? I tried to log in to other websites and this approach works fine.
If you print the response text you are getting, you will see that the site is telling you that you are not accepting cookies.
When you are doing something like this, always try to simulate the browser as closely as possible - that means you have to set up all the headers and also follow the same steps the browser does.
So first open the web page in your browser and open the dev tools, Network tab.
Now click on the login -> you see that the browser makes a request to /secure/proxy.
So your program has to do that too. Then comes the actual request. Make sure your request looks as much like the browser's request as possible - check the headers. You can see that they send the token there (by the way, they do not send it in the POST data as you did in your script). They are probably also checking some other headers, because when you remove them it doesn't work. So the easiest way is to send all the headers the browser sends.
Don't forget about the cookies - but that is handled automatically because you are using a session from requests.
Anyway, this is working code:
import requests
post_url = 'https://www.custommade.com/secure/login/api/'
client = requests.session()
client.get('https://www.custommade.com/')
r = client.get('https://www.custommade.com/secure/proxy/')
csrftoken = r.cookies['csrftoken']
header_info = {
    "Host": "www.custommade.com",
    "Connection": "keep-alive",
    "Origin": "https://www.custommade.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36",
    "Content-Type": "application/x-www-form-urlencoded",
    "Accept": "*/*",
    "X-Requested-With": "XMLHttpRequest",
    "X-CSRFToken": csrftoken,
    "DNT": "1",
    "Referer": "https://www.custommade.com/secure/proxy/",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.8,cs-CZ;q=0.6,cs;q=0.4,sk;q=0.2,ru;q=0.2",
}
payload = {'_method':'login','email': 'sdfasdf#safs.com','password':'asfdasf', 'remember':True}
r = client.post(post_url, data=payload, headers = header_info)
print r.text
print r.status_code
This prints:
{"errors": "Oops! Something went wrong. Please ensure you are sending JSON data."}
400
^^ means the password is wrong.
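With valid credentials you would presumably get a 2xx response instead; a minimal sketch of a success check on top of the code above (my addition, not part of the original answer):
# Hedged success check; only looks at the HTTP status of the login response.
if r.ok:
    print('Logged in; session cookies: %s' % client.cookies.get_dict())
else:
    print('Login failed: %s %s' % (r.status_code, r.text))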
Starting a few days ago, I've been receiving BadRequestException with "Unexpected end of JSON input".
This is the trace from NestJS:
mapExternalException(err) {
switch (true) {
// SyntaxError is thrown by Express body-parser when given invalid JSON (#422, #430)
// URIError is thrown by Express when given a path parameter with an invalid percentage
// encoding, e.g. '%FF' (#8915)
case err instanceof SyntaxError || err instanceof URIError:
return new common_1.BadRequestException(err.message); // <--- throws here
default:
return err;
}
}
The application is hosted on AWS, and what's weird to me is the URL of the request, which you can see in the Sentry log:
{
data: {},
headers: {
accept-encoding: gzip,
content-length: 1024,
content-type: application/json,
host: IP_THAT_SEEMS_FROM_AMAZON,
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36,
x-amzn-trace-id: Root=1-some-trace-id,
x-forwarded-for: SOME_IP,
x-forwarded-port: 443,
x-forwarded-proto: https
},
method: POST,
query_string: {},
url: https://IP_THAT_SEEMS_FROM_AMAZON/api/report
}
Is it possible that AWS is making internal requests to the /api/report endpoint, and if e.g. HTML is returned, the application throws "Unexpected end of JSON input" (because that's the usual reason for that error when making requests)?
If anyone is familiar with what's happening, any help would be appreciated!
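For what it's worth, that error is what Express's JSON body parser raises when a request declares Content-Type: application/json but the body does not parse as JSON (e.g. it is truncated or empty), so any scanner or health check hitting the endpoint with a malformed body would trigger it. A hedged sketch that reproduces it from the client side (the URL is illustrative; only the malformed body matters):
import requests

# Sends a body that claims to be JSON but is cut off mid-object; an Express/Nest
# endpoint using the default JSON body parser will fail to parse it and respond
# with a 400, which is what surfaces as BadRequestException in the trace above.
resp = requests.post(
    'https://example.com/api/report',              # illustrative URL
    headers={'Content-Type': 'application/json'},
    data='{"incomplete":',                         # truncated JSON
)
print(resp.status_code)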
Let me start with this: I do not know Python; I've had maybe one day of going through the Python tutorials. The situation is this: I have an Angular app that embeds, in an iframe, a Python app hosted with Apache on a VM. I didn't write the Python app, but another developer wrote me an endpoint that I am supposed to be able to POST to from my Angular app.
The developer who made the Python endpoint is saying that there is something wrong with my request, but I am fairly certain there isn't anything wrong with it. I am almost 100% certain that the problem is that there are no CORS headers in the response and/or the endpoint is not set up to respond to the OPTIONS method. Below is the entirety of the Python endpoint:
import os, site, inspect
site.addsitedir(os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))+"/../")
import json
from datetime import datetime
import pymongo
from Config import Config
def application(environ, start_response):
    response = environ['wsgi.input'].read(int(environ['CONTENT_LENGTH']))
    if response:
        json_response = json.loads(response)
        document = {
            'payment_id': json_response['payment_id'],
            'log': json_response['log'],
            'login_id': json_response['login_id'],
            'browser': environ.get('HTTP_USER_AGENT', None),
            'ip_address': environ.get('REMOTE_ADDR', None),
            'created_at': datetime.utcnow(),
        }
        client = pymongo.MongoClient(Config.getValue('MongoServer'))
        db = client.updatepromise
        db.PaymentLogs.insert(document)
        start_response('200 OK', [('Content-Type', 'application/json')])
        return '{"success": true}'
    start_response('400 Bad Request', [('Content-Type', 'application/json')])
    return '{"success": false}'
I have attempted the following to make this work: I added more headers to both start_response calls, so the code looks like this now:
start_response('201 OK', [('Content-Type', 'application/json'),
                          ('Access-Control-Allow-Headers', 'authorization'),
                          ('Access-Control-Allow-Methods', 'HEAD, GET, POST, PUT, PATCH, DELETE'),
                          ('Access-Control-Allow-Origin', '*'),
                          ('Access-Control-Max-Age', '600')])
Note: I did this with both the 200 and the 400 response at first and saw no change at all in the response. Then, just for the heck of it, I decided to change the 200 to a 201; this also did not come through in the response, so I suspect this code isn't even getting run for some reason.
Please help, python newb here.
Addendum: I figured this would help; here is what the headers look like for the request and response:
General:
Request URL: http://rpc.local/api/payment_log_api.py
Request Method: OPTIONS
Status Code: 200 OK
Remote Address: 10.1.20.233:80
Referrer Policy: no-referrer-when-downgrade
Response Headers:
Allow: GET,HEAD,POST,OPTIONS
Connection: Keep-Alive
Content-Length: 0
Content-Type: text/x-python
Date: Fri, 27 Apr 2018 15:18:55 GMT
Keep-Alive: timeout=5, max=100
Server: Apache/2.4.18 (Ubuntu)
Request Headers:
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Access-Control-Request-Headers: authorization,content-type
Access-Control-Request-Method: POST
Connection: keep-alive
Host: rpc.local
Origin: http://10.1.20.61:4200
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
Here it is. Just add this to the application right at the beginning:
def application(environ, start_response):
if environ['REQUEST_METHOD'] == 'OPTIONS':
start_response(
'200 OK',
[
('Content-Type', 'application/json'),
('Access-Control-Allow-Origin', '*'),
('Access-Control-Allow-Headers', 'Authorization, Content-Type'),
('Access-Control-Allow-Methods', 'POST'),
]
)
return ''
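One way to check that the preflight now succeeds is to replay it outside the browser; a small sketch using requests (the URL and header values are the ones from the question):
import requests

# Simulates the browser's CORS preflight against the endpoint from the question.
resp = requests.options(
    'http://rpc.local/api/payment_log_api.py',
    headers={
        'Origin': 'http://10.1.20.61:4200',
        'Access-Control-Request-Method': 'POST',
        'Access-Control-Request-Headers': 'authorization,content-type',
    },
)
print(resp.status_code)
# With the fix above, these should now appear in the response headers:
print(resp.headers.get('Access-Control-Allow-Origin'))
print(resp.headers.get('Access-Control-Allow-Methods'))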
For Python with CGI, I found this to work:
print '''Access-Control-Allow-Origin: *\r\n''',
print '''Content-Type: text/html\r\n'''
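A minimal, self-contained CGI sketch putting those two lines in context (the blank line that ends the header block is what CGI requires; the JSON body is just an example):
#!/usr/bin/env python
# Minimal CGI handler sketch: emit the headers, then a blank line, then the body.
print("Access-Control-Allow-Origin: *")
print("Content-Type: application/json")
print("")                        # blank line terminates the CGI header block
print('{"success": true}')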
Don't forget to enable CORS on the other side as well, e.g., JavaScript jQuery:
$.ajax({ url: URL,
type: "GET",
crossDomain: true,
dataType: "text", etc, etc
Firstly, I am aware of the solution presented here: JIRA REST API to get work log - "You do not have the permission to see the specified issue" ... as well as the linked blog post (penned by Nick Josevski), which, whilst useful, doesn't address my specific problem, which could be something trivial...
With the following Python 2.7.3 code...
import requests
import getpass
import json
jira_user = raw_input("Username: ")
jira_pass = getpass.getpass()
session = requests.Session()
session.verify = jira_ca_certs # Our internal certs
auth_info = {"username": jira_user, "password": jira_pass}
login_url = 'http://JIRA_SERVER.com/login.jsp'
session.post(login_url, data=auth_info)
I generate the cookies after basic authentication to JIRA (note: I am using "http" without specifying the port to authenticate with the login page). As the session automatically holds the returned cookies, I can use session.cookies to set the header:
cookies = requests.utils.dict_from_cookiejar(session.cookies)
headers = {'Content-type': 'application/json', 'cookie': cookies}
Following which, I test the captured cookies with a basic GET to the secure JIRA URL, using https + port:
base = session.get('https://JIRA_SERVER.com:1234', headers=headers)
print 'base: ', base
The above, as expected, returns (though this might not be a valid test?) ...
base: <Response [200]>
Now to test the code for its intended purpose, I extend the URL to a specific JIRA issue, using the same approach:
jira = session.get('https://JIRA_SERVER.com:1234/rest/api/latest/issue/KRYP-6207', headers=headers)
print 'issue: ', jira
print jira.json()
With JSON output, I get a response stating I do not have the permission:
issue: <Response [401]>
{u'errorMessages': [u'You do not have the permission to see the specified issue.', u'Login Required'], u'errors': {}}
The cookies returned, that I use in the header, are:
headers: {'cookie': 'atlassian.xsrf.token=XXXXXXXXXXXXXXXX|lout; Path=/, JSESSIONID=XXXXXXXXXXXXXXXX; Path=/'}
I don't know why this works for the base URL but not the issue URL. I have used Postman in Chrome to check the cookies being returned, and they are the same as those listed above, i.e. atlassian.xsrf.token and JSESSIONID.
Hoping someone here can tell me what I am doing wrong! Thanks in advance ...
Maybe you are not correctly logged in? Based on the requests I saw in the browser's developer tools, I created the following requests, and it worked with our JIRA:
import requests
import getpass
import json
jira_user = raw_input("Username: ")
jira_pass = getpass.getpass()
session = requests.Session()
headers = {
'Host': 'our.jira.com',
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Referer': 'https://our.jira.com/login.jsp',
'Content-Type': 'application/x-www-form-urlencoded',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
}
data = [
('os_username', jira_user),
('os_password', jira_pass),
('os_destination', ''),
('user_role', ''),
('atl_token', ''),
('login', 'Anmelden'),
]
loginPost = session.post('https://our.jira.com/login.jsp', headers=headers, data=data)
xsrf_token = session.cookies.get_dict()['atlassian.xsrf.token']
jsessionid = session.cookies.get_dict()['JSESSIONID']
cookies = {
'atlassian.xsrf.token': xsrf_token,
'JSESSIONID': jsessionid,
}
headers = {
'Host': 'our.jira.com',
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
}
r = session.get('https://our.jira.com/rest/api/latest/issue/SDN-206', headers=headers, cookies=cookies)
print r.json()
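If the form-based login still misbehaves, another approach that has worked on some Jira Server installations is the cookie-based auth REST resource instead of login.jsp; a hedged sketch, assuming /rest/auth/1/session is enabled on the instance (URLs reuse the ones above):
import requests
import getpass
import json

session = requests.Session()
# Assumption: the instance exposes Jira's cookie-based session resource.
auth = {'username': raw_input('Username: '), 'password': getpass.getpass()}
r = session.post('https://our.jira.com/rest/auth/1/session',
                 headers={'Content-Type': 'application/json'},
                 data=json.dumps(auth))
print(r.status_code)   # 200 means a JSESSIONID is now stored on the session
print(session.get('https://our.jira.com/rest/api/latest/issue/SDN-206').json())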
I am using Scrapy for a scraping project with this URL: https://www.walmart.ca/en/clothing-shoes-accessories/men/mens-tops/N-2566+11
I tried to play with the URL and open it in the shell, but it got a 430 error, so I added some settings to the headers like this:
scrapy shell -s COOKIES_ENABLED=1 -s USER_AGENT='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0' "https://www.walmart.ca/en/clothing-shoes-accessories/men/mens-tops/N-2566+11"
It got the page (200), but once I use view(response), it directs me to a page that says:
Sorry!
Your web browser is not accepting cookies.
You should have
COOKIES_ENABLED = True
in your settings.py file.
Also see
COOKIES_DEBUG = True
to debug cookies: you will see which cookies are coming in / going out with each response / request respectively.
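If you'd rather not touch the project-wide settings.py, the same flags can be set per spider via custom_settings; a small sketch (the spider name is made up):
import scrapy

class WalmartSpider(scrapy.Spider):
    name = 'walmart'             # hypothetical spider name
    custom_settings = {
        'COOKIES_ENABLED': True,
        'COOKIES_DEBUG': True,   # logs Cookie / Set-Cookie headers for each request/response
    }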
If the web page requires you to click to accept cookies, you can use FormRequest.from_response.
Here is an example with the Google consent page:
def start_requests(self):
yield Request(
"https://google.com/",
callback=self.parse_consent,
)
def parse_consent(self, response):
yield FormRequest.from_response(
response,
clickdata={"value": "I agree"},
callback=self.parse_query,
dont_filter=True,
)
def parse_query(self, response):
for keyword in self.keywords:
yield Request(
<google_url_to_parse>,
callback=<your_callback>,
dont_filter=True,
)
Note that the value of clickdata may differ based on your location/language; you should change "I agree" to the correct value.
Try to send all required headers.
headers = {
'dnt': '1',
'accept-encoding': 'gzip, deflate, sdch, br',
'accept-language': 'en-US,en;q=0.8',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'cache-control': 'max-age=0',
'authority': 'www.walmart.ca',
'cookie': 'JSESSIONID=E227789DA426B03664F0F5C80412C0BB.restapp-108799501-8-112264256; cookieLanguageType=en; deliveryCatchment=2000; marketCatchment=2001; zone=2; originalHttpReferer=; walmart.shippingPostalCode=V5M2G7; defaultNearestStoreId=1015; walmart.csrf=6f635f71ab4ae4479b8e959feb4f3e81d0ac9d91-1497631184063-441217ff1a8e4a311c2f9872; wmt.c=0; userSegment=50-percent; akaau_P1=1497632984~id=bb3add0313e0873cf64b5e0a73e3f5e3; wmt.breakpoint=d; TBV=7; ENV=ak-dal-prod; AMCV_C4C6370453309C960A490D44%40AdobeOrg=793872103%7CMCIDTS%7C17334',
'referer': 'https://www.walmart.ca/en/clothing-shoes-accessories/men/mens-tops/N-2566+11',
}
yield Request(url = 'https://www.walmart.ca/en/clothing-shoes-accessories/men/mens-tops/N-2566+11', headers=headers)
You can implement it your way like this; instead of using start_urls, I would recommend the start_requests() method. It's easier to read.
class EasySpider(CrawlSpider):
name = 'easy'
def start_requests(self):
headers = {
'dnt': '1',
'accept-encoding': 'gzip, deflate, sdch, br',
'accept-language': 'en-US,en;q=0.8',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'cache-control': 'max-age=0',
'authority': 'www.walmart.ca',
'cookie': 'JSESSIONID=E227789DA426B03664F0F5C80412C0BB.restapp-108799501-8-112264256; cookieLanguageType=en; deliveryCatchment=2000; marketCatchment=2001; zone=2; originalHttpReferer=; walmart.shippingPostalCode=V5M2G7; defaultNearestStoreId=1015; walmart.csrf=6f635f71ab4ae4479b8e959feb4f3e81d0ac9d91-1497631184063-441217ff1a8e4a311c2f9872; wmt.c=0; userSegment=50-percent; akaau_P1=1497632984~id=bb3add0313e0873cf64b5e0a73e3f5e3; wmt.breakpoint=d; TBV=7; ENV=ak-dal-prod; AMCV_C4C6370453309C960A490D44%40AdobeOrg=793872103%7CMCIDTS%7C17334',
'referer': 'https://www.walmart.ca/en/clothing-shoes-accessories/men/mens-tops/N-2566+11',
}
yield Request(url = 'https://www.walmart.ca/en/clothing-shoes-accessories/men/mens-tops/N-2566+11', callback = self.parse_item, headers = headers)
def parse_item(self, response):
i = CravlingItem()
i['title'] = " ".join( response.xpath('//a/text()').extract()).strip()
yield i
I can confirm that the COOKIES_ENABLED setting does not help in fixing the error.
Instead, using the following Googlebot USER_AGENT made it work:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36
I figured this out thanks to the person who made this script, which uses that User Agent to make the requests: https://github.com/juansimon27/scrapy-walmart/blob/master/product_scraping/spiders/spider.py
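A sketch of where that user agent string would go, either project-wide in settings.py or with -s on the shell command line as the question already does:
# settings.py (sketch): present the crawler as Googlebot instead of a desktop browser.
USER_AGENT = ('Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; '
              'Googlebot/2.1; http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36')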
I have run into an issue where my web crawler will only run correctly when I am connected to my home Internet.
Using Python 2.7 with the Mechanize module on Windows 7.
Here are a few details about the code (snippet below): this web crawler logs into a website, navigates through a series of links, locates a link to download a file, downloads the file, saves the file to a preset folder, then repeats the process several thousand times.
I am able to run the code successfully at home on both my wired and wireless internet. When I connect to the Internet via a different source (e.g. work, Starbucks, a neighbor's house, a mobile hotspot), the script runs but returns an error when trying to access the link to download a file:
httperror_seek_wrapper: HTTP ERROR 404: Not Found
This is what it prints in the IDE when I access this site:
send: 'GET /download/8635/CLPOINT.E00.GZ HTTP/1.1\r\nHost: dl1.geocomm.com\r\nUser-Agent: Mozilla/5.0 (x11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1\r\nCookie: MSGPOPUP=1391465678; TBMSESSION=5dee7266e3dcfa0193972102c73a2543\r\nConnection: close\r\nAccept-Encoding: gzip\r\n\r\n'
reply: 'HTTP/1.1 404 Not Found\r\n'
header: Content-Type: text/html
header: Content-Length: 345
header: Connection: close
header: Date: Mon, 03 Feb 2014 22:14:44 GMT
header: Server: lighttpd/1.4.32
Simply changing back to my home internet makes it work again. What confuses me is that I am not changing anything but the source of the internet - I simply disconnect from one router, connect to another, and rerun the code.
I have tried to change the browser headers using these three options:
br.addheaders = [('User-agent', 'Mozilla/5.0 (x11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11')]
br.addheaders = [('User-agent', 'Firefox')]
I am using the Mechanize module to access the Internet and create a browser session. Here is the login code snippet and download file code snippet (where I am getting the 404 error).
def websiteLogin():
## Logs into GeoComm website using predefined credential (username/password hardcoded in definition)
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(),max_time=1)
br.set_debug_http(True)
br.set_debug_redirects(True)
br.set_debug_responses(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (x11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.select_form(nr=0)
br.form['username']='**********' ## stars replace my actual un and pw
br.form['password']='**********'
br.submit()
return br
def downloadData (br, url, outws):
br.open(url)
for l in br.links(url_regex = 'download/[0-9]{4}'):
fname = l.text
outfile = os.path.join(outws, fname)
if not os.path.exists(outfile):
f = br.retrieve(l.absolute_url)[0]
time.sleep(7.5)
shutil.copy2(f, outfile)
This code does run as expected (i.e. downloads files without the 404 error) on my home internet, but that is a satellite internet service and my daily download and monthly data allotments are limited - that is why I need to run this using another source of internet. I am looking for some help better understanding why the code runs in one place but not another. Let me know if you require more information to help troubleshoot this.
As you can see from your GET request, your mechanize browser object is trying to get the resource /download/8635/CLPOINT.E00.GZ from the host dl1.geocomm.com.
When you try to recheck this, you will get the 404 because the resource is simply not available:
dl1.geocomm.com is redirected to another target.
What I'd recommend is to start debugging your application properly.
You could start by adding at least some debugging print statements.
def downloadData (br, url, outws):
br.open(url)
for l in br.links(url_regex = 'download/[0-9]{4}'):
print(l.url)
After that you'll see how the output differs. Make sure to pass the URL in the same way every time.
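Expanding that debug output a little makes the host difference easier to spot, since mechanize's Link objects carry both the raw href and the resolved absolute URL (a sketch; it only adds prints, nothing else):
def downloadData(br, url, outws):
    br.open(url)
    print(br.geturl())    # where the browser actually ended up after any redirects
    for l in br.links(url_regex='download/[0-9]{4}'):
        # raw href as written in the page vs. the absolute URL mechanize will fetch
        print('%s -> %s' % (l.url, l.absolute_url))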