How to use requests.cookies in pdfkit/wkhtmltopdf? - python-2.7

I would like to print a PDF version of my MediaWiki page using pdfkit.
My MediaWiki requires a valid login to see any pages.
I log in to MediaWiki using requests, and this works and I get some cookies. However, I am not able to use these cookies with pdfkit.from_url().
My Python script looks like this:
#!/usr/bin/env python2
import pdfkit
import requests
import pickle
mywiki = "http://192.168.0.4/produniswiki/"  # URL
username = 'produnis' # Username to login with
password = 'seeeecret#' # Login Password
## Login to MediaWiki
# Login request
payload = {'action': 'query', 'format': 'json', 'utf8': '', 'meta': 'tokens', 'type': 'login'}
r1 = requests.post(mywiki + 'api.php', data=payload)
# login confirm
login_token = r1.json()['query']['tokens']['logintoken']
payload = {'action': 'login', 'format': 'json', 'utf8': '', 'lgname': username, 'lgpassword': password, 'lgtoken': login_token}
r2 = requests.post(mywiki + 'api.php', data=payload, cookies=r1.cookies)
print(r2.cookies)
So, right here I am successfully logged in, and cookies are stored in r2.cookies.
The print() call gives:
<RequestsCookieJar[<Cookie produniswikiToken=832a1f1da165016fb9d9a107ddb218fc for 192.168.0.4/>, <Cookie produniswikiUserID=1 for 192.168.0.4/>, <Cookie produniswikiUserName=Produnis for 192.168.0.4/>, <Cookie produniswiki_session=oddicobpi1d5af4n0qs71g7dg1kklmbo for 192.168.0.4/>]>
I can save the cookies into a file:
def save_cookies(requests_cookiejar, filename):
    with open(filename, 'wb') as f:
        pickle.dump(requests_cookiejar, f)
save_cookies(r2.cookies, "cookies")
This file looks like this: http://pastebin.com/yKyCpPTW
Now I want to print a specific page to PDF using pdfkit. The man page states that cookies can be set via a cookie-jar file:
options = {
    'page-size': 'A4',
    'margin-top': '0.5in',
    'margin-right': '0.5in',
    'margin-bottom': '0.5in',
    'margin-left': '0.5in',
    'encoding': "UTF-8",
    'cookie-jar': "cookies",
    'no-outline': None
}
current_pdf = pdfkit.from_url(pdf_url, the_filename, options=options)
My problem is:
with this code, the "cookies" file becomes 0 KB and the PDF states "You must be logged in to view a page..."
So my question is:
How can I use requests cookies in pdfkit.from_url()?

I had the same issue and overcame it with the following:
import requests, pdfkit
# Get login cookie
s = requests.session() # if you're making multiple calls
data = {'username': 'admin', 'password': 'hunter2'}
s.post('http://example.com/login', data=data)
# Get yourself a PDF
options = {'cookie': s.cookies.items(), 'javascript-delay': 1000}
pdfkit.from_url('http://example.com/report', 'report.pdf', options=options)
Depending on how much JavaScript you're trying to load, you might want to set the javascript-delay higher or lower; the default is 200 ms.
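Applied to the MediaWiki login from the question, the same idea would look roughly like this (a sketch, not tested against MediaWiki; username, password, pdf_url and the_filename are the question's own placeholders):

import requests, pdfkit

mywiki = "http://192.168.0.4/produniswiki/"

# log in through the MediaWiki API, keeping the cookies in a session
s = requests.Session()
r1 = s.post(mywiki + 'api.php', data={'action': 'query', 'format': 'json',
                                      'meta': 'tokens', 'type': 'login'})
login_token = r1.json()['query']['tokens']['logintoken']
s.post(mywiki + 'api.php', data={'action': 'login', 'format': 'json',
                                 'lgname': username, 'lgpassword': password,
                                 'lgtoken': login_token})

# hand the session cookies to wkhtmltopdf as repeated --cookie options
options = {'page-size': 'A4', 'encoding': 'UTF-8', 'no-outline': None,
           'cookie': s.cookies.items()}
pdfkit.from_url(pdf_url, the_filename, options=options)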

Related

Creating an Apache Superset user through the API

I want to create a user through the Apache Superset API; if the user already exists, it should show a message.
import requests

payload = {'username': 'My user name',
           'password': 'my Password',
           'provider': 'db'
           }
base_url = "http://0.0.0.0:8088"
login_url = f"{base_url}/api/v1/security/login"
login_response = requests.post(login_url, json=payload)
access_token = login_response.json()
print(access_token)

# header
header = {"Authorization": f"Bearer {access_token.get('access_token')}"}

base_url = "http://0.0.0.0:8088"  # Apache Superset is running on my local system on port 8088
user_payload = {
    'first_name': "Cp",
    'last_name': "user",
    'username': "A",
    'email': "user.#gmail.com",
    'active': True,
    'conf_password': "user#1",
    'password': "user#1",
    "roles": ["Dashboard View"]
}
users_url = f"{base_url}/users/add"
user_response = requests.get(users_url, headers=header)
print(user_response)
print(user_response.text)
user_response = user_response.json()
print(user_response)
I tried to create a user through the API above; sometimes it returns a 200 success message, but when I check the Apache Superset website I cannot see the created user's details.
If I am using the wrong API endpoint or the wrong method to create a user, please let me know.

How to get a Superset token? (for using the REST API)

I attempted a REST request following the documentation below, but it does not work. https://superset.apache.org/docs/rest-api
request: curl -XGET -L http://[IP:PORT]/api/v1/chart
response: {"msg":"Bad Authorization header. Expected value 'Bearer <JWT>'"}
I installed Superset via pip and also via the Helm chart, but the result is the same. helm: https://github.com/apache/superset
How should I call the REST API?
Check the security section of the documentation you have linked. It has the API /security/login; you can follow the JSON parameter format and get the JWT bearer token. Use that token in the header of your other API calls to Superset.
Open http://localhost:8080/swagger/v1, assuming http://localhost:8080 is your Superset host address.
Then find the /security/login section.
The response will look like this:
{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmcmVzaCI6dHJ1ZSwiaWF0IjoxNjU0MzQ2OTM5LCJqdGkiOiJlZGY2NTUxMC0xMzI1LTQ0NDEtYmFmMi02MDc1MzhjZDcwNGYiLCJ0eXBlIjoiYWNjZXNzIiwic3ViIjoxLCJuYmYiOjE2NTQzNDY5MzksImV4cCI6MTY1NDM0NzgzOX0.TfjUea3ycH77xhCWOpO4LFbYHrT28Y8dnWsc1xS_IOY",
  "refresh_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmcmVzaCI6ZmFsc2UsImlhdCI6MTY1NDM0NjkzOSwianRpIjoiNzBiM2EyZDYtNDFlNy00ZDNlLWE0NDQtMTRiNTkyNTk4NjUwIiwidHlwZSI6InJlZnJlc2giLCJzdWIiOjEsIm5iZiI6MTY1NDM0NjkzOSwiZXhwIjoxNjU2OTM4OTM5fQ.OgcctNnO4zTDfTgtHnaEshk7u-D6wOxfxjCsjqjKYyE"
}
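With the access_token from that response, you add a Bearer header to every other API call; for example, the /api/v1/chart request from the question (a minimal sketch; host and credentials are placeholders):

import requests

base_url = "http://localhost:8080"  # your Superset host

# log in and grab the JWT access token
login = requests.post(f"{base_url}/api/v1/security/login",
                      json={"username": "admin", "password": "admin",
                            "provider": "db", "refresh": True})
access_token = login.json()["access_token"]

# use it as a Bearer token on any other endpoint
charts = requests.get(f"{base_url}/api/v1/chart/",
                      headers={"Authorization": f"Bearer {access_token}"})
print(charts.json())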
Thanks to #andrewsali's comment on this GitHub issue, I finally figured out how to access the Superset REST API from Python code.
import requests
from bs4 import BeautifulSoup
import json

def get_supetset_session():
    """
    # http://192.168.100.120:8088/swagger/v1
    url = f'http://{superset_host}/api/v1/chart/'
    r = s.get(url)
    # print(r.json())
    """
    superset_host = '192.168.100.120:8088'  # replace with your own host
    username = 'YOUR_NAME'
    password = 'YOUR_PASSWORD'

    # set up session for auth
    s = requests.Session()
    login_form = s.post(f"http://{superset_host}/login")
    # get Cross-Site Request Forgery protection token
    soup = BeautifulSoup(login_form.text, 'html.parser')
    csrf_token = soup.find('input', {'id': 'csrf_token'})['value']
    data = {
        'username': username,
        'password': password,
        'csrf_token': csrf_token
    }
    # log in with the given session
    s.post(f'http://{superset_host}/login/', data=data)
    print(dict(s.cookies))
    return s
DEMO
# s = get_supetset_session()
base_url = 'http://192.168.100.120:8088'
def get_dashboards_list(s, base_url=base_url):
    """## GET List of Dashboards"""
    url = base_url + '/api/v1/dashboard/'
    r = s.get(url)
    resp_dashboard = r.json()
    for result in resp_dashboard['result']:
        print(result['dashboard_title'], result['id'])
s = get_supetset_session()
# {'session': '.eJwlj8FqAzEMRP_F5z1Islay8jOLJcu0NDSwm5xK_r0uPQ7DG978lGOeeX2U2_N85VaOz1FuxVK6JIHu1QFhGuEOk5NG8qiYGkJ7rR3_Ym-uJMOzJqySeHhIG8SkNQK6GVhTdLf0ZMmG6sZGQtiQ1Gz0qYiUTVoHhohZthLXOY_n4yu_l0-VKTObLaE13i2Hz2A2rzBmhU7WkkN1cfdH9HsuZoFbeV15_l_C8v4F4nBC9A.Ypn16Q.yz4E-vz0gp3EmJwv-6tYIcOGavU'}
get_dashboards_list(s)
Thanks #Ferris for this visual solution!
To add to this, you can also make the appropriate API call in Python like the following:
import requests

api_url = "your_url/api/v1/security/login"
payload = {"password": "your password",
           "provider": "db",
           "refresh": True,
           "username": "your username"
           }
response = requests.post(api_url, json=payload)
# the response is JSON, which holds access_token and refresh_token
access_token = response.json()['access_token']

# now get a guest token
api_url_for_guesttoken = "your_url/api/v1/security/guest_token"
payload = {}
# now this is the crucial part: add the specific auth-header
response = requests.post(api_url_for_guesttoken, json=payload, headers={'Authorization': f"Bearer {access_token}"})

Cannot download HTML (entire web page)

I am trying to download the entire html code from
http://www.ivolatility.com/options/AMZN/NASDAQ/
The output does not include the data in the tables.
This is the code I am using
url = 'http://www.ivolatility.com/options/AMZN/NASDAQ/'
r = requests.get(url, allow_redirects=True)
open('C:.../Downloads/amzn.html', 'wb').write(r.content)
I think it might be related to registration issues.
Anything I can do?
Thanks
Your request returns a login form, which means you'll have to login in order to access the data.
The login process is relatively easy - all we have to do is submit the form data to the login page (and use a session object to store the cookies).
Then we can use that authenticated session to retrieve the table contents.
The code:
import requests

url = 'http://www.ivolatility.com/options/AMZN/NASDAQ/'
login_url = 'https://www.ivolatility.com/login.j'

usr = 'my username'
pwd = 'my password'
data = {
    'username': usr, 'password': pwd,
    'ref_url': login_url, 'service_name': 'Home Page',
    'step': 1, 'login__is__sent': 1
}

s = requests.session()
s.post(login_url, data)
r = s.get(url)

with open('my file', 'wb') as f:
    f.write(r.content)

Loop through payload Python

There is a site that I connect to, but I need to log in 4 times with different usernames and passwords.
Is there any way I can do this by looping through the usernames and passwords in a payload?
This is the first time I am doing this and I am not really sure how to go about it.
The code works fine if I post just one username and password.
I'm using Python 2.7, BeautifulSoup, and requests.
Here is my code.
import requests
import zipfile, StringIO
from bs4 import BeautifulSoup

# Here we add the login details to be submitted to the login form.
payload = [
    {'USERNAME': 'xxxxxx', 'PASSWORD': 'xxxxxx', 'option': 'login'},
    {'USERNAME': 'xxxxxx', 'PASSWORD': 'xxxxxxx', 'option': 'login'},
    {'USERNAME': 'xxxxx', 'PASSWORD': 'xxxxx', 'option': 'login'},
    {'USERNAME': 'xxxxxx', 'PASSWORD': 'xxxxxx', 'option': 'login'},
]
# Possibly need headers later.
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}

base_url = "https://service.rl360.com/scripts/customer.cgi/SC/servicing/"

with requests.Session() as s:
    p = s.post('https://service.rl360.com/scripts/customer.cgi?option=login', data=payload)
    # Get the download page to scrape.
    r = s.get('https://service.rl360.com/scripts/customer.cgi/SC/servicing/downloads.php?Folder=DataDownloads&SortField=ExpiryDays&SortOrder=Ascending', stream=True)
    content = r.text
    soup = BeautifulSoup(content, 'lxml')
    # Now I get the most recent download URL.
    download_url = soup.find_all("a", {'class': 'tabletd'})[-1]['href']
    # Now we join the base url with the download url.
    download_docs = s.get(base_url + download_url, stream=True)
    print "Checking Content"
    content_type = download_docs.headers['content-type']
    print content_type
    print "Checking Filename"
    content_name = download_docs.headers['content-disposition']
    print content_name
    print "Checking Download Size"
    content_size = download_docs.headers['content-length']
    print content_size
    # This is where we extract and download the specified xml files.
    z = zipfile.ZipFile(StringIO.StringIO(download_docs.content))
    print "---------------------------------"
    print "Downloading........."
    # Now we save the files to the specified location.
    z.extractall('C:\Temp')
    print "Download Complete"
Just use a for loop. You may need to adjust your download directory if files will be overwritten (see the sketch after the snippet below).
payloads = [
    {'USERNAME': 'xxxxxx1', 'PASSWORD': 'xxxxxx', 'option': 'login'},
    {'USERNAME': 'xxxxxx2', 'PASSWORD': 'xxxxxxx', 'option': 'login'},
    {'USERNAME': 'xxxxx3', 'PASSWORD': 'xxxxx', 'option': 'login'},
    {'USERNAME': 'xxxxxx4', 'PASSWORD': 'xxxxxx', 'option': 'login'},
]
....
for payload in payloads:
    with requests.Session() as s:
        p = s.post('https://service.rl360.com/scripts/customer.cgi?option=login', data=payload)
        ...
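If the downloads would otherwise overwrite each other, one option is to extract each account's files into its own folder, for example (a sketch, assuming the USERNAME values are safe to use as directory names; z is the ZipFile built in the elided steps from the question):

import os

for payload in payloads:
    with requests.Session() as s:
        s.post('https://service.rl360.com/scripts/customer.cgi?option=login', data=payload)
        # ... same scraping and zip-download steps as in the question ...
        # extract into a per-user folder so each account's files are kept separate
        target_dir = os.path.join(r'C:\Temp', payload['USERNAME'])
        if not os.path.exists(target_dir):
            os.makedirs(target_dir)
        z.extractall(target_dir)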

logging into github with python

I have been up and down these pages looking for how to do this; there are many similar posts, but I can't seem to get it to work, so I find myself having to ask specifically how to do this.
I am trying to gather metrics about my software project on GitHub. For many of these metrics you can use the API. However, one of the most interesting items is the unique visitor and view count on the GitHub graphs/traffic page, and unfortunately this info is not available through the GitHub API. So, to get it I am trying to log into my GitHub account, navigate to the page, and then read the numbers. My code is below. However, I can't seem to get logged into GitHub (my URL request keeps returning the login page rather than the traffic page). I think it probably has something to do with the variables that need to be posted, but I'm not sure what's wrong with them.
from requests import session
from bs4 import BeautifulSoup as bs

USER = 'MYID'
PASSWORD = 'MYPASSWORD'
URL1 = 'https://github.com/login'
URL2 = 'https://github.com/MYPROJ/graphs/traffic'

with session() as s:
    req = s.get(URL1).text
    html = bs(req)
    token = html.find("input", {"name": "authenticity_token"}).attrs['value']
    com_val = html.find("input", {"name": "commit"}).attrs['value']
    login_data = {'login_field': USER,
                  'password': PASSWORD,
                  'authenticity_token': token,
                  'commit': com_val}
    r1 = s.post(URL1, data=login_data)
    r2 = s.get(URL2)
    print(r2.url)
    print bs(r2.text).find('span', {'class': 'num js-uniques uniques'})
Any help is appreciated.
Thanks,
-Jeff
Figured it out.
I was using the wrong address to post my login and username, as well as some other wrong bits.
This is the updated code that worked for me:
from requests import session
from bs4 import BeautifulSoup as bs

USER = 'MyUserName'
PASSWORD = 'Mypassword'
URL1 = 'https://github.com/session'
URL2 = 'https://github.com/MyProj/graphs/traffic-data'

with session() as s:
    req = s.get(URL1).text
    html = bs(req)
    token = html.find("input", {"name": "authenticity_token"}).attrs['value']
    com_val = html.find("input", {"name": "commit"}).attrs['value']
    login_data = {'login': USER,
                  'password': PASSWORD,
                  'commit': com_val,
                  'authenticity_token': token}
    r1 = s.post(URL1, data=login_data)
    r2 = s.get(URL2)

    Cut1 = r2.text.split(',"summary":{"total":', 2)
    ViewsTot = Cut1[1].split(',"unique":', 1)
    ViewsUnq = ViewsTot[1].split('}}', 1)
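Since the traffic-data endpoint returns JSON, a less brittle alternative to the string splitting at the end is to parse it directly (a small sketch, assuming the response keeps the same "summary" structure the splits above rely on):

import json

traffic = json.loads(r2.text)  # or r2.json()
summary = traffic['summary']
total_views = summary['total']        # total view count
unique_visitors = summary['unique']   # unique visitor count
print(total_views, unique_visitors)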