Cannot download html (entire web page) - python-2.7

I am trying to download the entire html code from
http://www.ivolatility.com/options/AMZN/NASDAQ/
The output does not include the data in the tables.
This is the code I am using:
import requests
url = 'http://www.ivolatility.com/options/AMZN/NASDAQ/'
r = requests.get(url, allow_redirects=True)
open('C:.../Downloads/amzn.html', 'wb').write(r.content)
I think it might be related to registration issues.
Anything I can do?
Thanks

Your request returns a login form, which means you'll have to log in before you can access the data.
The login process is relatively easy - all we have to do is submit the form data to the login page (and use a session object to store the cookies).
Then we can use that authenticated session to retrieve the table contents.
The code:
import requests
url = 'http://www.ivolatility.com/options/AMZN/NASDAQ/'
login_url = 'https://www.ivolatility.com/login.j'
usr = 'my username'
pwd = 'my password'
data = {
    'username': usr, 'password': pwd,
    'ref_url': login_url, 'service_name': 'Home Page',
    'step': 1, 'login__is__sent': 1
}
s = requests.session()
s.post(login_url, data)
r = s.get(url)
with open('my file', 'wb') as f:
    f.write(r.content)
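To tell whether a response is still the login form rather than the data page, one rough heuristic is to look for a password input in the returned HTML. The sketch below uses only the standard library; the helper name looks_like_login_page is made up for illustration, and it assumes the site's login form contains an input of type password, which is not confirmed by the answer.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Sets .found to True when an <input type="password"> is seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        input_type = (dict(attrs).get("type") or "").lower()
        if tag == "input" and input_type == "password":
            self.found = True


def looks_like_login_page(html_text):
    """Heuristic: a page with a password field is probably a login form."""
    finder = _PasswordFieldFinder()
    finder.feed(html_text)
    return finder.found
```

After r = s.get(url), calling looks_like_login_page(r.text) and getting True would suggest the login POST did not succeed and the saved file will again lack the tables.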

Related

Trying to write a script to upload files to a django project

I have a django 3.x project where I can upload multiple files and associated form data through the admin pages to a model called Document. However, I need to upload a large number of files, so I wrote a small python script to automate that process.
I am having one problem with the script. I can't seem to set the name of the file as it is set when uploaded through the admin page.
Here is the script...I had a few problems getting the csrf token working correctly, so there may be some redundant code for that.
import requests
# Set up the urls to login to the admin pages and access the correct add page
URL1='http://localhost:8000/admin/'
URL2='http://localhost:8000/admin/login/?next=/admin/'
URL3 = 'http://localhost:8090/admin/memorabilia/document/add/'
USER='admin'
PASSWORD='xxxxxxxxxxxxx'
client = requests.session()
# Retrieve the CSRF token first
client.get(URL1) # sets the cookie
csrftoken = client.cookies['csrftoken']
print("csrftoken1=%s" % csrftoken)
login_data = dict(username=USER, password=PASSWORD, csrfmiddlewaretoken=csrftoken)
r = client.post(URL2, data=login_data, headers={"Referer": "foo"})
r = client.get(URL3)
csrftoken = client.cookies['csrftoken']
print("csrftoken2=%s" % csrftoken)
cookies = dict(csrftoken= csrftoken)
headers = {'X-CSRFToken': csrftoken}
file_path = "/media/mark/ea00fd8e-4330-4d76-81d8-8fe7dde2cb95/2017/Memorable/20047/Still Images/Photos/20047_Phillips_Photo_052_002.jpg"
data = {
    "csrfmiddlewaretoken": csrftoken,
    "documentType_id": '1',
    "rotation": '0',
    "TBD": '350',
    "Title": "A test title",
    "Period": "353",
    "Source Folder": '258',
    "Decade": "168",
    "Location": "352",
    "Photo Type": "354",
}
file_data = None
with open(file_path, 'rb') as fr:
    file_data = fr.read()
# storage_file_name is the name of the FileField in the Document model.
#response_1 = requests.post(url=URL3, data=data, files={'storage_file_name': file_data,}, cookies=cookies)
response_2 = client.post(url=URL3, data=data, files={'storage_file_name': file_data, 'name': "20047_Phillips_Photo_052_002.jpg"}, cookies=cookies,)
When I upload using the admin page, the name of the file is "20047_Phillips_Photo_052_002.jpg", as it should be (i.e. storage_file_name.name = 20047_Phillips_Photo_052_002.jpg).
When I run the script using files={'storage_file_name': file_data,} (see response_1 at the bottom of the script), the file uploads correctly except that the name of the file is "storage_file_name" and not "20047_Phillips_Photo_052_002.jpg" (i.e. storage_file_name.name = "storage_file_name").
When I upload using files={'storage_file_name': file_data, 'name': "20047_Phillips_Photo_052_002.jpg"} the name of the file is still "storage_file_name" (i.e. storage_file_name.name = "storage_file_name").
I looked in the request.FILES object when uploading a file through the admin page, and the _name field for each object is the name of the file being uploaded. The documentation for the django File object says it has a field called name.
What am I missing to get my script to upload a file the same way as the admin page does? By that I mean, the name of the file is not "storage_file_name".

When I change the last client.post call so that it passes the open file object instead of the bytes read from it:
response = client.post(url=URL3, data=data, files={'storage_file_name': open(file_path, 'rb')}, cookies=cookies, headers=headers)
the file upload works and the file name is displayed correctly.
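A likely explanation for the difference: when files is given bare bytes, requests has no file name to read and falls back to the form field name, whereas an open file object carries its path in .name. A (filename, content) tuple states the name explicitly, which also allows uploading bytes that are already in memory. The sketch below only prepares the requests and never sends them; the URL and the byte string are placeholders, and multipart_body is a hypothetical helper.

```python
import requests


def multipart_body(files):
    """Prepare (but do not send) a multipart POST and return its raw body."""
    prepared = requests.Request(
        "POST", "http://localhost:8000/", files=files
    ).prepare()
    return prepared.body


raw = b"fake image bytes"  # placeholder for the real file content
name = "20047_Phillips_Photo_052_002.jpg"

# Bare bytes: the part's filename falls back to the field name.
body_bytes_only = multipart_body({"storage_file_name": raw})

# (filename, content) tuple: the filename is sent explicitly.
body_with_name = multipart_body({"storage_file_name": (name, raw)})
```

With the tuple form, the original script could keep reading the file into file_data and pass files={'storage_file_name': (name, file_data)}.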

How to get Superset Token?? (for use Rest api)

I tried to call the REST API following the documentation below, but it does not work. https://superset.apache.org/docs/rest-api
request: curl -XGET -L http://[IP:PORT]/api/v1/chart
response: {"msg":"Bad Authorization header. Expected value 'Bearer <JWT>'"}
I have installed Superset both with pip and with the Helm chart, and the result is the same in both cases. helm: https://github.com/apache/superset
How should I call the REST API?

Check the security section of the documentation you have linked. It has this API /security/login, you can follow the JSON parameter format and get the JWT bearer token. Use that token to send in the Header of your other API calls to superset.
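As a rough sketch of the last step, the token goes into an Authorization header of the form the error message asks for. The helper names bearer_headers and list_charts are made up for illustration; session is assumed to be a requests.Session (or any object with a compatible get method), and the chart endpoint path is taken from the question.

```python
def bearer_headers(access_token):
    """Build the header Superset's error message asks for: 'Bearer <JWT>'."""
    return {"Authorization": f"Bearer {access_token}"}


def list_charts(session, base_url, access_token):
    """GET /api/v1/chart/ with the bearer token and return the parsed JSON.

    session is expected to behave like requests.Session.
    """
    response = session.get(
        f"{base_url}/api/v1/chart/",
        headers=bearer_headers(access_token),
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```

A call might look like list_charts(requests.Session(), "http://localhost:8088", access_token), where access_token comes from the login response and the host is a placeholder.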

Open http://localhost:8080/swagger/v1, assuming http://localhost:8080 is your Superset host address.
Then find the /security/login endpoint in the Security section.
Its response looks like this:
{
"access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmcmVzaCI6dHJ1ZSwiaWF0IjoxNjU0MzQ2OTM5LCJqdGkiOiJlZGY2NTUxMC0xMzI1LTQ0NDEtYmFmMi02MDc1MzhjZDcwNGYiLCJ0eXBlIjoiYWNjZXNzIiwic3ViIjoxLCJuYmYiOjE2NTQzNDY5MzksImV4cCI6MTY1NDM0NzgzOX0.TfjUea3ycH77xhCWOpO4LFbYHrT28Y8dnWsc1xS_IOY",
"refresh_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmcmVzaCI6ZmFsc2UsImlhdCI6MTY1NDM0NjkzOSwianRpIjoiNzBiM2EyZDYtNDFlNy00ZDNlLWE0NDQtMTRiNTkyNTk4NjUwIiwidHlwZSI6InJlZnJlc2giLCJzdWIiOjEsIm5iZiI6MTY1NDM0NjkzOSwiZXhwIjoxNjU2OTM4OTM5fQ.OgcctNnO4zTDfTgtHnaEshk7u-D6wOxfxjCsjqjKYyE"
}

Thanks to @andrewsali's comment on this GitHub issue, I finally figured out how to access the Superset REST API from Python code.
import requests
from bs4 import BeautifulSoup
import json
def get_supetset_session():
    """
    # http://192.168.100.120:8088/swagger/v1
    url = f'http://{superset_host}/api/v1/chart/'
    r = s.get(url)
    # print(r.json())
    """
    superset_host = '192.168.100.120:8088'  # replace with your own host
    username = 'YOUR_NAME'
    password = 'YOUR_PASSWORD'
    # set up session for auth
    s = requests.Session()
    login_form = s.post(f"http://{superset_host}/login")
    # get Cross-Site Request Forgery protection token
    soup = BeautifulSoup(login_form.text, 'html.parser')
    csrf_token = soup.find('input', {'id': 'csrf_token'})['value']
    data = {
        'username': username,
        'password': password,
        'csrf_token': csrf_token
    }
    # log in with the given session
    s.post(f'http://{superset_host}/login/', data=data)
    print(dict(s.cookies))
    return s
# DEMO
# s = get_supetset_session()
base_url = 'http://192.168.100.120:8088'
def get_dashboards_list(s, base_url=base_url):
    """## GET List of Dashboards"""
    url = base_url + '/api/v1/dashboard/'
    r = s.get(url)
    resp_dashboard = r.json()
    for result in resp_dashboard['result']:
        print(result['dashboard_title'], result['id'])
s = get_supetset_session()
# {'session': '.eJwlj8FqAzEMRP_F5z1Islay8jOLJcu0NDSwm5xK_r0uPQ7DG978lGOeeX2U2_N85VaOz1FuxVK6JIHu1QFhGuEOk5NG8qiYGkJ7rR3_Ym-uJMOzJqySeHhIG8SkNQK6GVhTdLf0ZMmG6sZGQtiQ1Gz0qYiUTVoHhohZthLXOY_n4yu_l0-VKTObLaE13i2Hz2A2rzBmhU7WkkN1cfdH9HsuZoFbeV15_l_C8v4F4nBC9A.Ypn16Q.yz4E-vz0gp3EmJwv-6tYIcOGavU'}
get_dashboards_list(s)

Thanks @Ferris for this visual solution!
To add to this, you can also make the appropriate API call from Python, as follows:
import requests
api_url = "your_url/api/v1/security/login"
payload = {
    "password": "your password",
    "provider": "db",
    "refresh": True,
    "username": "your username"
}
response = requests.post(api_url, json=payload)
# the response body is JSON holding access_token and refresh_token
access_token = response.json()['access_token']
# now get a guest token
api_url_for_guesttoken = "your_url/api/v1/security/guest_token"
payload = {}
# now this is the crucial part: add the specific auth header
response = requests.post(api_url_for_guesttoken, json=payload, headers={'Authorization': f"Bearer {access_token}"})

Spotipy on Django authorization without copy-paste to console

I have a Django site in which I want to use spotipy to look for statistics of the song like popularity and views. I have this code right now:
import spotipy
import spotipy.util as util #luxury
import json
import webbrowser
username = 'dgrqnco2rx8hdu58kv9if9eho'
scope = 'user-read-private user-read-playback-state user-modify-playback-state'
token = util.prompt_for_user_token(username, scope, client_id='08bb526962574a46b359bffc56048147',
client_secret='bf6d4184c8ae40aca207714e02153bad', redirect_uri='http://google.com/')
sp_obj = spotipy.Spotify(auth=token)
ss = 'name of song'
if '(' in ss:
    q = ss[0:ss.index('(')]
elif '[' in ss:
    q = ss[0:ss.index('[')]
elif '{' in ss:
    q = ss[0:ss.index('{')]
else:
    q = ss
query = sp_obj.search(q, 1, 0, 'track')
#<<<<<<<<<<SONG>>>>>>>>>>
#FIND THE SONG URI
song_uri = query['tracks']['items'][0]['uri']
track = sp_obj.track(song_uri)
track_data = sp_obj.audio_features(song_uri)
song_popularity = track['popularity']
song_danceability = track_data[0]['danceability']
song_energy = track_data[0]['energy']
song_loudness = track_data[0]['loudness']
song_tempo = track_data[0]['tempo']
However spotipy redirects me to a page for authorization and I need to paste the url in the console. The regular user however does not have access to this console. So how can I do the authorization in an alternative way or even bypass it?
I was thinking about getting a spotify account in which every user will be getting logged in so that the user won't have to do the authorization and won't have to have a spotify account. Is this possible? If not what else can I try?

You can't use util.prompt_for_user_token because it's just a helper for local usage only.
You need to arrange your code as API endpoints so that multiple users can sign in. Here is a full working example that would allow multiple users to sign in https://github.com/plamere/spotipy/blob/master/examples/app.py.
It uses Flask but you can easily adapt it to Django.

logging into github with python

I have been up and down these pages looking for how to do this and there are many similar posts but I can't seem to get it to work, so I find myself having to ask specifically how to do this.
I am trying to gather metrics about my software project on GitHub. For many of these metrics you can use the API. However, among the most interesting items are the unique visitors and view count on the GitHub graphs/traffic page, and unfortunately this info is not available in the GitHub API. So, to get it, I am trying to log into my GitHub account, navigate to the page, and read the numbers. My code is below. However, I can't seem to get logged into GitHub at all (my URL request keeps showing a login page rather than the traffic page). I think it probably has something to do with the variables that need to be posted, but I'm not sure what's wrong with them.
from requests import session
from bs4 import BeautifulSoup as bs
USER = 'MYID'
PASSWORD = 'MYPASSWORD'
URL1 = 'https://github.com/login'
URL2 = 'https://github.com/MYPROJ/graphs/traffic'
with session() as s:
    req = s.get(URL1).text
    html = bs(req)
    token = html.find("input", {"name": "authenticity_token"}).attrs['value']
    com_val = html.find("input", {"name": "commit"}).attrs['value']
    login_data = {'login_field': USER,
                  'password': PASSWORD,
                  'authenticity_token': token,
                  'commit': com_val}
    r1 = s.post(URL1, data=login_data)
    r2 = s.get(URL2)
    print(r2.url)
    print(bs(r2.text).find('span', {'class': 'num js-uniques uniques'}))
Any help is appreciated.
Thanks,
-Jeff

Figured it out.
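I was posting my username and password to the wrong address, and a few other bits were wrong as well (the form field is named login rather than login_field, and the numbers come from the traffic-data URL).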
This is the updated code that worked for me:
from requests import session
from bs4 import BeautifulSoup as bs
USER = 'MyUserName'
PASSWORD = 'Mypassword'
URL1 = 'https://github.com/session'
URL2 = 'https://github.com/MyProj/graphs/traffic-data'
with session() as s:
    req = s.get(URL1).text
    html = bs(req)
    token = html.find("input", {"name": "authenticity_token"}).attrs['value']
    com_val = html.find("input", {"name": "commit"}).attrs['value']
    login_data = {'login': USER,
                  'password': PASSWORD,
                  'commit': com_val,
                  'authenticity_token': token}
    r1 = s.post(URL1, data=login_data)
    r2 = s.get(URL2)
    Cut1 = r2.text.split(',"summary":{"total":', 2)
    ViewsTot = Cut1[1].split(',"unique":', 1)
    ViewsUnq = ViewsTot[1].split('}}', 1)
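The split strings above suggest that the traffic-data response is JSON with a summary object holding total and unique. Assuming that shape (it is inferred from this answer, not from any GitHub documentation), the two numbers could be read with the json module instead of string splitting. The helper name and the sample payload are made up for illustration.

```python
import json


def parse_traffic_summary(text):
    """Return (total_views, unique_views) from the traffic-data JSON text.

    Assumes a top-level "summary" object with "total" and "unique" keys.
    """
    payload = json.loads(text)
    summary = payload["summary"]
    return summary["total"], summary["unique"]


# Hypothetical payload with the assumed shape; real responses carry more data.
sample = '{"counts":[],"summary":{"total":42,"unique":7}}'
total_views, unique_views = parse_traffic_summary(sample)
```

In the script above this would replace the three split lines with total_views, unique_views = parse_traffic_summary(r2.text), and it fails loudly with a KeyError or a JSON decode error if a login page comes back instead.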

How can I get the length of a session in django views

I am using the code below for my wishlist. I need to show the number of products in the wishlist on my site. I tried various methods, but I think only the session can do this. Can someone help, please?
How can I do this?
@never_cache
def wishlist(request, template="shop/wishlist.html"):
    """
    Display the wishlist and handle removing items from the wishlist and
    adding them to the cart.
    """
    skus = request.wishlist
    error = None
    if request.method == "POST":
        to_cart = request.POST.get("add_cart")
        add_product_form = AddProductForm(request.POST or None,
                                          to_cart=to_cart,request=request)
        if to_cart:
            if add_product_form.is_valid():
                request.cart.add_item(add_product_form.variation, 1,request)
                recalculate_discount(request)
                message = _("Item added to cart")
                url = "shop_cart"
            else:
                error = add_product_form.errors.values()[0]
        else:
            message = _("Item removed from wishlist")
            url = "shop_wishlist"
        sku = request.POST.get("sku")
        if sku in skus:
            skus.remove(sku)
        if not error:
            info(request, message)
            response = redirect(url)
            set_cookie(response, "wishlist", ",".join(skus))
            return response
    # Remove skus from the cookie that no longer exist.
    published_products = Product.objects.published(for_user=request.user)
    f = {"product__in": published_products, "sku__in": skus}
    wishlist = ProductVariation.objects.filter(**f).select_related(depth=1)
    wishlist = sorted(wishlist, key=lambda v: skus.index(v.sku))
    context = {"wishlist_items": wishlist, "error": error}
    response = render(request, template, context)
    if len(wishlist) < len(skus):
        skus = [variation.sku for variation in wishlist]
        set_cookie(response, "wishlist", ",".join(skus))
    return response

Session != Cookies. The session is managed by the server on the backend; cookies are sent to the user's browser. Django uses a single cookie to help track sessions, but in this instance you are simply using cookies.
The session framework lets you store and retrieve arbitrary data on a per-site-visitor basis. It stores data on the server side and abstracts the sending and receiving of cookies. Cookies contain a session ID – not the data itself (unless you’re using the cookie based backend).
It's difficult to tell what you want, but if you simply want to get a count of the number of items you are saving in the cookie, you simply have to count your skus and put it in the context being sent to the template:
context = {"wishlist_items": wishlist, "error": error, "wishlist_length": len(wishlist)}
response = render(request, template, context)
if len(wishlist) < len(skus):
    skus = [variation.sku for variation in wishlist]
    set_cookie(response, "wishlist", ",".join(skus))
return response
and use:
{{ wishlist_length }}
in your template
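If the count should appear on every page rather than only on the wishlist page, one option is a Django template context processor, which is a plain function that takes the request and returns a dict merged into every template context. The sketch below assumes request.wishlist is the list of SKUs used by the view in the question; the function name, and any module path used to register it, are hypothetical.

```python
def wishlist_length(request):
    """Context processor: expose the wishlist item count to every template.

    Falls back to 0 when the request carries no wishlist attribute.
    """
    skus = getattr(request, "wishlist", None) or []
    return {"wishlist_length": len(skus)}
```

It would be registered by adding its dotted path to the context_processors list under OPTIONS in the TEMPLATES setting, after which {{ wishlist_length }} is available in any template rendered with a request. Note that this counts the SKUs stored in the cookie, which can briefly exceed the number of products still published.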
