How to send cookies separately in Python with urllib2

I'm trying to send a series of cookie values to a URL until I get the right one, and I don't know why my current code isn't working.
I've looked at the existing answers on sending cookies to a URL, but none of them seem to work in my case.
The comments in the code are the instructions for the task:
# Write a script that can guess cookie values
# and send them to the url http://127.0.0.1:8082/cookiestore
# Read the response from the right cookie value to get the flag.
# The cookie id the aliens are using is alien_id
# the id is a number between 1 and 75
import urllib2

req = urllib2.Request('http://127.0.0.1:8082/cookiestore')
for i in range(75):
    req.add_header('alien_id', i)
    response = urllib2.urlopen(req)
    html = response.read()
    print(html)
I expected one of the iterations to print something different, but they all print the same page.

This code is for Python 3.8, since Python 2.7 is no longer supported, and your req.add_header line did need modifying. Here is your solution:
import urllib.request

url = "http://127.0.0.1:8082/cookiestore"
for i in range(1, 76):  # the id is between 1 and 75, so start at 1 and include 75
    request = urllib.request.Request(url)
    request.add_header("Cookie", "alien_id=" + str(i))
    response = urllib.request.urlopen(request)
    responseStr = response.read().decode("utf-8")
    print(responseStr)
If you are still using Python 2.7, you can just copy the request.add_header line.
Cheers!

Ahaha, Cyber Discovery I see ;)
You were very close; in fact I used your code as my base, except for the add_header line.
Not sure if you even still need to know, but I'll leave this here for others.
When you send a cookie as a header, you send it like so:
req.add_header('Cookie', 'cookiename=cookievalue')
Hope this helps :D
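
Putting both answers together, here is a minimal Python 3 sketch of the whole loop. It assumes, as the question implies, that every wrong id returns the same page, so printing the response length next to each id makes the one different response easy to spot:

import urllib.request

url = 'http://127.0.0.1:8082/cookiestore'
for i in range(1, 76):  # the id is a number between 1 and 75
    req = urllib.request.Request(url)
    req.add_header('Cookie', 'alien_id=' + str(i))
    with urllib.request.urlopen(req) as response:
        html = response.read().decode('utf-8')
    # wrong ids should all return the same page, so a different
    # length flags the id whose response contains the flag
    print(i, len(html))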

Related

Flask - Generated PDF can be viewed but cannot be downloaded

I recently started learning Flask and created a simple webapp which randomly generates kids' math worksheets in PDF based on user input.
The PDF opens automatically in a browser and can be viewed. But when I try downloading it, both on a PC and in Chrome on iOS, I get error messages (Chrome PC: "Failed - Network error"; Chrome iOS: "the file could not be downloaded at this time").
You can try it out here: kidsmathsheets.com
I suspect it has something to do with the way I'm generating and returning the PDF file. FYI I'm using ReportLab to generate the PDF. My code is below (hosted on PythonAnywhere):
from reportlab.lib.pagesizes import A4, letter
from reportlab.pdfgen import canvas
from reportlab.platypus import Table
from flask import Flask, render_template, request, Response
import io
from werkzeug import FileWrapper

# Other code to take in input and generate data
filename = io.BytesIO()
if letter_size:
    c = canvas.Canvas(filename, pagesize=letter)
else:
    c = canvas.Canvas(filename, pagesize=A4)
pdf_all(c, p_set, answer=answers, letter=letter_size)
c.save()
filename.seek(0)
wrapped_file = FileWrapper(filename)
return Response(wrapped_file, mimetype="application/pdf", direct_passthrough=True)
else:
    return render_template('index.html')
Any idea what's causing the issue? Help is much appreciated!
Please check whether you are using an AJAX POST request to invoke the endpoint that generates your data and displays the PDF. If so, quite probably this causes the behaviour you observe. You might want to try invoking the endpoint with a GET request to /my-endpoint/some-hashed-non-reusable-id-of-my-document, where some-hashed-non-reusable-id-of-my-document tells the endpoint which document to serve without allowing users to play around with guesstimates about what other documents you might have. You might try it first like:
@app.route('/display-document/<document_id>')
def display_document(document_id):
    document = get_my_document_from_wherever_it_is(document_id)
    binary = get_binary_data_from_document(document)
    # ... prepare the response here ...
    return send_file(binary, mimetype="application/pdf")
Kind note: a right-click and "print to PDF" will work, but this is not the solution we want.
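
If the file still cannot be downloaded from the GET endpoint, it is also worth checking that the response tells the browser it is a download. This is an assumption on my part, not something the question confirms, but Flask's send_file can set the Content-Disposition header for you:

from flask import send_file

@app.route('/download-document/<document_id>')  # hypothetical route, mirroring the sketch above
def download_document(document_id):
    document = get_my_document_from_wherever_it_is(document_id)  # hypothetical helper, as above
    binary = get_binary_data_from_document(document)
    # as_attachment=True adds a "Content-Disposition: attachment" header,
    # which asks the browser to download the PDF instead of displaying it;
    # download_name is the Flask 2.x parameter (older Flask used attachment_filename)
    return send_file(binary, mimetype="application/pdf",
                     as_attachment=True, download_name="worksheet.pdf")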

Scraping the AliExpress Site with Python Doesn't Give the Correct Result

I have a problem scraping the AliExpress site.
https://www.aliexpress.com/item/Free-gift-100-Factory-Original-Unlocked-Apple-iphone-4G-8GB-16GB-32GB-Cell-phone-3-5/32691056589.html
This is one URL. Here is what I want to get:
import requests
from bs4 import BeautifulSoup
from lxml import html

r = requests.get('https://www.aliexpress.com/item/Free-gift-100-Factory-Original-Unlocked-Apple-iphone-4G-8GB-16GB-32GB-Cell-phone-3-5/32691056589.html')
With BeautifulSoup:
soup = BeautifulSoup(r.content, 'lxml')
content = soup.find('div', {'id': 'j-product-tabbed-pane'})
With lxml parsing:
root = html.fromstring(r.content)
results = root.xpath('//img[@alt="aeProduct.getSubject()"]')
f = open('result.html', 'w')
f.write(html.tostring(results[0]))
f.close()
This is my code, but it gives me the wrong result.
Inspecting the page in the browser shows those elements, but the code above doesn't give me anything.
I think requests.get doesn't give me the correct contents. But why, and how can I solve this problem? Do they detect me as a bot? How can I fix this?
Thank you, everyone.
Try this:
1. Use a user agent (see the sketch below).
2. Use a proxy.
3. Disable JavaScript for this site and refresh it, then check whether the page still has the element. If it is loaded by JavaScript, you should find a way to render the JS.
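
A minimal sketch of the first suggestion; the header value here is illustrative, not something from the question:

import requests

# many sites serve different (or empty) markup to the default
# python-requests User-Agent, so pretend to be a regular browser
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/120.0 Safari/537.36'),
}
r = requests.get(
    'https://www.aliexpress.com/item/Free-gift-100-Factory-Original-Unlocked-Apple-iphone-4G-8GB-16GB-32GB-Cell-phone-3-5/32691056589.html',
    headers=headers,
    timeout=10,
)
print(r.status_code, len(r.text))

If the element only appears after JavaScript runs, no amount of header tweaking will help; a browser-driving tool such as Selenium is needed to render the page first.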

Read multilanguage strings from HTML via Python 2.7

I am new to Python 2.7 and I am trying to extract some info from HTML files. More specifically, I want to read some text that contains multilanguage information. I give my script below, hoping to make things clearer.
import urllib2
import BeautifulSoup
url = 'http://www.bbc.co.uk/zhongwen/simp/'
page = urllib2.urlopen(url).read().decode("utf-8")
dom = BeautifulSoup.BeautifulSoup(page)
data = dom.findAll('meta', {'name' : 'keywords'})
print data[0]['content'].encode("utf-8")
The result I get is:
BBCϊ╕φόΨΘύ╜ΣΎ╝Νϊ╕╗ώκ╡Ύ╝Νbbcchinese.com, email news, newsletter, subscription, full text
The problem is in the first string. Is there any way to print exactly what I am reading? Also, is there any way to find the exact encoding of each page?
PS: I would like to mention that the site was selected totally at random, as it is representative of the problem I am encountering.
Thank you in advance!
You have a problem with the terminal where you are outputting the result. The script works fine, and if you write the data to a file you will get it correctly.
Example:
import urllib2
from bs4 import BeautifulSoup

url = 'http://www.bbc.co.uk/zhongwen/simp/'
page = urllib2.urlopen(url).read().decode("utf-8")
dom = BeautifulSoup(page)
data = dom.findAll('meta', {'name': 'keywords'})
with open("test.txt", "w") as myfile:
    myfile.write(data[0]['content'].encode("utf-8"))
test.txt:
BBC中文网,主页,bbcchinese.com, email news, newsletter, subscription, full text
Which OS and terminal are you using?
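
To confirm the diagnosis, a quick check (a sketch, assuming Python 2.7 as in the question) is to see what encoding the terminal reports before printing:

import sys

# if this prints something other than UTF-8 (for example cp437 on a
# Windows console), multibyte characters will be mangled as shown above
print sys.stdout.encoding

On Windows, running chcp 65001 before the script, or setting the PYTHONIOENCODING=utf-8 environment variable, often fixes the display.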

Problems parsing complicated JSON POST data in Django

I have a Django parse function:
def parse_org(request):
    try:
        org = simplejson.loads(request.POST['org'])
    except Exception:
        traceback.print_exc()
    print org
I get a decode error.
On the client side, the JavaScript version (code pasted in a later part) works fine, but recently I wanted to write a Python version to do a load test, so I wrote the following code in a Python client script to send the request:
data_dict = {}
org = ["UCSD", "MIT"]
data_dict["org"] = org
req = urllib2.Request(request_url, urllib.urlencode(data_dict), headers)
response = urllib2.urlopen(req, timeout = 5)
Then the parse code on the Django side gets a parsing error. Comparing the correct JavaScript client and the failing Python client, the only difference is the single and double quotes.
The wrongly parsed input on the Django side is:
POST:<QueryDict: {u'org': [u"['UCSD', 'MIT']"], ....
The correct input is:
POST:<QueryDict: {u'org': [u'["UCSD","MIT"]'], ....
For your reference, the JavaScript side looks like this (Django can correctly parse org as an array):
var org = [];
org.push("UCSD");
org.push("MIT");
var data = {"org": JSON.stringify(org), ...
};
$.post(url, data, function(data){
    callback(data);
});
I searched a lot, but still can't find why the Python client doesn't work while the JavaScript client does. Is it related to urllib's urlencode? And why is there a single versus double quote difference there?
Thanks a lot!
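
For what it's worth, the quote difference between the two QueryDicts suggests the Python client is sending the list's repr() rather than JSON: urllib.urlencode calls str() on non-string values, and str(["UCSD", "MIT"]) yields single-quoted output that simplejson.loads rejects. A sketch of the client with the list serialized explicitly, mirroring the JSON.stringify call on the JavaScript side (the URL and headers are placeholders, since the question elides them):

import json
import urllib
import urllib2

request_url = 'http://127.0.0.1:8000/parse_org'  # placeholder URL
headers = {}  # placeholder, as in the question

data_dict = {}
org = ["UCSD", "MIT"]
data_dict["org"] = json.dumps(org)  # '["UCSD", "MIT"]': valid JSON, like JSON.stringify

req = urllib2.Request(request_url, urllib.urlencode(data_dict), headers)
response = urllib2.urlopen(req, timeout=5)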

Problems Scraping a Page With Beautiful Soup

I am using Beautiful Soup to try and scrape a page.
I am trying to follow this tutorial.
I am trying to get the contents of the following page after submitting a Stock Ticker Symbol:
http://www.cboe.com/delayedquote/quotetable.aspx
The tutorial is for a page with a "GET" method; my page is a "POST". I wonder if that is part of the problem?
I want to use the first text box, under where it says:
"Enter a Stock or Index symbol below for delayed quotes."
Relevant code:
import urllib
import urllib2

user_agent = 'Mozilla/5 (Solaris 10) Gecko'
headers = {'User-Agent': user_agent}
values = {'ctl00$ctl00$AllContent$ContentMain$ucQuoteTableCtl$txtSymbol': 'IBM'}
data = urllib.urlencode(values)
request = urllib2.Request("http://www.cboe.com/delayedquote/quotetable.aspx", data, headers)
response = urllib2.urlopen(request)
The call does not fail, but I do not get a set of options and prices returned to me like when I run the page interactively. I get a bunch of garbled HTML.
Thanks in advance!
Ok - I think I figured out the problem (and found another). I decided to switch to 'mechanize' from 'urllib2'. Unfortunately, I kept having problems getting the data. Finally, I realized that there are two 'submit' buttons, so I tried passing the name parameter when submitting the form. That did the trick as far as getting the correct response.
However, the next problem was that I could not get BeautifulSoup to parse the HTML and find the necessary tags. A brief Google search revealed others having similar problems. So, I gave up on BeautifulSoup and just did a basic regex on the HTML. Not as elegant as BeautifulSoup, but effective.
Ok - enough speechifying. Here's what I came up with:
import mechanize
import re

br = mechanize.Browser()
url = 'http://www.cboe.com/delayedquote/quotetable.aspx'
br.open(url)
br.select_form(name='aspnetForm')
br['ctl00$ctl00$AllContent$ContentMain$ucQuoteTableCtl$txtSymbol'] = 'IBM'
# here's the key step that was causing the trouble - pass the name parameter
# for the button when calling submit
response = br.submit(name="ctl00$ctl00$AllContent$ContentMain$ucQuoteTableCtl$btnSubmit")
data = response.read()
# re.M is just an alias for re.MULTILINE, so only one of them is needed
match = re.search(r'Bid</font><span> \s*([0-9]{1,4}\.[0-9]{2})', data, re.MULTILINE | re.I)
if match:
    print match.group(1)
else:
    print "There was a problem retrieving the quote"