Parse HTTP_USER_AGENT string in Django

What is the best way to parse the User-Agent string from a Django request?
request.META.get('HTTP_USER_AGENT', '')
Here is what I get in the string:
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36
I am not sure how to parse this information. Is there a time- and memory-efficient solution? I just need to parse the string, that's all.

You can try this library:
https://pypi.python.org/pypi/user-agents/
Example:
from user_agents import parse
# iPhone's user agent string
ua_string = 'Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B179 Safari/7534.48.3'
user_agent = parse(ua_string)
# Accessing user agent's browser attributes
user_agent.browser # returns Browser(family=u'Mobile Safari', version=(5, 1), version_string='5.1')
user_agent.browser.family # returns 'Mobile Safari'
user_agent.browser.version # returns (5, 1)
user_agent.browser.version_string # returns '5.1'
# Accessing user agent's operating system properties
user_agent.os # returns OperatingSystem(family=u'iOS', version=(5, 1), version_string='5.1')
user_agent.os.family # returns 'iOS'
user_agent.os.version # returns (5, 1)
user_agent.os.version_string # returns '5.1'
# Accessing user agent's device properties
user_agent.device # returns Device(family='iPhone')
user_agent.device.family # returns 'iPhone'
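If you would rather avoid the extra dependency and only need the browser family and version, a rough stdlib-only sketch is possible. Note this is a simplification, not a replacement for the library: real user-agent sniffing has many edge cases, and the matching order below (Chrome before Safari, since Chrome UAs also advertise "Safari") only covers the common browsers.

```python
import re

# UA string from the question above
ua = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36')

def browser_from_ua(ua_string):
    # Order matters: Chrome UAs also contain "Safari", so test Chrome first
    for family in ('Edge', 'Chrome', 'Firefox', 'Safari'):
        match = re.search(family + r'/([\d.]+)', ua_string)
        if match:
            return family, match.group(1)
    return None, None

print(browser_from_ua(ua))  # ('Chrome', '33.0.1750.146')
```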

Related

HTTP response codes coming back wrong where the actual response is 200

I am trying to extract the HTTP links from an XML file and then get the HTTP response code for each one. But interestingly, I am getting either 500 or 404. If I click on the URL, the image loads properly in the browser.
My code is:
import re
import requests

def extract_src_link(path):
    with open(path, 'r') as myfile:
        for line in myfile:
            if "src" in line:
                src_link = re.search('src=(.+?)ptype="2"', line)
                url = src_link.group(1)
                url = url[1:-1]  # strip the surrounding quote characters
                # print("url:", url)
                resp = requests.head(url)
                print(resp.status_code)
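One thing worth checking before blaming the server is what the regex actually captures: `src=(.+?)ptype="2"` includes the quotes and trailing whitespace, which the `url[1:-1]` slice only partially trims. Capturing inside the quotes avoids the manual trimming entirely. The sample line below is hypothetical, just to show the pattern:

```python
import re

# Hypothetical sample of the kind of line the XML file might contain
line = '<image src="http://example.com/img.png" ptype="2"/>'

# Capturing inside the quotes avoids the manual url[1:-1] trimming
match = re.search(r'src="(.+?)"', line)
if match:
    print(match.group(1))  # http://example.com/img.png
```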
Not sure what's happening here. This is how my output looks:
/usr/local/bin/python2.7
/Users/rradhakrishnan/Projects/eVision/Scripts/xml_validator_ver3.py
Processing:
/Users/rradhakrishnan/rradhakrishnan1/mobily/E30000001554522119_2020_01_27T17_35_40Z.xml
500
404
Processing:
/Users/rradhakrishnan/rradhakrishnan1/mobily/E30000001557496079_2020_01_27T17_35_40Z.xml
500
404
I somehow managed to crack it. Adding the User-Agent header resolved the issue:
import re
import requests

def extract_src_link(path):
    with open(path, 'r') as myfile:
        for line in myfile:
            if "src" in line:
                src_link = re.search('src=(.+?)ptype="2"', line)
                url = src_link.group(1)
                url = url[1:-1]
                print("url:", url)
                # resp = requests.head(url)
                # print(resp.status_code)
                headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36'}
                r = requests.get('http://www.booking.com/reviewlist.html?cc1=tr;pagename=sapphire', headers=headers)
                print(r.status_code)
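If you prefer the standard library over requests, the same User-Agent header can be attached with urllib. A small sketch, reusing the header value from the snippet above (note that urllib normalizes header names to "Xxxx-xxxx" capitalization when storing them):

```python
from urllib.request import Request

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/37.0.2049.0 Safari/537.36'}
req = Request('http://www.example.com/', headers=headers)

# urllib stores header names capitalized, e.g. 'User-agent'
print(req.get_header('User-agent'))
```

Passing the prepared Request object to urlopen() would then send the header exactly like the requests example does.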

Selenium with ChromeDriver on CentOS 7 for spidering

I am trying to make a crawler for my server.
I found chilkat Lib's CKSpider, but it does not support JS rendering, so I am trying to use the Selenium WebDriver with Chrome instead.
I am running CentOS 7 with Python 2.7.
I want to spider every page under one base domain. Example:
BaseDomain = example.com
then find all page something like
example.com/event/.../../...
example.com/games/.../...
example.com/../.../..
...
My crawler code:
from selenium import webdriver
import time

options = webdriver.ChromeOptions()
options.binary_location = "/usr/bin/google-chrome"
chrome_driver_binary = "/root/chromedriver"
options.add_argument("--headless")
options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36")
options.add_argument("lang=ko-KR,ko,en-US,en")
options.add_argument("--window-size=1920x1080")
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(chrome_driver_binary, chrome_options=options)

host = "http://example.com"  # base domain to crawl

def Crawler(url):
    driver.get(url)
    driver.implicitly_wait(3)
    # Do something
    time.sleep(3)
    # Crawl next

Crawler(host)
driver.quit()
How can I crawl the next page? Is there another way to do this in Selenium, or do I need a different library?
Thanks for any tips or advice.
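The usual pattern for this is a queue plus a visited set: fetch a page, collect the links that stay on the base domain, and enqueue the ones not yet seen. Below is a stdlib-only sketch of the link-collection step; with Selenium you would feed it driver.page_source after each driver.get(). The HTML string is a hypothetical stand-in for a fetched page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collects <a href> links that stay on the same base domain."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                url = urljoin(self.base_url, value)
                # keep only links whose host matches the base domain
                if urlparse(url).netloc == urlparse(self.base_url).netloc:
                    self.links.add(url)

# Hypothetical page source; with Selenium this would be driver.page_source
html = '<a href="/event/1">e</a><a href="http://other.com/x">x</a>'
collector = LinkCollector('http://example.com')
collector.feed(html)
print(sorted(collector.links))  # ['http://example.com/event/1']
```

The crawl loop would then pop a URL from the queue, skip it if already visited, load it, and add the collected links back to the queue until the queue is empty.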

Mechanize - Python

I am using mechanize in Python to log into an HTTPS page. The login is successful, but the output is just a SAML response. I am unable to get the actual page source that I see when opening the page in my browser.
import mechanize
import getpass
import cookielib

br = mechanize.Browser()
br.set_handle_robots(False)
b = []
cj = cookielib.CookieJar()
br.set_cookiejar(cj)
pw = getpass.getpass("Enter Your Password Here: ")
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [
    ('User-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11'),
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Accept-Encoding', 'gzip,deflate,sdch'),
    ('Accept-Language', 'en-US,en;q=0.8'),
    ('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'),
]
br.open("https:***single sign on login url***")
br.select_form(name='login-form')
br.form['userid'] = 'id'
br.form['password'] = pw
response = br.submit()
print response.read()
a = br.open("https:****url****")
for i in range(1000):
    b.append(a.readline())
print b
I get SAML output, which is encoded, but I don't know how to reply with that SAML POST to get to the actual page.
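In the typical SAML web-SSO flow, the page you are seeing is the IdP's response: a self-submitting form containing a hidden SAMLResponse field (and often RelayState) that the browser POSTs to the form's action URL. With mechanize, selecting that form with br.select_form and calling br.submit() again usually completes the flow. A stdlib sketch of pulling those fields out, using a hypothetical IdP response page:

```python
from html.parser import HTMLParser

class SAMLFormParser(HTMLParser):
    """Pulls the form action and hidden fields out of an IdP response page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and 'action' in attrs:
            self.action = attrs['action']
        elif tag == 'input' and attrs.get('type') == 'hidden':
            self.fields[attrs['name']] = attrs.get('value', '')

# Hypothetical IdP response; real pages carry a long base64 SAMLResponse value
html = ('<form action="https://sp.example.com/acs" method="post">'
        '<input type="hidden" name="SAMLResponse" value="PHNhbWw+"/>'
        '<input type="hidden" name="RelayState" value="/home"/>'
        '</form>')
parser = SAMLFormParser()
parser.feed(html)
print(parser.action, sorted(parser.fields))
```

POSTing the collected fields to parser.action (with the session cookies preserved) is what the browser does automatically via JavaScript, which is why the page works there but not in a bare script.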

Moving my Django project on DreamHost (error importing passenger_wsgi.py)

I am trying to move my Django project to DreamHost.
I created a virtualenv, copied my project into it on the DreamHost server, and installed all the dependencies.
I created a passenger_wsgi.py file:
import sys, os
cwd = os.getcwd()
sys.path.append(cwd)
sys.path.append('/home/user/site/app_name')
sys.path.append('/home/user/site/project_name')
sys.path.append('/home/user/python/bin/')
sys.path.append('/home/user/python/lib/python2.7/site-packages')
sys.path.append('/home/user/python/lib/python2.7/site-packages/django')
# Switch to new Python
INTERP = os.path.join(os.environ['HOME'], 'python', 'bin', 'python')
if sys.executable != INTERP:os.execl(INTERP, INTERP, *sys.argv)
os.environ['DJANGO_SETTING_MODULE'] = 'project.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
When I try to visit my site through the browser, I get an internal server error (500).
error.logs
Premature end of script headers: admin
Premature end of script headers: internal_error.html
access.logs
"GET / HTTP/1.1" 500 687 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"
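One detail worth double-checking in the passenger_wsgi.py above: Django reads the DJANGO_SETTINGS_MODULE environment variable (note the S in SETTINGS), while the script sets DJANGO_SETTING_MODULE, which would leave the settings unconfigured and can produce exactly this kind of 500. A minimal corrected fragment, where 'project.settings' stands in for your real settings path:

```python
import os

# Django looks for DJANGO_SETTINGS_MODULE, not DJANGO_SETTING_MODULE
os.environ['DJANGO_SETTINGS_MODULE'] = 'project.settings'
print(os.environ['DJANGO_SETTINGS_MODULE'])
```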

django-debug-toolbar logging doesn't work?

I can't figure out how to use this plugin...
def homepage(request):
    print request.META['HTTP_USER_AGENT']
    print 'test'
    return render(request, 'base.html')
After this, some output should appear in the Logging tab. In the console I get:
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13
test
In the django-debug-toolbar Logging tab I have "No messages logged."
What am I doing wrong?
You need to use the logging module for this to work.
import logging
logger = logging.getLogger(__name__)
logger.debug('Test')
django-debug-toolbar intercepts this call and adds it to the toolbar. When you do print('test'), it just goes to standard out.
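To see why the print output never reaches the toolbar, it helps to remember that only records routed through the logging framework are captured; print bypasses it entirely. A minimal self-contained sketch showing a DEBUG record being emitted through a handler (the logger name here is illustrative, not a toolbar requirement):

```python
import io
import logging

logger = logging.getLogger('myapp.views')  # hypothetical module name
logger.setLevel(logging.DEBUG)

# Attach a handler so DEBUG records are actually emitted somewhere visible
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s %(message)s'))
logger.addHandler(handler)

logger.debug('Test')
print(stream.getvalue().strip())  # DEBUG Test
```

In a Django view you would keep only the getLogger/debug lines; the toolbar (or your LOGGING settings) supplies the handler.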