Unable to parse non-ASCII URL in Python? - python-2.7

I wanted to query the Freebase API to get the list of teams José Mourinho has played for.
So, the URL I used in my browser is
https://www.googleapis.com/freebase/v1/mqlread?query=[{"name": "José Mourinho","/sports/pro_athlete/teams": [{"mid": null,"team": null,"to": null,"optional": true}]}]
However, this code:
import json
import urllib
service_url="https://www.googleapis.com/freebase/v1/mqlread"
query = '[{"name": "' + "José Mourinho" + '","/sports/pro_athlete/teams": [{"mid": null,"team": null,"to": null,"optional": true}]}]'
url = service_url + '?' + 'query='+query
response = json.loads(urllib.urlopen(url).read())
gives me an error saying,
UnicodeError: URL u'https://www.googleapis.com/freebase/v1/mqlread?query=[{"name": "Jos\xe9 Mourinho","/sports/pro_athlete/teams": [{"mid": null,"team": null,"to": null,"optional": true}]}]' contains non-ASCII characters
What is the solution to this?

I think you skipped over a little bit of the docs. Try this instead:
# coding=UTF-8
import json
import urllib
service_url = "https://www.googleapis.com/freebase/v1/mqlread"
query = [{
    '/sports/pro_athlete/teams': [
        {
            'to': None,
            'optional': True,
            'mid': None,
            'team': None
        }
    ],
    'name': 'José Mourinho'
}]
url = service_url + '?' + urllib.urlencode({'query': json.dumps(query)})
response = json.loads(urllib.urlopen(url).read())
print response
Rather than building the query string yourself, use json.dumps and urllib.urlencode to create it for you. They're good at this.
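For the curious, this is roughly what the two calls produce: json.dumps escapes the é to \u00e9 (it emits ASCII by default), and urlencode percent-escapes the rest, so the final URL is pure ASCII. A minimal interactive sketch with a trimmed-down query (the exact output may differ slightly):

>>> import json, urllib
>>> urllib.urlencode({'query': json.dumps({'name': u'José Mourinho'})})
'query=%7B%22name%22%3A+%22Jos%5Cu00e9+Mourinho%22%7D'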
Note: if you can use the requests package, that last bit could be:
import requests
response = requests.get(service_url, params={'query': json.dumps(query)})
Then you get to skip the URL construction and escaping altogether!

api request parameters are ignored

This code works as expected and shows the 3 most recent Wikipedia editors.
My question: if I uncomment the second URL line, I should get Urmi27 three times, or nothing if the user is not listed.
But I get the same list for both URLs. Is "action" ignored by the API request?
import requests
S = requests.Session()
URL = "https://en.wikipedia.org/w/api.php"
#URL = "https://en.wikipedia.org/w/api.php?action=feedcontributions&user=Urmi27"
PARAMS = {
    "format": "json",
    "rcprop": "comment|loginfo|title|ids|sizes|flags|user",
    "list": "recentchanges",
    "action": "query",
    "rclimit": "3"
}
R = S.get(url=URL, params=PARAMS)
DATA = R.json()
RECENTCHANGES = DATA['query']['recentchanges']
for rc in RECENTCHANGES:
    print(rc['user'])
You are defining GET parameters in two places (in the URL and in the PARAMS dictionary), and the API is prioritizing the ones from PARAMS.
The query and feedcontributions actions are very different, using different parameters and different return formats.
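You can see the clash directly: when params is combined with a URL that already carries a query string, requests appends the new parameters instead of replacing them. A quick sketch (the duplicate-handling comment reflects typical MediaWiki/PHP behavior, not something requests guarantees):

import requests

req = requests.Request(
    'GET',
    'https://en.wikipedia.org/w/api.php?action=feedcontributions&user=Urmi27',
    params={'action': 'query'}
).prepare()
print(req.url)
# ...api.php?action=feedcontributions&user=Urmi27&action=query
# 'action' now appears twice, and the API ends up honoring only one of them.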
To use the feedcontributions action, you would need something like this:
import requests
import xml.etree.ElementTree as ET

S = requests.Session()
URL = "https://en.wikipedia.org/w/api.php"

PARAMS = {
    "action": "feedcontributions",
    "user": "Urmi27"
}

R = S.get(url=URL, params=PARAMS)

# The feed comes back as RSS (XML), not JSON
xml_tree = ET.fromstring(R.content)
for child in xml_tree:
    print(child.tag, child.attrib)
    for channel in child:
        for elements in channel:
            if elements.tag == "description":
                print(elements.text)
API REF

Django setup elasticsearch client with password auth

My Django application uses Elasticsearch to index several resources.
Now I want to protect my Elasticsearch instance with a password, which works fine if I use "curl -u" or the like. However, from the elasticsearch_dsl documentation (https://elasticsearch-dsl.readthedocs.io/en/latest/configuration.html) I do not understand what I have to do to set up Elasticsearch so that it uses a password for authentication, and where I have to place this code. Is somebody able to show me some snippets of their configuration?
My current state looks like this:
settings.py
ELASTICSEARCH_DSL = {
    'default': {
        'hosts': env.str('ELASTICSEARCH_HOST') + ':' + env.str('ELASTICSEARCH_PORT'),
    },
}
ELASTICSEARCH_DSL_SIGNAL_PROCESSOR = 'django_elasticsearch_dsl.signals.RealTimeSignalProcessor'
documents.py
from django_elasticsearch_dsl import Document, Index, fields
from elasticsearch_dsl import analyzer

from App.models import Post

# elasticsearch index
posts = Index('posts')

html_strip = analyzer(
    'html_strip',
    tokenizer="standard",
    filter=["lowercase", "stop", "snowball"],
    char_filter=["html_strip"]
)

@posts.document
class PostDocument(Document):
    ... more index stuff
According to the docs, I have to manually set up a default client connection, where I can also pass the username and password for authentication, which does not seem to be possible in settings.py at the moment.
Kind regards
You can pass the Elasticsearch URL like this:

from urllib.parse import quote_plus as urlquote

elk_base_url = 'elasticsearch://{user_name}:{password}@{host_ip}:{host_port}'
elastic_search_url = elk_base_url.format(user_name='my_username',
                                         password=urlquote('mysecret_password'),
                                         # password may contain special characters
                                         host_ip='my-elastic-host-ip',
                                         host_port=9200)

ELASTICSEARCH_DSL = {
    'default': {
        'hosts': [elastic_search_url]
    },
}
This solution has been tested with the following versions:
Django==3.0.4
django-elasticsearch-dsl==7.1.1
logstash == kibana == elasticsearch == 7.6.0
If you are experiencing an AuthenticationException(401, '') exception, it means you provided the wrong credentials. Please do check the value of elastic_search_url and make sure the values are correct.
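Alternatively, if you would rather keep the credentials out of the URL: django_elasticsearch_dsl forwards these settings to the underlying Elasticsearch client, so an http_auth entry should work as well. A sketch under that assumption (not tested against the exact versions above):

ELASTICSEARCH_DSL = {
    'default': {
        'hosts': 'my-elastic-host-ip:9200',
        'http_auth': ('my_username', 'mysecret_password'),
    },
}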

How to successfully load bulk contacts into constant contact API via python?

I need to bulk load contacts to a particular list via the Constant Contact API (http://developer.constantcontact.com/docs/contacts-api/contacts-collection.html?method=POST).
I can successfully add contacts using the same JSON string as below in the API test console (https://constantcontact.mashery.com/io-docs, find the POST 'add contact to collection' tab):
update_contact = {"lists": [{"id": "1"}],"email_addresses": [{"email_address": "yasmin1.abob19955@gmail.com"}],"first_name": "Ronald","last_name": "Martone"}
However, when I run the same JSON string in my python code, I get error 400 with the following error message from my response object:
[{"error_key":"query.param.invalid","error_message":"The query parameter first_name is not supported."},
{"error_key":"query.param.invalid","error_message":"The query parameter last_name is not supported."},{"error_key":"query.param.invalid","error_message":"The query parameter lists is not supported."},{"error_key":"query.param.invalid","error_message":"The query parameter email_addresses is not supported."}]
How can two of the same API calls produce different results? And how do I get my python code to work?
code:
import requests

headers = {
    'Authorization': 'Bearer X',
    'X-Originating-Ip': '1',
    'Content-Type': 'application/json'
}

update_contact = {"lists": [{"id": "1"}],"email_addresses": [{"email_address": "yasmin1.abob19955@gmail.com"}],"first_name": "Ronald","last_name": "Martone"}

r_2 = requests.post('https://api.constantcontact.com/v2/contacts?action_by=ACTION_BY_OWNER&api_key=x', headers=headers, params=update_contact)
print(r_2.text)
You will need to change params to data. And since your Content-Type is application/json, serialize the dict first with json.dumps (import json). With params, requests appends the fields to the query string, which is exactly what those query.param.invalid errors are complaining about:
r_2 = requests.post('https://api.constantcontact.com/v2/contacts?action_by=ACTION_BY_OWNER&api_key=x', headers=headers, data=json.dumps(update_contact))
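Note: if your version of requests supports it, the json keyword handles the serialization (and the Content-Type header) for you:

r_2 = requests.post('https://api.constantcontact.com/v2/contacts?action_by=ACTION_BY_OWNER&api_key=x', headers=headers, json=update_contact)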
Additionally, you can use a multipart endpoint to upload contacts as well. I have found this to be very easy, especially if your contacts are in a csv file.
A sample code would look like this:
import requests as r
import csv
from datetime import datetime

url = 'https://api.constantcontact.com/v2/activities/addcontacts?api_key=<your_api_key>'

headers = {'Authorization': 'Bearer <access_token>',
           'X-Originating-Ip': '<ip>',
           'content-type': 'multipart/form-data'}

files = {'file_name': 'Book1.csv',
         'data': ('Book1.csv', open('Book1.csv', 'rb'),
                  'application/vnd.ms-excel', {'Expires': '0'}),
         'lists': ('<insert_listIds_separated_by_commas>')}

response = r.post(url, headers=headers, files=files)

with open('CC_Upload_Response_Data.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    time_stamp = datetime.now().strftime('%m-%d-%Y %H:%M')
    row = [response, response.text, time_stamp]
    writer.writerow(row)
The headers of your csv file need to be like so: "First Name", "Last Name", "Email Address", "Custom Field 1", "Custom Field 2", and so on. You can find a complete list of column names here: http://developer.constantcontact.com/docs/bulk_activities_api/bulk-activities-import-contacts.html
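For reference, a minimal Book1.csv with those headers might look like this (the rows are made-up placeholders):

First Name,Last Name,Email Address
Jane,Doe,jane.doe@example.com
John,Smith,john.smith@example.com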
The csv file that this code appends to acts like a log if you are going to schedule your .py file to run nightly. The log records the response code, the response text, and a timestamp.
Mess around with it a little and get it the way you like.

How to shorten a URL using the Google API and requests?

I am trying to shorten a URL using the Google API, but using only the requests module.
The code looks like this:
import requests

Key = ""  # found in https://developers.google.com/url-shortener/v1/getting_started#APIKey
api = "https://www.googleapis.com/urlshortener/v1/url"
target = "http://www.google.com/"

def goo_shorten_url(url=target):
    payload = {'longUrl': url, "key": Key}
    r = requests.post(api, params=payload)
    print(r.text)
When I run goo_shorten_url it returns:
"error": {
"errors": [
{
"domain": "global",
"reason": "required",
"message": "Required",
"locationType": "parameter",
"location": "resource.longUrl"
}
],
"code": 400,
"message": "Required"
}
But the longUrl parameter is there!
What am I doing wrong?
At first, please confirm that "urlshortener api v1" is enabled in the Google API Console.
Content-Type is required as a header, and the payload has to be sent via data rather than params. The modified sample is as follows.
Modified sample :
import json
import requests

Key = ""  # found in https://developers.google.com/url-shortener/v1/getting_started#APIKey
api = "https://www.googleapis.com/urlshortener/v1/url"
target = "http://www.google.com/"

def goo_shorten_url(url=target):
    headers = {"Content-Type": "application/json"}
    payload = {'longUrl': url, "key": Key}
    r = requests.post(api, headers=headers, data=json.dumps(payload))
    print(r.text)
If the above script doesn't work, please use an access token. The scope is https://www.googleapis.com/auth/urlshortener. When using an access token, the sample script is as follows.
Sample script :
import json
import requests

headers = {
    "Authorization": "Bearer " + "access token",
    "Content-Type": "application/json"
}
payload = {"longUrl": "http://www.google.com/"}

r = requests.post(
    "https://www.googleapis.com/urlshortener/v1/url",
    headers=headers,
    data=json.dumps(payload)
)
print(r.text)
Result :
{
 "kind": "urlshortener#url",
 "id": "https://goo.gl/#####",
 "longUrl": "http://www.google.com/"
}
Added 1 :
In the case of using tinyurl.com:
import requests
URL = "http://www.google.com/"
r = requests.get("http://tinyurl.com/" + "api-create.php?url=" + URL)
print(r.text)
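If the long URL itself contains characters such as & or spaces, it is safer to let requests build the query string, a small variation on the same idea:

import requests

URL = "http://www.google.com/"
r = requests.get("http://tinyurl.com/api-create.php", params={"url": URL})
print(r.text)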
Added 2 :
How to use the Python Quickstart
You can use the Python Quickstart. If you don't have "google-api-python-client", please install it. After installing it, copy and paste the sample script from "Step 3: Set up the sample" and save it as a python script. The modification points are the following 2 parts.
1. Scope
Before :
SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
After :
SCOPES = 'https://www.googleapis.com/auth/urlshortener'
2. Script
Before :
def main():
    """Shows basic usage of the Google Drive API.

    Creates a Google Drive API service object and outputs the names and IDs
    for up to 10 files.
    """
    credentials = get_credentials()
    http = credentials.authorize(httplib2.Http())
    service = discovery.build('drive', 'v3', http=http)
    results = service.files().list(
        pageSize=10, fields="nextPageToken, files(id, name)").execute()
    items = results.get('files', [])
    if not items:
        print('No files found.')
    else:
        print('Files:')
        for item in items:
            print('{0} ({1})'.format(item['name'], item['id']))
After :
def main():
    credentials = get_credentials()
    http = credentials.authorize(httplib2.Http())
    service = discovery.build('urlshortener', 'v1', http=http)
    resp = service.url().insert(body={'longUrl': 'http://www.google.com/'}).execute()
    print(resp)
After making the above modifications, please execute the sample script. You will get the short URL.
I am convinced that one CANNOT use ONLY requests to shorten a URL with the google api.
Below is the solution I ended up with.
It works, but it uses the Google API client library, which is OK, but I could not find much documentation or examples for it (not as much as I wanted).
To run the code, remember to install the Google API client for Python first with
pip install google-api-python-client, then:
import json
from oauth2client.service_account import ServiceAccountCredentials
from apiclient.discovery import build

scopes = ['https://www.googleapis.com/auth/urlshortener']
path_to_json = "PATH_TO_JSON"

# Get the JSON file from the Google API website
# (https://console.developers.google.com/apis/credentials), then:
# 1. Click on Create Credentials.
# 2. Select "SERVICE ACCOUNT KEY".
# 3. Create or select a Service Account and
# 4. save the JSON file.

credentials = ServiceAccountCredentials.from_json_keyfile_name(path_to_json, scopes)
short = build("urlshortener", "v1", credentials=credentials)

request = short.url().insert(body={"longUrl": "www.google.com"})
print(request.execute())
I adapted this from Google's Manual Page.
The reason it has to be so complicated (more than I expected at first, at least) is to avoid the OAuth2 authentication that requires the user (me in this case) to press a button (to confirm that I can use my own information).
As the question is not very clear, this answer is divided into 4 parts, shortening the URL using:
1. API Key.
2. Access Token.
3. Service Account.
4. A simpler solution with TinyURL.
API Key
At first, please confirm that "urlshortener api v1" is enabled in the Google API Console. Content-Type is required as a header, and the payload has to be sent via data; the full script is the "Modified sample" in the first answer above. (Seems not to be working despite what the API manual says.)
Access Token
If the API-key script doesn't work, please use an access token. The scope is https://www.googleapis.com/auth/urlshortener; the script and its result are the "Sample script" in the first answer above. This answer on Stackoverflow shows how to get an Access Token: Link.
Using Service Account
To avoid the user needing to accept the OAuth authentication (with a pop-up screen and all that), there is a solution that uses machine-to-machine authentication via a Service Account, as in the previous answer (adapted from Google's Manual Page). To run that code, remember to install the Google API client for Python first with pip install google-api-python-client.
Even simpler:
In the case of using tinyurl.com, see the snippet under "Added 1" in the first answer above.

Scrapy can't follow url with commas without encoding it

Can I force scrapy to request a URL including commas without encoding them into %2C? The site (a phorum board) I want to crawl does not accept encoded URLs and redirects me to the root.
So, for example, I have this site to parse: example.phorum.com/read.php?12,8
The URL is being encoded into: example.phorum.com/read.php?12%2C8=
But every time I try to request that URL, I'm redirected to the page with the list of topics:
example.phorum.com/list.php?12
In those example URLs, 12 is the category number and 8 is the topic number.
I tried to disable redirecting by disabling RedirectMiddleware:
DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.redirect.RedirectMiddleware': None,
}
and in spider:
handle_httpstatus_list = [302, 403]
Moreover, I tried to rewrite this URL and request it from a helper callback:
Rules = [Rule(RegexLinkExtractor(allow=[r'(.*%2C.*)']), follow=True, callback='prepare_url')]

def prepare_url(self, response):
    url = response.url
    url = re.sub(r'%2C', ',', url)
    if "=" in url[-1]:
        url = url[:-1]
    yield Request(urllib.unquote(url), callback=self.parse_site)
where parse_site is the target parser, which is still called with the encoded URL.
Thanks in advance for any feedback
You can try canonicalize=False. Example iPython session:
In [1]: import scrapy
In [2]: from scrapy.contrib.linkextractors.regex import RegexLinkExtractor
In [3]: hr = scrapy.http.HtmlResponse(url="http://example.phorum.com", body="""<a href="http://example.phorum.com/list.php?1,2">link</a>""")
In [4]: lx = RegexLinkExtractor(canonicalize=False)
In [5]: lx.extract_links(hr)
Out[5]: [Link(url='http://example.phorum.com/list.php?1,2', text=u'link', fragment='', nofollow=False)]
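Applied to the spider from the question, that means constructing the link extractor inside the rule with canonicalize=False; a sketch (the allow pattern is illustrative):

Rules = [Rule(RegexLinkExtractor(allow=[r'read\.php'], canonicalize=False),
              follow=True, callback='parse_site')]

With canonicalize=False the extracted URLs keep their commas as-is, so the %2C-rewriting callback is no longer needed.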