Encoding is not kept - python-2.7

I am using Python 2.7 and using google plus public API to get activity data in a file. I am encountering issues to maintain the json encoding in my file. Double quotes are coming as u'' in my file. Below is my code:
from apiclient import discovery
API_KEY = 'MY API KEY'
service = discovery.build("plus", "v1", developerKey=API_KEY)
activities_resource = service.activities()
request = activities_resource.search(query='India versus South Africa', maxResults=1, orderBy='best',)
while request!= None:
activities_document = request.execute()
if 'items' in activities_document:
with open("output.json", mode='a') as file:
data = str(activities_document['items'])
file.write(data +"\n\n")
request = service.activities().list_next(request, activities_document)
Output:
[{u'kind': u'plus#activity', u'provider': {u'title': u'Google+'}, u'titl.......
I am expecting [{"kind": "plus#activity", .....
I am running my code on windows and I have tried both on DOS and pycharm IDE. I have also run the code on ubuntu machine but same output. Please let me know what I am doing wrong.

The json module is used for generating JSON. Use it.

Related

Trouble authenticating and writing to database locally

I'm having trouble authenticating and writing data to a spanner database locally. All imports are up to date - google.cloud, google.auth2, etc. I have tried having someone else run this and it works fine, so the problem seems to be something on my end - something wrong or misconfigured on my computer, maybe where the credentials are stored or something?
Anyone have any ideas?
from google.cloud import spanner
from google.api_core.exceptions import GoogleAPICallError
from google.api_core.datetime_helpers import DatetimeWithNanoseconds
import datetime
from google.oauth2 import service_account
def write_to(database):
record = [[
1041613562310836275,
'test_name'
]]
columns = ("id", "name")
insert_errors = []
try:
with database.batch() as batch:
batch.insert_or_update(
table = "guild",
columns = columns,
values = record,
)
except GoogleAPICallError as e:
print(f'error: {e}')
insert_errors.append(e.message)
pass
return insert_errors
if __name__ == "__main__":
credentials = service_account.Credentials.from_service_account_file(r'path\to\a.json')
instance_id = 'instance-name'
database_id = 'database-name'
spanner_client = spanner.Client(project='project-name', credentials=credentials)
print(f'spanner creds: {spanner_client.credentials}')
instance = spanner_client.instance(instance_id)
database = instance.database(database_id)
insert_errors = write_to(database)
some credential tests:
creds = service_account.Credentials.from_service_account_file(a_json)
<google.oauth2.service_account.Credentials at 0x...>
spanner_client.credentials
<google.auth.credentials.AnonymousCredentials at 0x...>
spanner_client.credentials.signer_email
AttributeError: 'AnonymousCredentials' object has no attribute 'signer_email'
creds.signer_email
'...#....iam.gserviceaccount.com'
spanner.Client().from_service_account_json(a_json).credentials
<google.auth.credentials.AnonymousCredentials object at 0x...>
The most common reason for this is that you have accidentally set (or forgot to unset) the environment variable SPANNER_EMULATOR_HOST. If this environment variable has been set, the client library will try to connect to the emulator instead of Cloud Spanner. This will cause the client library to wait for a long time while trying to connect to the emulator (assuming that the emulator is not running on your machine). Unset the environment variable to fix this problem.
Note: This environment variable will only affect Cloud Spanner client libraries, which is why other Google Cloud product will work on the same machine. The script will also in most cases work on other machines, as they are unlikely to have this environment variable set.

Dialogflow: Agent metadata not found for agentId

I'm trying to use Dialogflow's detect_intent in Python and I keep getting:
404 com.google.apps.framework.request.NotFoundException: Agent metadata not found for agentId: ####-####-####-####-####
Here's a snippet of my code:
import google.cloud.dialogflow as dialogflow
from CONFIG import DIALOGFLOW_PROJECT_ID
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = 'credentials/dialogflow.json'
def predict_intent(text, language):
session_client = dialogflow.SessionsClient()
session = session_client.session_path(DIALOGFLOW_PROJECT_ID, SESSION_ID)
text_input = dialogflow.TextInput(text=text, language_code=language)
query_input = dialogflow.QueryInput(text=text_input)
response = session_client.detect_intent(session=session, query_input=query_input) # ERROR
return response.query_result.intent.display_name
I tried running the function multiple times and some of them succeed, but most fall in the exception.
I can train the bot using the same interface and it works fine.
I'm using Python 3.7 and the following Google Cloud modules: google-api-core==2.0.1, google-auth==2.0.2, google-cloud-dialogflow==2.7.1, googleapis-common-protos==1.53.0.

How can I get reports from Google Cloud Storage using the Google's API

I have to create a program that get informations on a daily basis about installations of a group of apps on the AppStore and the PlayStore.
For the PlayStore, using Google Cloud Storage I followed the instructions on this page using the client library and a Service Account method and the Python code example :
https://support.google.com/googleplay/android-developer/answer/6135870?hl=en&ref_topic=7071935
I slightly changed the given code to make it work since documentation looks not up-to-date. I made it possible to connect to the API and it seems to connect correctly.
My problem is that I don't understand what object I get and how to use it. It's not a report it just looks like files properties in a dict.
This is my code (private data "hidden") :
import json
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.discovery import build
client_email = '************.iam.gserviceaccount.com'
json_file = 'PATH/TO/MY/JSON/FILE'
cloud_storage_bucket = 'pubsite_prod_rev_**********'
report_to_download = 'stats/installs/installs_****************_202005_app_version.csv'
private_key = json.loads(open(json_file).read())['private_key']
credentials = ServiceAccountCredentials.from_json_keyfile_name(json_file, scopes='https://www.googleapis.com/auth/devstorage.read_only')
storage = build('storage', 'v1', http=credentials.authorize(Http()))
supposed_to_be_report = storage.objects().get(bucket=cloud_storage_bucket, object=report_to_download).execute()
When I print the supposed_to_be_report - which is a dictionary- I only get what I understand as Metadata about he report like this:
{'kind': 'storage#object', 'id': 'pubsite_prod_rev_***********/stats/installs/installs_****************_202005_app_version.csv/1591077412052716',
'selfLink': 'https://www.googleapis.com/storage/v1/b/pubsite_prod_rev_***********/o/stats%2Finstalls%2Finstalls_*************_202005_app_version.csv',
'mediaLink': 'https://storage.googleapis.com/download/storage/v1/b/pubsite_prod_rev_***********/o/stats%2Finstalls%2Finstalls_****************_202005_app_version.csv?generation=1591077412052716&alt=media',
'name': 'stats/installs/installs_***********_202005_app_version.csv',
'bucket': 'pubsite_prod_rev_***********',
'generation': '1591077412052716',
'metageneration': '1',
'contentType': 'text/csv;
charset=utf-16le', 'storageClass': 'STANDARD', 'size': '378', 'md5Hash': '*****==', 'contentEncoding': 'gzip'......
I am not sure I'm using it correctly. Could you please explain me where am I wrong and/or how to get installs reports correctly ?
Thanks.
I can see that you are using googleapiclient.discovery client, this is not an issue, but the recommended way to access Google Cloud APIs programmatically is by using the client libraries.
Second, you are just retrieving the object's metadata. You can download the object to have access to the file contents, this is a sample using the client library.
from google.cloud import storage
def download_blob(bucket_name, source_blob_name, destination_file_name):
"""Downloads a blob from the bucket."""
# bucket_name = "your-bucket-name"
# source_blob_name = "storage-object-name"
# destination_file_name = "local/path/to/file"
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(source_blob_name)
blob.download_to_filename(destination_file_name)
print(
"Blob {} downloaded to {}.".format(
source_blob_name, destination_file_name
)
)
Sample taken from official docs.

Scrapy how to save a State between spider runs (via scrapinghub)?

I have a spider that will run on schedule. Spider input is based on Date. From date of last scrape to todays date. So the question is how to save the date of last scrape within the Scrapy project? There is an option to get data from scrapy settings using pkjutil module, but i did not find any reference in the docs on how to write data in that file. Any idea? Maybe an alternative?
P.S. My other option is to use some free remote MySql DB just for this. But looks like more work if simple solution is available.
import pkgutil
class CodeSpider(scrapy.Spider):
name = "code"
allowed_domains = ["google.com.au"]
def start_requests(self):
f = pkgutil.get_data("au_go", "res/state.json")
ids = json.loads(f)
id = ids[0]['state']
yield {'state':id}
ids[0]['state'] = 'New State'
with open('./au_go/res/state.json', 'w') as f:
json.dump(ids, f)
The above solution works fine when ran locally. But I am getting no such file or directory when running the code at Scrapinghub.
File "/tmp/unpacked-eggs/__main__.egg/au_go/spiders/test_state.py", line 33, in parse
with open(savePath, 'w') as f:
IOError: [Errno 2] No such file or directory: './au_go/res/state.json'
The problem is fixed with use of Scrapinghub Colections
And scrapinghub API. Works nice now.
Here is an example code in case somebody will find it usefull.
from scrapinghub import ScrapinghubClient
client = ScrapinghubClient(Your API KEY)
project = client.get_project(Your Project ID)
collections = project.collections
last_accessed = collections.get_store('last_accessed')
last_accessed.set({'_key': 'Date', 'value': '12-54-1235'})
print last_accessed.get('Date')['value']

POST Request to Heroku in Python - 403 Forbidden

I'm learning web scraping and building a simple web app at the moment, and I decided to practice scraping a schedule of classes. Here's a code snippet I'm having trouble with in my application, using Python 2.7.4, Flask, Heroku, BeautifulSoup4, and Requests.
import requests
from bs4 import BeautifulSoup as Soup
url = "https://telebears.berkeley.edu/enrollment-osoc/osc"
code = "26187"
values = dict(_InField1 = "RESTRIC", _InField2 = code, _InField3 = "13D2")
html = requests.post(url, params=values)
soup = Soup(html.content, from_encoding="utf-8")
sp = soup.find_all("div", {"class" : "layout-div"})[2]
print sp.text
This works great locally. It gives me back the string "Computer Science 61A P 001 LEC:" as expected. However, when I tried to run it on Heroku (using heroku run bash and then run python), I got back an error,403 Forbidden.
Am I missing some settings on Heroku? At first I thought it's the school settings, but then I was wondering why it works locally without any trouble... Any explanation/suggestion would be really appreciated! Thank you in advance.
I was having a similar issue, request was working locally but getting blocked on Heroku. It looks like the issue is that some websites block requests coming from Heroku (which on on AWS Servers). To get around this you can send your requests via a proxy server.
There are a bunch of different add-ons in heroku to achieve this, I went with fixie which has a reasonably sized free tier.
To install:
heroku addons:create fixie:tricycle
Then import into your local environment so you can try locally:
heroku config -s | grep FIXIE_URL >> .env
then in your python file you just add a couple of lines:
import os
import requests
from bs4 import BeautifulSoup as Soup
proxyDict = {
"http" : os.environ.get('FIXIE_URL', ''),
"https" : os.environ.get('FIXIE_URL', '')
}
url = "https://telebears.berkeley.edu/enrollment-osoc/osc"
code = "26187"
values = dict(_InField1 = "RESTRIC", _InField2 = code, _InField3 = "13D2")
html = requests.post(url, params=values, proxies=proxyDict)
soup = Soup(html.content, from_encoding="utf-8")
sp = soup.find_all("div", {"class" : "layout-div"})[2]
print sp.text
Docs for Fixie are here:
https://devcenter.heroku.com/articles/fixie