Connecting to Google Drive using Python (PyDrive) - python-2.7

I have this code:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import time

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

# Auto-iterate through all files that match this query
file_list = drive.ListFile({'q': "'root' in parents"}).GetList()
for file1 in file_list:
    print 'title: %s, id: %s' % (file1['title'], file1['id'])
    time.sleep(1)
However, each time I run it, it opens my browser and asks for permission to access Google Drive. How can I bypass this? At the very least, ask once, save the "permission", and don't ask again, or (which would be best) silently accept the permission in the background without my decision. I have downloaded client_secrets.json, which is used for passing the authorization details.
And what if I wanted to release my application? I had to generate and download client_secrets.json to make it work, and I guess my users wouldn't want to do the same. Is there a better, more convenient way?
I would also appreciate a tutorial-for-dummies on the Google Drive API, because I find it hard to understand from the documentation alone.

PyDrive2 has done a fabulous job of automating authentication via a settings.yaml file. Use the package below if you are experiencing the above problem:
https://docs.iterative.ai/PyDrive2/
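With plain PyDrive you can also cache the credentials yourself so the browser prompt only appears on the first run. A minimal sketch of that flow, following PyDrive's documented pattern ("mycreds.txt" is an arbitrary file name):

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
# Try to load previously saved credentials
gauth.LoadCredentialsFile("mycreds.txt")
if gauth.credentials is None:
    # First run only: authenticate via the browser
    gauth.LocalWebserverAuth()
elif gauth.access_token_expired:
    # Refresh the token instead of asking again
    gauth.Refresh()
else:
    # Initialize the saved credentials
    gauth.Authorize()
# Save the credentials for the next run
gauth.SaveCredentialsFile("mycreds.txt")

drive = GoogleDrive(gauth)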

Related

Is there a way to pass credentials programmatically for using Google Document AI without reading from disk?

I am trying to run the demo code for PDF parsing with GCP Document AI. Exporting the Google credentials on the command line works fine. The problem comes when the code needs to run in memory, so no credential files can be read from disk. Is there a way to pass the credentials into the Document AI parsing function?
Google's sample code:
def main(project_id='YOUR_PROJECT_ID',
         input_uri='gs://cloud-samples-data/documentai/invoice.pdf'):
    """Process a single document with the Document AI API, including
    text extraction and entity extraction."""
    client = documentai.DocumentUnderstandingServiceClient()
    gcs_source = documentai.types.GcsSource(uri=input_uri)
    # mime_type can be application/pdf, image/tiff,
    # and image/gif, or application/json
    input_config = documentai.types.InputConfig(
        gcs_source=gcs_source, mime_type='application/pdf')
    # Location can be 'us' or 'eu'
    parent = 'projects/{}/locations/us'.format(project_id)
    request = documentai.types.ProcessDocumentRequest(
        parent=parent,
        input_config=input_config)
    document = client.process_document(request=request)
    # All text extracted from the document
    print('Document Text: {}'.format(document.text))

    def _get_text(el):
        """Convert text offset indexes into text snippets."""
        response = ''
        # If a text segment spans several lines, it will
        # be stored in different text segments.
        for segment in el.text_anchor.text_segments:
            start_index = segment.start_index
            end_index = segment.end_index
            response += document.text[start_index:end_index]
        return response

    for entity in document.entities:
        print('Entity type: {}'.format(entity.type))
        print('Text: {}'.format(_get_text(entity)))
        print('Mention text: {}\n'.format(entity.mention_text))
When you run your workloads on GCP, you don't need a service account key file. You MUSTN'T use one!
Why? Two reasons:
It's useless, because all GCP products have at least a default service account, and most of the time you can customize it. You can have a look at the Cloud Functions identity in your case.
A service account key file is a file. That means a lot: you can copy it, send it by email, commit it to a Git repository... many people can get access to it, and you lose control of this secret. And because it's a secret, you have to store it securely and rotate it regularly (at least every 90 days, per Google's recommendation). It's a nightmare! Whenever you can, don't use service account key files!
What do the libraries do?
They check whether the GOOGLE_APPLICATION_CREDENTIALS env var exists.
They look in the "well-known" location (when you perform gcloud auth application-default login to allow a local application to use your credentials to access Google resources, a file is created in a standard location on your computer).
If neither is found, they check whether the metadata server exists (only on GCP). This server provides authentication information to the libraries.
Otherwise, they raise an error.
So, simply use the correct service account in your function and grant it the correct role to achieve what you want to do.
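That said, if you genuinely must build credentials in memory (say, a key fetched from Secret Manager rather than read from disk), the google-auth library supports that, and the client accepts a credentials argument. A hedged sketch; the SA_KEY_JSON env var name is an assumption:

import json
import os

from google.cloud import documentai
from google.oauth2 import service_account

# Parse a service account key held in memory, never written to disk
key_info = json.loads(os.environ['SA_KEY_JSON'])
creds = service_account.Credentials.from_service_account_info(key_info)
# On GCP, omit credentials entirely and the metadata server is used instead
client = documentai.DocumentUnderstandingServiceClient(credentials=creds)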

Is it possible to make boto3 ignore signature expired error?

I was testing a Python app using boto3 to access DynamoDB and I got the following error message from boto3.
{'error':
{'Message': u'Signature expired: 20160915T000000Z is now earlier than 20170828T180022Z (20170828T181522Z - 15 min.)',
'Code': u'InvalidSignatureException'}}
I noticed that it's because I'm using the Python package freezegun's freeze_time to freeze the time at 20160915, since the mock data used by the tests is static.
I researched the error a bit and found this answer post. Basically, AWS invalidates signatures a short time after they are created. From my understanding, in my case the signature is marked as created at 20160915 because of freeze_time, but AWS uses the current time (the time when the test runs). Therefore AWS thinks this signature expired almost a year ago and sends back an error.
Is there any way to make AWS ignore that error? Or is it possible to use boto3 to manually modify the date and time the signature is created at?
Please let me know if I'm not explaining my questions clearly. Any ideas are appreciated.
AWS API calls use a timestamp to prevent replay attacks. If your computer's time/date is skewed too far from the actual time, the API calls will be denied.
Running requests from a computer with the date set to 2016 would certainly trigger this failure.
The check is done on the host side, so there is nothing you can fix locally aside from using the real date (or somehow forcing Python to use a different date than the rest of your system).
Just came across a similar issue with immobilus. My solution was to replace datetime in botocore.auth with an unmocked version, as suggested by Antonio.
The pytest example would look like this:
import types

import pytest
from immobilus import logic


@pytest.fixture(scope='session', autouse=True)
def _boto_real_time():
    from botocore import auth
    auth.datetime = get_original_datetime()


def get_original_datetime():
    # Build a stand-in module exposing the unmocked originals
    original_datetime = types.ModuleType('datetime')
    original_datetime.mktime = logic.original_mktime
    original_datetime.time = logic.original_time
    original_datetime.gmtime = logic.original_gmtime
    original_datetime.localtime = logic.original_localtime
    original_datetime.strftime = logic.original_strftime
    original_datetime.date = logic.original_date
    original_datetime.datetime = logic.original_datetime
    return original_datetime
Is there any way to make AWS ignore that error?
No
Or is it possible to use boto3 to manually modify the date and time the signature is created at?
You should patch any datetime / time call that is in the auth.py file of the botocore library (source: https://github.com/boto/botocore/blob/develop/botocore/auth.py).
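If you are using freezegun itself, note that freeze_time accepts an ignore list of module prefixes that keep seeing the real clock, which avoids hand-patching botocore. A sketch (the table name and key are assumptions):

import boto3
from freezegun import freeze_time

# botocore is left unfrozen, so request signing uses the real time
@freeze_time('2016-09-15', ignore=['botocore'])
def test_query():
    table = boto3.resource('dynamodb').Table('my-table')
    print(table.get_item(Key={'id': '42'}))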

Display ".doc" ".docx" in browser

My users can upload their CV, and these CVs should be viewable by any employer.
My problem is that my client wants the CV to appear in the web browser without any download.
PDFs work fine, but .doc and .docx don't.
I've tried both gems ("docx" and "doc_ripper"), but each one only handles basic things (tables won't work...).
The CV is attached to a user and stored on Amazon with Dragonfly.
I've tried the Google viewer: http://googlesystem.blogspot.be/2009/09/embeddable-google-document-viewer.html
But as soon as I do: user.cv_file.remote_url(expires: 5.minutes.from_now)
the URL doesn't work anymore (this solution only works if the document is public).
I thought about adding a second field that holds cv_file converted to a PDF when it isn't one already.
Is there any way to give public permission to an AWS file for 2-3 minutes (time enough to render it with the Google viewer)?
Thanks.
I assume you are talking about a file stored on S3. To make a file on S3 temporarily public you can generate a pre-signed URL with an expiration date/time: http://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html
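The question is Rails-side, but the idea is the same in any SDK; a minimal boto3 sketch (bucket and key names are placeholders):

import boto3

s3 = boto3.client('s3')
# The URL stays valid for 3 minutes, long enough for a viewer to fetch it
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'example-bucket', 'Key': 'cvs/user-42.docx'},
    ExpiresIn=180,
)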
I've used the gem htmltoword a few times now, and it's done a good job on that end of the translation.
I did a quick search, and there are a few promising gems that might help you out here, converting the resumes from Word (.doc, .docx) into a format that you can get to HTML for your views (perhaps storing the converted content in a DB table/column?):
Word docx_converter
Google Groups discussion of the issue
ydocx
docx
Thanks for answering, but after much research I finally found:
https://view.officeapps.live.com/op/view.aspx?src=
which works as well as the browser's built-in PDF reader.
Be sure to have a public URL for the file you want to display.
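Since the viewer just fetches whatever URL you pass as src, a temporarily public pre-signed URL works here too; it only needs to be URL-encoded. A small Python sketch (the public_url value is a placeholder):

import urllib.parse

public_url = 'https://example-bucket.s3.amazonaws.com/cvs/user-42.docx'  # placeholder
viewer_url = ('https://view.officeapps.live.com/op/view.aspx?src='
              + urllib.parse.quote(public_url, safe=''))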

Looking for a simple and minimalistic way to store small data packets in the cloud

I'm looking for a very simple and free cloud store for small packets of data.
Basically, I want to write a Greasemonkey script that a user can run on multiple machines with a shared data set. The data is primarily just a single number; eight bytes per user should be enough.
It all boils down to the following requirements:
simple to develop for (it's a fun project for a few hours, I don't want to invest twice as much in the sync)
store eight bytes per user (or maybe a bit more, but it's really tiny)
ideally, users don't have to sign up (they just get a random key they can enter on all their machines)
I don't need to sign up (it's all Greasemonkey, so there's no way to hide a secret, like a developer key)
there is no private data in the values, so another user getting access to that information by guessing the random key is no big deal
the information is easily recreated (sharing it in the cloud is just for convenience), so another user taking over the 'key' is easily fixed as well
First ideas:
Store on Google Docs with a form as the frontend. Of course, that's kinda ugly and every user needs to set it up again.
I could set up a Google App Engine instance that allows storing a number to a key and retrieving the number by key. It wouldn't be hard, but it still sounds overkill for what I need.
I could create a Firefox add-on instead of a Greasemonkey script and use Mozilla Weave/Sync—which unfortunately doesn't support storing HTML5 local storage yet, so GM isn't enough. Of course I'd have to implement the same for Opera and Chrome then (assuming there are similar services for them), instead of just reusing the user script.
Anybody got a clever idea or a service I'm not aware of?
Update for those who are curious: I ended up going the GAE route (about half a page of Python code). I only discovered OpenKeyval afterwards (see below). The advantage is that it's pretty easy for users to connect on all their machines (just a Google account login, no other key to transfer from machine A to machine B), the disadvantage is that everybody needs a Google account.
OpenKeyval is pretty much what I was looking for. (Update: OpenKeyval has apparently been shut down.)
I think GAE would be a nice choice. With your storage requirements, you will never exceed the free 500 MB of GAE's datastore. And it will be easy to port your script across browsers thanks to the REST nature of your service ;)
I was asked to share my GAE key/value store solution, so here it comes. Note that this code hasn't run for years, so it might be wrong and/or use very outdated GAE APIs:
app.yaml
application: myapp
version: 1
runtime: python
api_version: 1

handlers:
- url: /
  script: keyvaluestore.py
keyvaluestore.py
from google.appengine.api import users
from google.appengine.ext import db
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app


class KeyValue(db.Model):
    v = db.StringProperty(required=True)


class KeyValueStore(webapp.RequestHandler):
    def _do_auth(self):
        user = users.get_current_user()
        if user:
            return user
        else:
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.out.write('login_needed|' + users.create_login_url(self.request.get('uri')))

    def get(self):
        user = self._do_auth()
        callback = self.request.get('jsonp_callback')
        if user:
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.out.write(self._read_value(user.user_id()))

    def post(self):
        user = self._do_auth()
        if user:
            self._store_value(user.user_id(), self.request.body)

    def _read_value(self, key):
        result = db.get(db.Key.from_path("KeyValue", key))
        return result.v if result else 'none'

    def _store_value(self, k, v):
        kv = KeyValue(key_name=k, v=v)
        kv.put()


application = webapp.WSGIApplication([('/', KeyValueStore)],
                                     debug=True)


def main():
    run_wsgi_app(application)


if __name__ == "__main__":
    main()
The closest thing I've seen is Amazon's Simple Queue Service.
http://aws.amazon.com/sqs/
I've not used it myself so I'm not sure how the developer key aspect works, but they give you 100,000 free queries a month.

Best way to write an image to a Django HttpResponse()

I need to serve images securely to validated users only (i.e. they can't be served as static files). I currently have the following Python view in my Django project, but it seems inefficient. Any ideas for a better way?
def secureImage(request, imagePath):
    response = HttpResponse(mimetype="image/png")
    img = Image.open(imagePath)
    img.save(response, 'png')
    return response
(Image is imported from PIL.)
Well, re-encoding is needed sometimes (e.g. applying a watermark over an image while keeping the original untouched), but for the simplest cases you can use:
try:
    with open(valid_image, "rb") as f:
        return HttpResponse(f.read(), content_type="image/jpeg")
except IOError:
    red = Image.new('RGBA', (1, 1), (255, 0, 0, 0))
    response = HttpResponse(content_type="image/jpeg")
    red.save(response, "JPEG")
    return response
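For the re-encoding case mentioned above (a watermark, for instance), a rough PIL sketch; the text, position, and colour are arbitrary choices:

from PIL import Image, ImageDraw
from django.http import HttpResponse

def watermarked_response(image_path):
    img = Image.open(image_path).convert('RGB')
    draw = ImageDraw.Draw(img)
    # Stamp the watermark text in the bottom-left corner
    draw.text((10, img.height - 20), 'preview only', fill=(255, 0, 0))
    response = HttpResponse(content_type="image/jpeg")
    img.save(response, "JPEG")  # re-encode; the original file is untouched
    return response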
Make use of FileResponse
A cleaner way; here we don't have to worry about the Content-Length and Content-Type headers, as they are added automatically based on the file handed to FileResponse.

from django.http import FileResponse

def send_file(request):
    img = open('media/hello.jpg', 'rb')
    response = FileResponse(img)
    return response
I just stumbled on this thread; since some of the above is somewhat bad advice for production, I thought I would mention X-Sendfile, which works with both Apache and Nginx, and probably other web servers too.
https://pythonhosted.org/xsendfile/
Modern Web servers like Nginx are generally able to serve files faster, more efficiently and more reliably than any Web application they host. These servers are also able to send to the client a file on disk as specified by the Web applications they host. This feature is commonly known as X-Sendfile.
This simple library makes it easy for any WSGI application to use X-Sendfile, so that they can control whether a file can be served or what else to do when a file is served, without writing server-specific extensions. Use cases include:
Restrict document downloads to authenticated users.
Log who's downloaded a file.
Force a file to be downloaded instead of rendered by the browser, or serve it with a name different from the one on disk, by setting the Content-Disposition header.
The basic idea is that you open the file and pass that handle back to the webserver, which then returns the bytes to the client, freeing your Python code to handle the next request. This is far more performant than the solutions above, since a slow client on the other end could otherwise hang your Python thread for as long as it takes to download the file.
Here is a repo that shows how to do this for various webservers; although it is pretty old, it will at least give you an idea of what you need to do: https://github.com/johnsensible/django-sendfile
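For the curious: with Nginx the header is called X-Accel-Redirect. A minimal sketch of the idea in Django (the /protected/ internal location and the media path are assumptions):

from django.contrib.auth.decorators import login_required
from django.http import HttpResponse

@login_required
def secure_image(request, image_name):
    response = HttpResponse()
    # Nginx intercepts this header and serves the file itself from an
    # internal-only location, e.g.:
    #   location /protected/ { internal; alias /srv/media/; }
    response['X-Accel-Redirect'] = '/protected/{}'.format(image_name)
    response['Content-Type'] = 'image/png'
    return response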