Accessing downloaded data in a Cloud Run instance - google-cloud-platform

I have a Google Cloud Run instance that looks like this:
import json
import os
import rarfile
from google.cloud import storage
from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def index():
    file_id = request.values.get("fileId")
    try:
        storage_client = storage.Client()
        bucket = storage_client.get_bucket("test-bucket")
        blob = bucket.blob(file_id)
        blob.download_to_filename(file_id)
        rf = rarfile.RarFile(file_id)
        rf.extractall()
        return ("", 204)
    except Exception as e:
        return f"Error: {e}", 400
    return ("500 Error", 500)
However, when I trigger the instance, I get the following error:
Error: can only concatenate str (not "RarCannotExec") to str
What is going wrong here? When I download the file and unzip it locally, I run into no problems. Does it have to do with the file system of Cloud Run instances?
EDIT:
I think from the above it is clear that the error comes from the except block, where I return Error: {e}. However, from analyzing the logs, it is apparent that the program fails at the rf = rarfile.RarFile(file_id) line. Why that line fails, I am still unclear on.
EDIT 2:
I needed to install either unrar or unrar-free in the container. Cheers!
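For reference, rarfile raises RarCannotExec when it cannot find an external unrar tool to shell out to, which is exactly what the RarFile(...) call was hitting. A minimal Dockerfile sketch of the fix, assuming a Debian-based base image (the image tag and the rest of the Dockerfile are placeholders; unrar-free is the variant available in Debian's main repository):

FROM python:3.10-slim
# rarfile delegates extraction to an external binary; bake one into the image
RUN apt-get update && apt-get install -y --no-install-recommends unrar-free \
    && rm -rf /var/lib/apt/lists/*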

Related

Scheduled Tasks - Runs without Error but does not produce any output - Django PythonAnywhere

I have set up a scheduled task to run daily on PythonAnywhere.
The task uses Django management commands, as I found this was the preferred method to use with PythonAnywhere.
The task produces no errors, but I don't get any output. 2022-06-16 22:56:13 -- Completed task, took 9.13 seconds, return code was 0.
I have tried using print() to debug areas of the code, but I cannot produce any output in either the error or server logs, even after trying print(date_today, file=sys.stderr).
I have set the path on the Scheduled Task as follows (not sure if this is correct, but it seems to be the only way I can get it to run without errors):
workon advancementvenv && python3.8 /home/vicinstofsport/advancement_series/manage.py shell < /home/vicinstofsport/advancement_series/advancement/management/commands/schedule_task.py
I have tried setting the path as below, but then I get an error when I try to import from the models.py file (I know this is related to a relative import but cannot manage to resolve it):
Traceback (most recent call last):
  File "/home/vicinstofsport/advancement_series/advancement/management/commands/schedule_task.py", line 3, in <module>
    from advancement.models import Bookings
ModuleNotFoundError: No module named 'advancement'
2022-06-17 03:41:22 -- Completed task, took 14.76 seconds, return code was 1.
Any ideas on how I can get this working? It all works fine locally if I use the command py manage.py scheduled_task; it just fails on PythonAnywhere.
Below is the task code and structure of the app.
from django.core.management.base import BaseCommand
import requests
from advancement.models import Bookings
from datetime import datetime, timedelta, date
import datetime
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail
from django.core.mail import send_mail
import os
from decouple import config

class Command(BaseCommand):
    help = 'Sends Program Survey'

    def handle(self, *args, **kwargs):
        # Get today's date
        date_today = datetime.datetime.now().date()
        # Get booking data
        bookings = Bookings.objects.all()
        # For each booking today, send survey email
        for booking in bookings:
            if booking.booking_date == date_today:
                if booking.program_type == "Sport Science":
                    booking_template_id = 'd-bbc79704a31a4a62a5bfea90f6342b7a'
                    email = booking.email
                    booking_message = Mail(from_email=config('FROM_EMAIL'),
                                           to_emails=[email],
                                           )
                    booking_message.template_id = booking_template_id
                    try:
                        sg = SendGridAPIClient(config('SG_API'))
                        response = sg.send(booking_message)
                    except Exception as e:
                        print(e)
                else:
                    booking_template_id = 'd-3167927b3e2146519ff6d9035ab59256'
                    email = booking.email
                    booking_message = Mail(from_email=config('FROM_EMAIL'),
                                           to_emails=[email],
                                           )
                    booking_message.template_id = booking_template_id
                    try:
                        sg = SendGridAPIClient(config('SG_API'))
                        response = sg.send(booking_message)
                    except Exception as e:
                        print(e)
            else:
                print('No')
Thanks in advance for any help.
Thanks Filip and Glenn; testing within the bash console and changing the directory in the task helped fix the issue. Adding cd /home/vicinstofsport/advancement_series && to my task allowed the function to run.
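For reference, a sketch of what the full task command would then look like (assuming the management command is named schedule_task, matching the filename above):

cd /home/vicinstofsport/advancement_series && workon advancementvenv && python3.8 manage.py schedule_task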

Unable to Access Flask App URL within Google Colab getting site not available

I have created a Flask prediction app within Google Colab. After adding all the dependencies to the Colab environment and running it, I get the URL, but when I click on it, it shows the site cannot be reached.
I have the Procfile, the pickled model, and the requirements text file, but for some reason it's not working. I also tried deploying this app using Heroku, and it met the same fate: I got an application error.
For more context please visit my github repo.
Any help or guidance will be highly appreciated.
from flask import Flask, url_for, redirect, render_template, jsonify, request
from pycaret.classification import *
import pandas as pd
import numpy as np
import pickle

app = Flask(__name__)
model = load_model('Final RF Model 23JUL2021')
cols = ['AHT', 'NTT', 'Sentiment', 'Complaints', 'Repeats']

@app.route('/')
def home():
    return render_template("home.html")

@app.route('/predict', methods=['POST'])
def predict():
    int_features = [x for x in request.form.values()]
    final = np.array(int_features)
    data_unseen = pd.DataFrame([final], columns=cols)
    prediction = predict_model(model, data=data_unseen, round=0)
    prediction = int(prediction.Label[0])
    return render_template('home.html', pred='Predicted Maturity Level is {}'.format(prediction))

@app.route('/predict_api', methods=['POST'])
def predict_api():
    data = request.get_json(force=True)
    data_unseen = pd.DataFrame([data])
    prediction = predict_model(model, data=data_unseen)
    output = prediction.Label[0]
    return jsonify(output)

if __name__ == '__main__':
    app.run(debug=True)
You cannot run a Flask app inside Colab the same way as on your own machine. You need to use flask-ngrok.
!pip install flask-ngrok
from flask_ngrok import run_with_ngrok
[...]
app = Flask(__name__)
run_with_ngrok(app)
[...]
app.run()
You can't use the debug=True parameter with ngrok.
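For illustration, a minimal end-to-end sketch of the same idea (the route and return string are placeholders):

from flask import Flask
from flask_ngrok import run_with_ngrok

app = Flask(__name__)
run_with_ngrok(app)  # opens an ngrok tunnel to the dev server when app.run() is called

@app.route('/')
def home():
    return 'Hello from Colab'

app.run()  # note: no debug=True here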

How to serve image from gcs using python 2.7 standard app engine?

The following code is an almost verbatim copy of the sample code from Google to serve a file from Google Cloud Storage via the Python 2.7 App Engine Standard Environment. I serve it locally with the command:
dev_appserver.py --default_gcs_bucket_name darianhickman-201423.appspot.com
import cloudstorage as gcs
import webapp2

class LogoPage(webapp2.RequestHandler):
    def get(self):
        bucket_name = "darianhickman-201423.appspot.com"
        self.response.headers['Content-Type'] = 'image/jpeg'
        self.response.headers['Message'] = "LogoPage"
        gcs_file = gcs.open("/" + bucket_name + '/logo.jpg')
        contents = gcs_file.read()
        gcs_file.close()
        self.response.write(contents)

app = webapp2.WSGIApplication([('/logo.jpg', LogoPage),
                               ('/logo2.jpg', LogoPage)],
                              debug=True)
The error I see on the console (note the empty body) is:
NotFoundError: Expect status [200] from Google Storage. But got status 404.
Path: '/darianhickman-201423.appspot.com/logo.jpg'.
Request headers: None.
Response headers: {'date': 'Sun, 30 Dec 2018 18:54:54 GMT', 'connection': 'close', 'server': 'Development/2.0'}.
Body: ''.
Extra info: None.
Again, this is almost identical to the read logic documented at
https://cloud.google.com/appengine/docs/standard/python/googlecloudstorageclient/read-write-to-cloud-storage
If you serve it locally using dev_appserver.py, it runs a local emulation of Cloud Storage and does not connect to the actual Google Cloud Storage.
Try writing a file and then reading it. You’ll see that it will succeed.
Here is a sample:
import os
import cloudstorage as gcs
from google.appengine.api import app_identity
import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        bucket_name = os.environ.get('BUCKET_NAME', app_identity.get_default_gcs_bucket_name())
        self.response.headers['Content-Type'] = 'text/plain'
        filename = "/" + bucket_name + "/testfile"

        # Create file
        gcs_file = gcs.open(filename,
                            'w',
                            content_type='text/plain')
        gcs_file.write('Hello world\n')
        gcs_file.close()

        # Read file and display content
        gcs_file = gcs.open(filename)
        contents = gcs_file.read()
        gcs_file.close()
        self.response.write(contents)

app = webapp2.WSGIApplication(
    [('/', MainPage)], debug=True)
Run it with dev_appserver.py --default_gcs_bucket_name a-local-bucket.
If you deploy your application on Google App Engine then it will work (assuming you have a file called logo.jpg uploaded) because it connects to Google Cloud Storage. I tested it with minor changes:
import os
import cloudstorage as gcs
from google.appengine.api import app_identity
import webapp2

class LogoPage(webapp2.RequestHandler):
    def get(self):
        bucket_name = os.environ.get('BUCKET_NAME', app_identity.get_default_gcs_bucket_name())
        # or you can use bucket_name = "<your-bucket-name>"
        self.response.headers['Content-Type'] = 'image/jpeg'
        self.response.headers['Message'] = "LogoPage"
        gcs_file = gcs.open("/" + bucket_name + '/logo.jpg')
        contents = gcs_file.read()
        gcs_file.close()
        self.response.write(contents)

app = webapp2.WSGIApplication(
    [('/', LogoPage)], debug=True)
Also, it's worth mentioning that the documentation for Using the client library with the development app server seems to be outdated; it states that:
There is no local emulation of Cloud Storage, all requests to read and
write files must be sent over the Internet to an actual Cloud Storage
bucket.
The team responsible for the documentation has already been informed about this issue.

Python: Erratic joblib behaviour on Flask

I am trying to deploy a machine learning model on an AWS EC2 instance using Flask. These are sklearn's fitted Random Forest models that are pickled using joblib. When I host Flask on localhost and load them into memory, everything runs smoothly. However, when I deploy it on the apache2 server using mod_wsgi, joblib works only sometimes (i.e., the models are sometimes loaded), and the other times the server just hangs. There is no error in the logs. Any ideas would be appreciated.
Here is the relevant code that I am using:
# In[49]:
from flask import Flask, jsonify, request, render_template
from datetime import datetime
from sklearn.externals import joblib
import pickle as pkl
import os

# In[50]:
app = Flask(__name__, template_folder="/home/ubuntu/flaskapp/")

# In[51]:
log = lambda msg: app.logger.info(msg, extra={'worker_id': "request.uuid"})

# Logger
import logging
handler = logging.FileHandler('/home/ubuntu/app.log')
handler.setLevel(logging.ERROR)
app.logger.addHandler(handler)

# In[52]:
@app.route('/')
def host_template():
    return render_template('Static_GUI.html')

# In[53]:
def load_models(path):
    model_arr = [0] * len(os.listdir(path))
    for filename in os.listdir(path):
        f = open(path + "/" + filename, 'rb')
        model_arr[int(filename[2:])] = joblib.load(f)
        print("Classifier ", filename[2:], " added.")
        f.close()
    return model_arr

# In[54]:
partition_limit = 30

# In[55]:
print("Dictionaries being loaded.")
dict_file_path = "/home/ubuntu/Dictionaries/VARR"
dictionaries = pkl.load(open(dict_file_path, "rb"))
print("Dictionaries Loaded.")

# In[56]:
print("Begin loading classifiers.")
model_path = "/home/ubuntu/RF_Models/"
classifier_arr = load_models(model_path)
print("Classifiers Loaded.")

if __name__ == '__main__':
    log("/home/ubuntu/print.log")
    print("Starting API")
    app.run(debug=True)
I was stuck with this for quite some time. Posting the answer in case someone runs into this problem. Using print statements and looking at the logs, I narrowed the problem down to the joblib.load statement. I found this awesome blog: http://blog.rtwilson.com/how-to-fix-flask-wsgi-webapp-hanging-when-importing-a-module-such-as-numpy-or-matplotlib
The idea of using a global WSGI application group fixed the problem. That forced the use of the main interpreter, just as the top comment on that blog page mentions.
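For reference, a minimal sketch of the relevant Apache/mod_wsgi configuration (the process name, paths, and options are placeholders; the key line is the application group directive):

WSGIDaemonProcess flaskapp user=ubuntu group=ubuntu threads=5
WSGIProcessGroup flaskapp
WSGIScriptAlias / /home/ubuntu/flaskapp/flaskapp.wsgi
# run in the main interpreter instead of a sub-interpreter; sub-interpreters
# are what hang when importing C-extension modules like numpy or joblib
WSGIApplicationGroup %{GLOBAL}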

"errorMessage": "module initialization error"

Using Python, I followed a tutorial, and when it came to testing it, the following error popped up:
{
"errorMessage": "module initialization error"
}
What could I have done wrong?
from __future__ import print_function
import os
from datetime import datetime
from urllib2 import urlopen

SITE = os.environ['site']  # URL of the site to check, stored in the site environment variable, e.g. https://aws.amazon.com
EXPECTED = os.environ['expected']  # String expected to be on the page, stored in the expected environment variable, e.g. Amazon

def validate(res):
    '''Return False to trigger the canary

    Currently this simply checks whether the EXPECTED string is present.
    However, you could modify this to perform any number of arbitrary
    checks on the contents of SITE.
    '''
    return EXPECTED in res

def lambda_handler(event, context):
    print('Checking {} at {}...'.format(SITE, event['time']))
    try:
        if not validate(urlopen(SITE).read()):
            raise Exception('Validation failed')
    except:
        print('Check failed!')
        raise
    else:
        print('Check passed!')
        return event['time']
    finally:
        print('Check complete at {}'.format(str(datetime.now())))
You don't need any environment variables. Just keep it simple:
from __future__ import print_function
import os
from datetime import datetime
from urllib2 import urlopen

def lambda_handler(event, context):
    url = 'https://www.google.com'  # change it with your own
    print('Checking {} at {}...'.format(url, datetime.utcnow()))
    html = urlopen(url).read()
    # do some processing
    return html
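A likely explanation for the original error, assuming the environment variables were never configured: SITE = os.environ['site'] runs at import time and raises a KeyError when the variable is missing, which Lambda surfaces as a module initialization error. If you do want to keep the environment variables, a hedged sketch that reads them defensively:

import os

# os.environ.get falls back to a default instead of raising KeyError at import
SITE = os.environ.get('site', 'https://aws.amazon.com')
EXPECTED = os.environ.get('expected', 'Amazon')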
Here is another simple example.
from __future__ import print_function

def lambda_handler(event, context):
    first = event.get('first', 0)
    second = event.get('second', 0)
    sum = first + second
    return sum
Here is a sample event which will be used to invoke this Lambda. You can configure a test event from the Lambda web interface (or google it):
{
"first": 10,
"second": 23
}
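For illustration, an invocation sketch using the AWS CLI (the function name is a placeholder, and recent CLI versions may also need --cli-binary-format raw-in-base64-out for a raw JSON payload):

aws lambda invoke --function-name my-sum-function \
    --payload '{"first": 10, "second": 23}' response.json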
In my case, I had missed adding the logging_config.ini to the Lambda function.
I guess you would face a similar error whenever the Lambda function doesn't find a referenced file or package.
Thanks to the new Cloud9 IDE integration, I was able to create one on the fly.