Google Cloud web app: using a user-uploaded file for processing (Flask)

I recently moved my project from Heroku to Google Cloud. It's written in Flask and basically does some simple text summarization (nothing fancy) of an uploaded .docx file. On Heroku I was able to use uploaded files locally thanks to its ephemeral file system.
With Google Cloud, I'm lost trying to take an uploaded file and run Python functions on it.
The error I'm getting is:
with open(self.file, 'rb') as file:
FileNotFoundError: [Errno 2] No such file or directory: 'http://storage.googleapis.com/...'
I've edited the specifics out for now, but when I open the link in a browser it brings up the download window. I know the file gets there, because when I check Google Cloud everything is in the proper bucket.
Also, is there a way to delete the file from the bucket immediately after Python goes through the document? Currently I have the lifecycle rule set to one day, but I only need the data temporarily for the run.
I'm sorry if these are silly questions. Very new to this and trying to learn.
Thanks
Oh, and here's the current code:
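from flask import request
from google.cloud import storage
from werkzeug.utils import secure_filename

gcs = storage.Client()
bucket = gcs.bucket(CLOUD_STORAGE_BUCKET)
user_file = request.files['file']
local = secure_filename(user_file.filename)
blob = bucket.blob(local)
blob.upload_from_string(user_file.read(), content_type=user_file.content_type)
# this is an HTTP URL, not a local path, so open() cannot read it
this_file = f"http://storage.googleapis.com/{CLOUD_STORAGE_BUCKET}/{local}"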
Then a function is supposed to open this_file.

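I changed the upload code to return the blob's public_url and passed that URL to the processing code, which now downloads the file over HTTP instead of opening it as a local path: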
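from io import BytesIO
import docx
import requests

def open_file(self):
    # self.file is the blob's public URL, so fetch it over HTTP into memory
    file = BytesIO(requests.get(self.file).content)
    return docx.Document(file)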
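Fetching the public_url with requests only works if the object is publicly readable. Since the question also asks about deleting the upload right after processing, here is a minimal sketch, assuming the google-cloud-storage client library: it downloads the blob's bytes into memory with Blob.download_as_bytes() (older releases of the library call it download_as_string()), hands a file-like object to a parser, and deletes the blob with Blob.delete() whether or not parsing succeeds. The helper name process_and_delete is hypothetical.

```python
from io import BytesIO


def process_and_delete(bucket, blob_name, process):
    """Download a blob into memory, run `process` on it, then delete the blob.

    `bucket` is assumed to be a google.cloud.storage Bucket (anything whose
    .blob(name) returns an object with download_as_bytes() and delete() works).
    `process` receives a file-like object, e.g. docx.Document.
    """
    blob = bucket.blob(blob_name)
    data = blob.download_as_bytes()  # no public URL or local file needed
    try:
        return process(BytesIO(data))
    finally:
        # Remove the upload as soon as processing finishes (or fails),
        # instead of waiting for the bucket's lifecycle rule.
        blob.delete()


# Usage (assumes python-docx and the names from the question's code):
# doc = process_and_delete(gcs.bucket(CLOUD_STORAGE_BUCKET), local, docx.Document)
```

The lifecycle rule can stay in place as a safety net for any uploads that never reach this code path.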

Related

Django open excel.xlsx with openpyxl from Google Cloud Storage

I need to open a .xlsx file from my bucket on Google Cloud Storage. The problem is that I get: FileNotFoundError at /api/ficha-excel
[Errno 2] No such file or directory: 'ficha.xlsx'
These are my bucket settings:
UPLOAD_ROOT = 'reportes/'
MEDIA_ROOT = 'reportes'
The file's path is bucket/reportes/ficha.xlsx.
This is the code of my GET function:
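import os
import openpyxl
from django.core.files.storage import FileSystemStorage

# base_location is a directory on the server's local disk, not in the bucket
directorio = FileSystemStorage("/reportes").base_location
os.makedirs(directorio, exist_ok=True)
# read
print("Directorios: ", directorio)
plantilla_excel = openpyxl.load_workbook(f"{directorio}/ficha.xlsx")
print(plantilla_excel.sheetnames)
currentSheet = plantilla_excel['Hoja1']
print(currentSheet['A5'].value)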
What is the problem with the path? I can't figure it out.
The solution below doesn't use Django's FileSystemStorage/Storage classes. It opens a .xlsx file from a Google Cloud Storage bucket using openpyxl.
Summary:
I uploaded the Excel file to GCS, read the blob data with openpyxl via BytesIO, and saved the workbook using the .save() method.
Steps to follow:
Create a Google Cloud Storage bucket with a globally unique name. Keep the defaults and click Create.
Choose an Excel file on your local system and upload it to the bucket using the "Upload files" option.
Once the Excel file is in your bucket, follow the steps below:
Go to Google Cloud Platform and create a service account (API). Click Navigation Menu > APIs & Services > Credentials to go to that screen.
Then click Manage Service Accounts.
On the next screen, click Create Service Account.
Enter the details of the service account for each item.
In the next section, grant the account a role for Cloud Storage. Choose Storage Admin (full permission).
Click the service account you created, click Add Key in the Keys field, and select Create New Key.
Select JSON as the key type and create it. The JSON key file is downloaded to your local machine; use it in the next step to operate Cloud Storage from Python.
We will install the libraries required for this project in Cloud Shell. First, install the Google Cloud Storage client library with pip to access Cloud Storage:
pip install google-cloud-storage
Install openpyxl using:
pip install openpyxl
Create a folder (named excel here, but any name works) in the Cloud Shell Editor, and create these files inside it:
main.py
the JSON key file (the one that was downloaded to your local machine; copy it into this folder)
The folder should look like this:
excel/
    main.py
    ●●●●●●●●●●.json
Write the following code in main.py:
from google.cloud import storage
import openpyxl
import io

# Create a Cloud Storage client from the service account key file
# (the path to your JSON key file, which is now in this folder)
client = storage.Client.from_service_account_json('●●●●●●●●●●.json')
# Get an instance of the bucket: the bucket name alone is enough, no full path
bucket = client.bucket('bucket_name')
# Get a blob instance for the file; test.xlsx is the Excel file already uploaded to the bucket
blob = bucket.blob('test.xlsx')
buffer = io.BytesIO()
blob.download_to_file(buffer)
buffer.seek(0)  # rewind so openpyxl reads from the start of the buffer
wb = openpyxl.load_workbook(buffer)
wb.save('./retest.xlsx')
You will see a file, retest.xlsx, created in the same folder in the Cloud Shell Editor.
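Applied to the original question, the key point is that reportes/ficha.xlsx is an object name inside the bucket, not a path on the server's disk, so FileSystemStorage cannot find it. A minimal sketch, assuming a google.cloud.storage Bucket object and the object name from the question; the helper name blob_to_buffer is hypothetical. It rewinds the buffer after downloading so the read position is explicit.

```python
import io


def blob_to_buffer(bucket, object_name):
    """Download a Cloud Storage object into a rewound in-memory buffer.

    `object_name` is the full name inside the bucket, including any
    "folder" prefix, e.g. "reportes/ficha.xlsx" (no bucket name and
    no leading slash; a leading slash is stripped here just in case).
    """
    buffer = io.BytesIO()
    bucket.blob(object_name.lstrip("/")).download_to_file(buffer)
    buffer.seek(0)  # start reading from the beginning
    return buffer


# Usage (assumes openpyxl and an authenticated client, as in main.py above):
# wb = openpyxl.load_workbook(blob_to_buffer(client.bucket("bucket_name"), "reportes/ficha.xlsx"))
# print(wb["Hoja1"]["A5"].value)
```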

Rename file name in GCP Cloud Storage and remove square brackets []

We have many videos uploaded to GCP Cloud Storage.
We need to rename the files to remove the [] characters.
Is there a good way to do this?
Example file:
gs://xxxxxx/xxxxxx/[BlueLobster] Saint Seiya The Lost Canvas - 06 [1080p].mkv
You can't rename a file in Cloud Storage. Renaming a file means copying it to a new name and deleting the old one.
It will take time if you have a lot of (large) files, but it's not impossible.
In your scenario, you want to bulk-rename all files whose names contain []. According to the gsutil documentation, gsutil interprets these characters as wildcards, so gsutil doesn't currently support this.
You can handle this with a custom script that renames all the files containing [.
You can use any programming language that has a Cloud Storage client library. For these instructions, we'll use Python for the custom script.
In the Google Cloud Console, click Activate Cloud Shell at the top right, beside the question-mark icon. For more information, see the Cloud Shell documentation.
In Cloud Shell, install the Python client library with this command:
pip install --upgrade google-cloud-storage
For more information, see the client library documentation.
After installing the client library, launch the Cloud Shell Editor by clicking Open Editor at the top right of Cloud Shell (see the Cloud Shell Editor documentation).
In the Cloud Shell Editor, click the File menu and choose New File. Name it script.py and click OK.
This code assumes that all the objects in your bucket are named like the sample you provided:
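import re
from google.cloud import storage

storage_client = storage.Client()
bucket_name = "my_bucket"
bucket = storage_client.bucket(bucket_name)
# take a snapshot of the listing so renamed objects are not revisited
blobs = list(storage_client.list_blobs(bucket_name))
# character class that matches ( ) [ ] { }
pattern = r"[\([{})\]]"
for blob in blobs:
    out_var = blob.name
    fixed_var = re.sub(pattern, '', blob.name)
    if fixed_var == out_var:
        continue  # no brackets in this name, nothing to rename
    print(out_var + " " + fixed_var)
    # copies the object to the new name, then deletes the old one
    new_blob = bucket.rename_blob(blob, fixed_var)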
Change "my_bucket" to the name of your bucket.
Click File and then Save, or press Ctrl+S.
Go back to the terminal by clicking Open Terminal at the top right of the Cloud Shell Editor.
Copy and paste this command into the terminal:
python script.py
Press Enter to run the script.
Files that had brackets are now renamed.
The files aren't renamed in place on the backend. Because objects are immutable, each file is copied to a new object with the new name, and the old object is deleted afterwards.
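Because every rename is a copy plus a delete, it can help to compute the new names before touching the bucket. A minimal sketch, assuming only square brackets should be removed (the script above also strips parentheses and braces): it skips names that would not change and refuses to proceed if two objects would end up with the same name, since copying onto an existing name overwrites that object. The helper names plan_renames and apply_renames are hypothetical; apply_renames uses Bucket.rename_blob from google-cloud-storage.

```python
import re

BRACKETS = re.compile(r"[\[\]]")  # only square brackets, per the question


def plan_renames(names):
    """Return (old, new) pairs for names that contain square brackets.

    Raises ValueError if a new name collides with an existing object or with
    another planned name, because the copy would silently overwrite it.
    """
    names = list(names)
    existing = set(names)
    plan, targets = [], set()
    for old in names:
        new = BRACKETS.sub("", old)
        if new == old:
            continue  # nothing to rename
        if new in existing or new in targets:
            raise ValueError(f"rename collision: {old!r} -> {new!r}")
        targets.add(new)
        plan.append((old, new))
    return plan


def apply_renames(bucket, plan):
    """Copy each object to its new name and delete the old one."""
    for old, new in plan:
        bucket.rename_blob(bucket.blob(old), new)


# Usage (assumes an authenticated google.cloud.storage client):
# names = [b.name for b in storage_client.list_blobs("my_bucket")]
# apply_renames(storage_client.bucket("my_bucket"), plan_renames(names))
```

Building the plan from a complete listing first also means the bucket is never modified while it is still being listed.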

OpenShift Django: opening and writing to a text file

I have created a questionnaire with Django, and my views.py has the following code as part of a function:
if text is not None:
    # append one line per respondent: each answer followed by a comma
    with open('/Users/arsenios/Desktop/data.txt', 'a') as f:
        for answer in datas:
            f.write(answer + ",")
        f.write("\n")
This works fine locally. It creates a text file on the desktop and fills it with the answers of each person who completes the questionnaire. When I run the code on OpenShift I get:
"[Errno 2] No such file or directory: '/Users/arsenios/Desktop/data.txt'".
I have seen some people asking and mentioning "OPENSHIFT_DATA_DIR" but I feel like there are steps they haven't included. I don't know what changes I should make to settings.py and views.py.
Any help would be appreciated.
OPENSHIFT_DATA_DIR comes from OpenShift 2 and is not set in OpenShift 3.
The bigger question is whether this is a temporary file or needs to persist across restarts of the application container. If it's a temporary file, use a name under the /tmp directory. If it needs to be persistent, you need to mount a persistent volume to save the data in, or use a separate database with its own persistent storage.
For an explanation of some of the fundamentals of using OpenShift 3, I suggest the free eBook at:
https://www.openshift.com/deploying-to-openshift/
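Following that advice, a minimal sketch of the view's file-writing code that doesn't hard-code a desktop path: it reads the directory from an environment variable and falls back to the system temp directory (/tmp in a typical Linux container) when the variable is unset. The variable name DATA_DIR is hypothetical; if the data must survive restarts, you would set it to the mount path of a persistent volume. The helper names data_file_path and append_answers are also hypothetical.

```python
import os
import tempfile


def data_file_path(filename="data.txt"):
    # DATA_DIR is a hypothetical env var, e.g. a persistent volume's mount path;
    # without it, fall back to the temp directory (lost when the pod restarts)
    data_dir = os.environ.get("DATA_DIR") or tempfile.gettempdir()
    os.makedirs(data_dir, exist_ok=True)
    return os.path.join(data_dir, filename)


def append_answers(answers, path=None):
    """Append one comma-separated line per respondent, as in the question's code."""
    path = path or data_file_path()
    with open(path, "a") as f:
        for answer in answers:
            f.write(answer + ",")
        f.write("\n")
    return path
```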
I managed to solve it. It turns out the data was being saved to data.txt inside the OpenShift pod, and the command I needed was oc rsync pod:/opt/app-root/src/data.txt /path/to/directory, which copies data.txt from OpenShift to the directory I want. In my case, I used oc rsync save-4-tb2dm:/opt/app-root/src/data.txt /Users/arsenios/Desktop

How to upload files to a specific folder on Cloud Storage via the API

My application running on Compute Engine creates images, which I upload to Cloud Storage. This works well using:
BlobInfo blobInfo =
    storage.create(
        BlobInfo
            .newBuilder(bucketName, fileName)
            .setAcl(new ArrayList<>(Arrays.asList(Acl.of(User.ofAllUsers(), Role.READER))))
            .build(),
        filePart.getInputStream());
But I need to upload it to a specific folder like 'bucketname/170717/'.
I couldn't find a way to upload to a specific folder. Any pointers would be appreciated.
I finally found the solution.
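storage.create(
    BlobInfo
        .newBuilder(bucketName, fileName)  // fileName may include a folder path, e.g. "images/" + fileName
        .build(),
    filePart.getInputStream());
In the snippet above, if fileName contains a folder path, Cloud Storage creates it. So when I passed "images/" + fileName, a new folder, images, was created and the file was placed inside it. (In a standard bucket the namespace is actually flat: the "folder" is just a prefix of the object name, which the console displays as a folder.)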
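The same idea applies with the Python client library used elsewhere on this page: the "folder" is just part of the object name. A minimal sketch, assuming a google.cloud.storage Bucket object and assuming that 170717 in the question is a yymmdd date (2017-07-17); the helper names dated_object_name and upload_to_folder are hypothetical.

```python
import posixpath
from datetime import date


def dated_object_name(file_name, day=None):
    """Build an object name under a yymmdd "folder", e.g. '170717/photo.png'."""
    day = day or date.today()
    return posixpath.join(day.strftime("%y%m%d"), file_name)


def upload_to_folder(bucket, object_name, stream, content_type=None):
    """Upload a binary file-like object; '/' in object_name shows up as folders."""
    blob = bucket.blob(object_name)
    blob.upload_from_file(stream, content_type=content_type)
    return blob


# Usage (assumes an authenticated google.cloud.storage client):
# with open("photo.png", "rb") as f:
#     upload_to_folder(client.bucket("bucketname"), dated_object_name("photo.png"), f, "image/png")
```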

Copy an image in Amazon S3 from Image URL using Django

I have an image URL (for example: http://www.myexample.come/testImage.jpg) and I would like to upload this image to Amazon S3 using Django.
I haven't found a way to copy the resource into Amazon S3 directly by passing the file URL.
So I think I have to implement these steps in my project:
Download the file locally from the URL http://www.myexample.come/testImage.jpg. I will have a local file, testImage.jpg.
Upload the local file to Amazon S3. I will have an S3 URL.
Delete the local file testImage.jpg.
Is this a good way to build this feature?
Is it possible to improve these steps?
I need to use this feature when I receive a REST request, and I have to respond with the URL of the uploaded S3 file. Are these steps a good approach in terms of performance?
The easiest way off the top of my head would be to use requests together with io from the Python standard library. This is a bit of code I used a while back; I just tested it with Python 2.7.9 and it works:
>>> requests_image('http://docs.python-requests.org/en/latest/_static/requests-sidebar.png')
It also works with the latest version of requests (2.6.0). I should point out that it's just a snippet, and I was in full control of the image URLs being handed to the function, so there's no error checking (you could use Pillow to open the image and confirm it's really a JPEG, etc.).
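import requests
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2

def requests_image(file_url):
    suffix_list = ['jpg', 'gif', 'png', 'tif', 'svg']
    # last path segment of the URL, e.g. 'requests-sidebar.png'
    file_name = urlsplit(file_url).path.split('/')[-1]
    # text after the last dot, so names like 'a.b.png' still work
    file_suffix = file_name.rsplit('.', 1)[-1].lower()
    i = requests.get(file_url)
    if file_suffix in suffix_list and i.status_code == requests.codes.ok:
        with open(file_name, 'wb') as file:
            file.write(i.content)
        return file_name  # path of the saved local file
    return False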
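On the question of improving the steps: the local file isn't strictly required, because boto3's S3 client method upload_fileobj accepts a binary file-like object. A minimal sketch, assuming a boto3 S3 client; the helper names s3_key_for_url and copy_url_to_s3, the images/ key prefix, and the opener parameter (there only so the download can be swapped out in tests) are hypothetical. It returns the object key; turning that into a URL for the REST response depends on the bucket's region and access settings, so a presigned URL from the client's generate_presigned_url may be the better thing to return.

```python
import io
import posixpath
from urllib.parse import urlsplit
from urllib.request import urlopen


def s3_key_for_url(image_url, prefix="images/"):
    """Derive an S3 key from the last path segment of the URL."""
    name = posixpath.basename(urlsplit(image_url).path)
    if not name:
        raise ValueError(f"no file name in URL: {image_url!r}")
    return prefix + name


def copy_url_to_s3(s3_client, image_url, bucket, prefix="images/", opener=urlopen):
    """Download the image into memory and upload it to S3 without a temp file."""
    key = s3_key_for_url(image_url, prefix)
    with opener(image_url) as response:
        body = io.BytesIO(response.read())
    s3_client.upload_fileobj(body, bucket, key)
    return key


# Usage (assumes boto3 is installed and AWS credentials are configured):
# s3 = boto3.client("s3")
# key = copy_url_to_s3(s3, "http://www.myexample.come/testImage.jpg", "my-bucket")
```

Holding the whole image in memory is fine for typical image sizes; the trade-off against the temp-file approach is memory use versus disk I/O and cleanup.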