Django open excel.xlsx with openpyxl from Google Cloud Storage

I need to open a .xlsx file from my bucket on Google Cloud Storage, but I get: FileNotFoundError at /api/ficha-excel
[Errno 2] No such file or directory: 'ficha.xlsx'
These are the settings for my bucket:
UPLOAD_ROOT = 'reportes/'
MEDIA_ROOT = 'reportes'
The path to the file is bucket/reportes/ficha.xlsx.
This is the code of my get function:
directorio = FileSystemStorage("/reportes").base_location
os.makedirs(directorio, exist_ok=True)
# read
print("Directorios: ", directorio)
plantilla_excel = openpyxl.load_workbook(f"{directorio}/ficha.xlsx")
print(plantilla_excel.sheetnames)
currentSheet = plantilla_excel['Hoja1']
print(currentSheet['A5'].value)
What is the problem with the path? I can't figure it out.

The solution below doesn't use Django's FileStorage/Storage classes. It opens a .xlsx file from a Google Cloud Storage bucket using openpyxl.
Summary:
I uploaded the Excel file to GCS, read the blob data with openpyxl via BytesIO, and saved the workbook using the .save() method.
Steps to follow:
Create a Google Cloud Storage bucket. Choose a globally unique name for it, keep the defaults, and click Create.
Choose an Excel file from your local system and upload it to the bucket using the "Upload files" option.
Once you have the Excel file in your bucket, follow the steps below:
Go to Google Cloud Platform and create a service account. Click
Navigation Menu > APIs & Services > Credentials to open the Credentials screen.
Then click Manage Service Accounts.
On the next screen, click Create Service Account.
Enter the details of the service account for each item.
In the next section, grant the service account a role for Cloud Storage. Choose
Storage Admin (full permission).
Click the service account you created, click Add Key in the Keys
field, and select Create New Key.
Select JSON as the key type and click Create. The JSON key file is
downloaded to your local storage; you will use it in the next steps
to operate Cloud Storage from Python.
We will install the libraries required for this project in Cloud
Shell. First, install the Google Cloud Storage library with pip
to access Cloud Storage:
pip install google-cloud-storage
Install openpyxl using:
pip install openpyxl
Create a folder (for example, excel) with the name of your choice in your Cloud editor.
Create these files within it:
main.py
the JSON key file (the one that was downloaded to local storage; copy that file into this folder)
The layout looks like this:
excel/
  main.py
  ●●●●●●●●●●.json
Write the below lines of code in the main.py file:
from google.cloud import storage
import openpyxl
import io

# Create a client instance for Google Cloud Storage
client = storage.Client.from_service_account_json('●●●●●●●●●●.json')  # the path to your JSON key file, which is now in this folder
# Get an instance of a bucket
bucket = client.bucket('bucket_name')  # only the bucket name will do; the full path is not necessary
# Get a blob instance of a file
blob = bucket.blob('test.xlsx')  # test.xlsx is the Excel file already uploaded to the bucket
buffer = io.BytesIO()
blob.download_to_file(buffer)
buffer.seek(0)  # rewind the buffer before handing it to openpyxl
wb = openpyxl.load_workbook(buffer)
wb.save('./retest.xlsx')
You will see a file 'retest.xlsx' created in the same folder in the Cloud Editor.
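If you also need to write the modified workbook back to the bucket (not covered in the answer above), here is a minimal sketch using blob.upload_from_file, reusing the bucket and wb objects from the snippet above; the target object name retest.xlsx is just an example:
import io
# Save the workbook into an in-memory buffer instead of a local file
out_buffer = io.BytesIO()
wb.save(out_buffer)
out_buffer.seek(0)  # rewind before uploading
# Upload the buffer as a new object in the same bucket
bucket.blob('retest.xlsx').upload_from_file(
    out_buffer,
    content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
)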

Related

Rename file name in GCP Cloud Storage and remove square brackets []

We have many videos uploaded to GCP Cloud Storage.
We need to change the file names and remove the [] characters.
Is there a good solution for this?
file example:
gs://xxxxxx/xxxxxx/[BlueLobster] Saint Seiya The Lost Canvas - 06 [1080p].mkv
You can't rename a file in Cloud Storage. Renaming a file amounts to copying it under a new name and deleting the one with the old name.
It will take time if you have a lot of (large) files, but it's not impossible.
Based on the given scenario, you want to bulk rename all the filenames containing "[]". Based on this documentation, gsutil interprets these characters as wildcards, and gsutil currently does not support escaping them.
There's a way to handle this kind of request by using a custom script to rename all the files containing "[".
You may use any programming language that has Cloud Storage client libraries. For these instructions, we'll be using Python for the custom script.
On your Google Cloud Console, click Activate Cloud Shell at the top right of the Google Cloud Console, beside the question mark sign. For more information, you may refer here.
On your Cloud Shell, install the Python client library by using this command:
pip install --upgrade google-cloud-storage
For more information, please refer to this documentation.
After installing the client library, launch the Cloud Shell Editor by clicking Open Editor on the top right side of the Cloud Shell. You may refer here for more information.
On your Cloud Shell Editor, click the File menu and choose New File. Name it script.py and click Ok.
This code assumes that the objects in your bucket have names like the sample you provided:
import re
from google.cloud import storage

storage_client = storage.Client()
bucket_name = "my_bucket"
bucket = storage_client.bucket(bucket_name)
blobs = storage_client.list_blobs(bucket_name)

# Matches any bracket character: ( ) [ ] { }
pattern = r"[\[\]{}()]"

for blob in blobs:
    out_var = blob.name
    fixed_var = re.sub(pattern, '', blob.name)
    print(out_var + " " + fixed_var)
    new_blob = bucket.rename_blob(blob, fixed_var)
Change "my_bucket" to the name of your bucket.
Click File and then Save or you can just press Ctrl + S.
Go back to the terminal by clicking Open Terminal on the top right section of the Cloud Shell Editor.
Copy and paste this command into the terminal:
python script.py
Press the Enter key to run the script.
Files that have brackets are now renamed.
The files aren't really renamed in the backend. Under the hood, due to object immutability, each file is rewritten under a new name: the old object is copied with the new name and then removed.
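For reference, here is a rough sketch of what rename_blob does under the hood; the bucket and object names are hypothetical placeholders:
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my_bucket")  # same placeholder bucket name as in the script above
blob = bucket.blob("[BlueLobster] sample [1080p].mkv")  # hypothetical object to rename

# copy_blob writes a new object under the new name...
new_blob = bucket.copy_blob(blob, bucket, "BlueLobster sample 1080p.mkv")
# ...and the original object must then be deleted explicitly
bucket.delete_blob(blob.name)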

GCP AI Notebook can't access storage bucket

New to GCP. Trying to load a saved model file into an AI Platform notebook. Tried several approaches without success.
The most obvious approach seemed to be to set a variable to the path copied from Storage:
model_path = "gs://<my-bucket>/models/3B/export/1600635833/saved_model.pb"
Results: OSError: SavedModel file does not exist at: (the above path)
I know I can connect to the bucket and retrieve contents because I downloaded a csv file from the bucket and printed out the contents.
The OSError sounds like you are trying to access the GCS bucket with a regular file-system call, which does not support GCS paths (for example, Python's open() function).
To access files in GCS I recommend you use the Client Libraries. https://cloud.google.com/storage/docs/reference/libraries
Another option for testing is to connect over SSH and use the gsutil command.
Note: I assume <my-bucket> was edited to replace your real GCS bucket name.
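As a sketch of the client-library approach recommended above (the bucket and object path are taken from the question; the local destination is an assumption), you could download the file to local disk first, then open it with regular file-system APIs:
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("<my-bucket>")  # replace with your real bucket name
blob = bucket.blob("models/3B/export/1600635833/saved_model.pb")

# Download the object to a local path that regular file-system calls can read
blob.download_to_filename("/tmp/saved_model.pb")
Note that a TensorFlow SavedModel usually needs the whole export directory (including variables/), so you may have to download every blob under that prefix, not just the .pb file.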
According to the GCP documentation linked here, you are able to access Cloud Storage; that page will guide you through using Cloud Storage with AI Platform Training.

How to read csv file in Google Cloud Platform jupyter notebook

I am working in a Jupyter notebook on Google Cloud Platform AI Notebooks. Now I want to read, in GCP, a .csv file that is stored locally on my laptop.
My approach:
df = pd.read_csv("C:\Users\Desktop\New Folder\Data.csv")
But it's not working. How do I read a local file in a GCP AI notebook?
I don't think there is a direct way to do this, but here are three alternatives:
a) Upload the file from the Jupyter UI:
1. Open the Jupyter UI.
2. In the left pane of the screen, at the top, below the menus, click the "Upload files" button.
3. Select the file from your local file system and click Open.
4. Once the file is available in the left pane of the screen, right-click the file and select "Copy Path".
5. In your Notebook, type the following code, replacing test.csv with the path you just copied:
import pandas as pd
df2 = pd.read_csv("test.csv")
print(df2)
b) Upload the file to the Notebooks instance's file system:
1. Go to the Compute Engine screen in the GCP console.
2. SSH to your AI Platform Notebooks instance, using the SSH button.
3. In the new terminal window, click the gear icon and choose the "Upload File" option.
4. Select the file from your local file system and click Open.
5. The file will be stored in $HOME/; optionally move it to the desired path.
6. In your Notebook, type the following code, replacing the path accordingly:
import pandas as pd
df = pd.read_csv("/path/to_file/test.csv")
print(df)
c) Store the file in a GCS bucket:
1. Upload your file to GCS.
2. In your Notebook, type the following code, replacing the bucket and file names accordingly:
import pandas as pd
from google.cloud import storage
from io import BytesIO
client = storage.Client()
bucket_name = "your-bucket"
file_name = "your_file.csv"
bucket = client.get_bucket(bucket_name)
blob = bucket.get_blob(file_name)
content = blob.download_as_string()
df = pd.read_csv(BytesIO(content))
print(df)
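As a shortcut (not part of the original answer), recent pandas versions can also read gs:// paths directly when the gcsfs package is installed; a minimal sketch with a hypothetical bucket and file name:
import pandas as pd

# Requires: pip install gcsfs  (pandas delegates gs:// URLs to gcsfs)
df = pd.read_csv("gs://your-bucket/your_file.csv")
print(df.head())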

How to upload dictionary files to google cloud postgresql?

I want to upload dictionary files.
CREATE TEXT SEARCH DICTIONARY ispell (
TEMPLATE = ispell,
DictFile = bulpo,
AffFile = bulpo
);
ERROR: could not open dictionary file "/share/tsearch_data/bulpo.dict": No such file or directory
I tried to upload it via Cloud Shell, but the result is the same.
Hey @asapokL, you cannot upload files to Cloud SQL (other than database files for migration purposes) because you don't have access to the underlying Compute Engine instance; Cloud SQL is a fully managed solution by Google Cloud. In your case you would need to upload your .dict file to the tsearch_data directory, which is in SHAREDIR. If you want to accomplish this use case, I would suggest creating a Compute Engine instance and installing PostgreSQL on it, but remember you will have to handle all operations on that instance yourself (backups, rollouts, performance, OS, etc.).

using file stored on aws s3 storage -- media storage as input of a process in django view

I am trying to put my Django app into production on AWS. I used Elastic Beanstalk to deploy it, so the EC2 instance is created and connected to an RDS MySQL database instance, and I use a bucket in Amazon S3 to store my media files.
When a user uploads a video, it is stored in S3 as: "https://bucketname.s3.amazonaws.com/media/videos/videoname.mp4".
In Django development mode, I was using the video filename as input to a batch script which produces a video as output.
My view in development mode is as follows:
def get(request):
    # get the latest uploaded video
    var = Video.objects.order_by('id').last()
    v = '/home/myproject/media/videos/' + str(var)
    # call the processing script
    subprocess.call("./step1.sh %s" % (str(v)), shell=True)
    return render(request, 'endexecut.html')
In production mode on AWS (the problem), I tried:
v = 'https://bucketname.s3.amazonaws.com/media/videos/' + str(var)
but the batch process doesn't accept the URL as input.
How can I use my video file from the S3 bucket in a process in my view, as described above? Thank you in advance.
You should not hard-code that string. There are a couple of things wrong with it:
"bucketname" is not the name of your bucket. You should use the name of your actual bucket if this is to work at all.
Your media file URL (MEDIA_URL in settings.py) should point to the bucket URL where your files are saved (if it's well configured). So you can make use of:
video_path = settings.MEDIA_URL + video_name
I am assuming you are using django-storages' S3 backend (s3boto) to handle your storage. That's not a prerequisite, but it makes your storage handling smarter, and it's highly recommended if you are pushing to S3 from a Django app.
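That said, MEDIA_URL + video_name is still a URL, and a local batch script cannot read a URL as a file path. Here is a minimal sketch of one way around this, assuming django-storages' S3Boto3Storage is the default storage backend and that the Video model and step1.sh script are as in the question: open the object through the storage backend and copy it to a local temp file before calling the script:
import os
import subprocess
import tempfile

from django.core.files.storage import default_storage
from django.shortcuts import render

# The Video model import is omitted, as in the question's view

def get(request):
    var = Video.objects.order_by('id').last()
    key = 'videos/' + str(var)  # object key relative to the media location in the bucket

    # Copy the S3 object to a local temp file the batch script can read
    with default_storage.open(key, 'rb') as remote, \
            tempfile.NamedTemporaryFile(suffix='.mp4', delete=False) as local:
        local.write(remote.read())
        local_path = local.name

    subprocess.call("./step1.sh %s" % local_path, shell=True)
    os.remove(local_path)  # clean up the temp file
    return render(request, 'endexecut.html')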