GCP AI Notebook can't access storage bucket - google-cloud-platform

New to GCP. Trying to load a saved model file into an AI Platform notebook. Tried several approaches without success.
Most obvious approach seemed to be to set the value of a variable to the path copied from storage:
model_path = "gs://<my-bucket>/models/3B/export/1600635833/saved_model.pb"
Results: OSError: SavedModel file does not exist at: (the above path)
I know I can connect to the bucket and retrieve contents because I downloaded a csv file from the bucket and printed out the contents.

An OSError suggests you are accessing the GCS path through regular file-system calls, which do not support GCS locations (for example, Python's open() function).
To access files in GCS, I recommend using the Client Libraries: https://cloud.google.com/storage/docs/reference/libraries
Another option for testing is to connect over SSH and use the gsutil command.
Note: I assume <my-bucket> was edited to replace your real GCS bucket name.
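A separate thing worth checking, offered as a guess because the loading code was not posted: TensorFlow's SavedModel loaders take the export directory, not the saved_model.pb file inside it, and the "SavedModel file does not exist at" message is what they report when no saved_model.pb is found under the given path. A minimal sketch, where export_dir_from_pb_path and load_saved_model are hypothetical helper names and the TensorFlow call is only an assumption about how the model is being loaded:

```python
import posixpath


def export_dir_from_pb_path(path: str) -> str:
    """Return the SavedModel export directory for a path that may point at the .pb file."""
    if posixpath.basename(path) in ("saved_model.pb", "saved_model.pbtxt"):
        return posixpath.dirname(path)
    return path.rstrip("/")


def load_saved_model(path: str):
    """Load a SavedModel from its export directory (assumes TensorFlow is installed)."""
    # Imported lazily so the path helper above works without TensorFlow.
    import tensorflow as tf

    return tf.saved_model.load(export_dir_from_pb_path(path))


model_path = "gs://<my-bucket>/models/3B/export/1600635833/saved_model.pb"
print(export_dir_from_pb_path(model_path))
```

If the loader still fails with the directory path, the file-system explanation above is the more likely cause.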

According to the GCP documentation, you are able to access Cloud Storage. That page is a guide to using Cloud Storage with AI Platform Training.

Related

Google Cloud Transfer Job is creating one extra folder

I have created a Transfer Job to import some of my website's static resources into Google Cloud Storage.
The job was supposed to import the data into a bucket named www.pretty-story.com.
It is importing from a TSV file located here.
For instance, the first URL is:
https://www.pretty-story.com/wp-includes/js/jquery/jquery.min.js
so I would have expected the job to create a folder structure starting with wp-includes.
Instead, the job created this folder structure: www.pretty-story.com\wp-includes\js\jquery.
Therefore the complete path (including my bucket name) is:
www.pretty-story.com\www.pretty-story.com\wp-includes\js\jquery.
How can I tell the transfer job to use the bucket as the first folder, instead of creating a subfolder with the same name?
According to https://cloud.google.com/storage-transfer/docs/create-url-list:
When an object located at http(s)://[HOSTNAME]:[PORT]/[URL_PATH] is transferred to Cloud Storage, the name of the object in Cloud Storage is [HOSTNAME]/[URL_PATH].
You don't have an option to skip the [HOSTNAME]/ part of this, so what you are asking is not possible.
If the amount of data involved is reasonable, I recommend downloading it to a workstation and using gsutil to copy it into a bucket without the hostname prefix.
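To make the naming rule concrete, here is a small sketch of how a URL from the list maps to an object name under the documented [HOSTNAME]/[URL_PATH] rule, next to the name you would want when copying the files yourself. Both function names are hypothetical helpers, not part of any Google library:

```python
from urllib.parse import urlsplit


def transfer_object_name(url: str) -> str:
    """Object name the transfer job produces: [HOSTNAME]/[URL_PATH]."""
    parts = urlsplit(url)
    # parts.path already starts with "/", so it supplies the separator.
    return f"{parts.hostname}{parts.path}"


def desired_object_name(url: str) -> str:
    """Object name without the hostname prefix, for a manual copy."""
    return urlsplit(url).path.lstrip("/")


url = "https://www.pretty-story.com/wp-includes/js/jquery/jquery.min.js"
print(transfer_object_name(url))
print(desired_object_name(url))
```

The second name is the one to use as the destination when copying the downloaded files into the bucket manually.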

Azure Data Factory HDFS dataset preview error

I'm trying to connect to HDFS from ADF. I created a folder and a sample file (ORC format) and put the file in the newly created folder.
Then in ADF I successfully created a linked service for HDFS using my Windows credentials (the same user that created the sample file).
But when I try to browse the data through the dataset, I get this error:
The response content from the data store is not expected, and cannot be parsed.
Am I doing something wrong, or is it some kind of permissions issue?
Please advise.
This appears to be a generic issue. You need to point the dataset to a file with an appropriate extension rather than to the folder itself. Also make sure you are using a supported data store activity.
You can follow this official MS doc to use an HDFS server with Azure Data Factory.

Django open excel.xlsx with openpyxl from Google Cloud Storage

I need to open a .xlsx file from my bucket on Google Cloud Storage. The problem is that I get: FileNotFoundError at /api/ficha-excel
[Errno 2] No such file or directory: 'ficha.xlsx'
These are the settings from my bucket.
UPLOAD_ROOT = 'reportes/'
MEDIA_ROOT = 'reportes'
The file's route is bucket/reportes/ficha.xlsx
This is the code of my get function:
directorio = FileSystemStorage("/reportes").base_location
os.makedirs(directorio, exist_ok=True)
# read
print("Directorios: ", directorio)
plantilla_excel = openpyxl.load_workbook(f"{directorio}/ficha.xlsx")
print(plantilla_excel.sheetnames)
currentSheet = plantilla_excel['Hoja1']
print(currentSheet['A5'].value)
What is the problem with the path? I can't figure it out.
The solution below doesn't use Django's FileSystemStorage/Storage classes. It opens a .xlsx file from a Google Cloud Storage bucket using openpyxl.
Summary:
I uploaded the Excel file to GCS, read the blob data with openpyxl via BytesIO, and saved the data to a workbook using the .save() method.
Steps to Follow :
Create a Google Cloud Storage bucket. Choose a globally unique name for it, keep the defaults, and click Create.
Choose an Excel file from your local system and upload it to the bucket using the “Upload files” option.
Once you have the Excel file in your bucket, follow the steps below:
Go to Google Cloud Platform and create a service account (API). Click Navigation Menu > APIs & Services > Credentials to go to the screen.
Then click Manage Service Accounts.
On the next screen, click Create Service Account.
Enter the details of the service account for each item.
In the next section, assign a role for Cloud Storage. Choose Storage Admin (full permission).
Click the service account you created, click Add Key in the Keys field, and select Create New Key.
Select JSON as the key type and click Create. The JSON file is downloaded to your local storage; you will use it in the next steps to operate Cloud Storage from Python.
Install the libraries required for this project in Cloud Shell. First, install the Google Cloud Storage library with pip to access Cloud Storage:
pip install google-cloud-storage
Install openpyxl using :
pip install openpyxl
Create a folder with a name of your choice (here, excel) in your Cloud editor.
Create these files within it:
main.py
JSON key file (the one that was downloaded to local storage; copy that file into this folder)
The folder then looks like this:
excel
    main.py
    ●●●●●●●●●●.json
Write the following code in main.py:
from google.cloud import storage
import openpyxl
import io

# Create a client instance for Google Cloud Storage
client = storage.Client.from_service_account_json('●●●●●●●●●●.json')  # path to your JSON key file
# Get an instance of a bucket
bucket = client.bucket('bucket_name')  # the bucket name alone is enough; a full path is not necessary
# Get a blob instance of a file
blob = bucket.blob('test.xlsx')  # test.xlsx is the Excel file already uploaded to the bucket
# Download the blob into an in-memory buffer and rewind it before reading
buffer = io.BytesIO()
blob.download_to_file(buffer)
buffer.seek(0)
wb = openpyxl.load_workbook(buffer)
# Save a local copy of the workbook
wb.save('./retest.xlsx')
You will see a file named ‘retest.xlsx’ created in the same folder in Cloud Editor.
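Applied to the layout in the question (bucket/reportes/ficha.xlsx), the same idea might look like the sketch below. The helper names object_name and load_workbook_from_gcs are hypothetical, and the sketch assumes Application Default Credentials are configured rather than a JSON key file; it is not the only way to wire this into a Django view.

```python
import io
import posixpath


def object_name(prefix: str, filename: str) -> str:
    """Join a bucket "folder" prefix and a file name into a GCS object name."""
    return posixpath.join(prefix.strip("/"), filename)


def load_workbook_from_gcs(bucket_name: str, name: str):
    """Download an .xlsx object into memory and open it with openpyxl."""
    # Lazy imports: both packages are third-party (see the pip commands above).
    from google.cloud import storage
    import openpyxl

    client = storage.Client()  # assumption: Application Default Credentials
    data = client.bucket(bucket_name).blob(name).download_as_bytes()
    return openpyxl.load_workbook(io.BytesIO(data))


# Example call (bucket name is a placeholder):
# wb = load_workbook_from_gcs("bucket_name", object_name("reportes/", "ficha.xlsx"))
print(object_name("reportes/", "ficha.xlsx"))
```

The point is that the object is addressed by its name inside the bucket, not by a local directory such as /reportes, which is why the original path lookup fails.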

How to upload dictionary files to google cloud postgresql?

I want to upload dictionary files.
CREATE TEXT SEARCH DICTIONARY ispell (
    TEMPLATE = ispell,
    DictFile = bulpo,
    AffFile = bulpo
);
ERROR: could not open dictionary file "/share/tsearch_data/bulpo.dict": No such file or directory
I tried to upload it via Cloud Shell, but the result is the same.
You cannot upload files to Cloud SQL (other than database files for migration purposes), because Cloud SQL is a fully managed service and you don't have access to the underlying Compute Engine instance. In your case, the .dict file would need to be placed in the tsearch_data directory under SHAREDIR, which requires that access. To accomplish this use case, I would suggest creating a Compute Engine instance and installing PostgreSQL on it. Remember that you will then have to handle all operations on that instance yourself (backups, rollouts, performance, OS, etc.).

Modifying image in Active Storage cloud

I'm using Rails 5.2 and GCS as cloud service.
I'd like to let users crop and rotate their images.
A User has many Images, and an Image has one :image_file attached.
In development I use this method:
class Image
  ...
  def rotate(degree)
    image = MiniMagick::Image.new(ActiveStorage::Blob.service.send(:path_for, self.image_file.key))
    image.rotate "#{degree}"
    image.write(ActiveStorage::Blob.service.send(:path_for, self.image_file.key))
    self.image_file.blob.analyze
  end
  ...
end
But I can't figure out how to get to the image files in the cloud.
I have managed to download the file to local storage and perform all the operations needed.
Now all that remains is to replace the file in the cloud (delete the current one and create a new one with the same name), without changing anything in the database records if possible, but I can't figure out how to do this with Active Storage.
At the very least, I need the file's name in the cloud so that I can use bare google-cloud-ruby.
To list files stored in a Cloud Storage bucket using Ruby on Rails, see the code example defined here. You can also upload files to a Cloud Storage bucket and delete files from it using Ruby on Rails.
Also, since you are allowing your customers to modify their files in Cloud Storage buckets, you may consider using versioning. This incurs additional cost but provides reliability for your customers.
Here is the link to the Ruby on Google Cloud Platform documentation, which might be helpful to you.