Issue during deployment of model: gcloud ml-engine versions create - google-cloud-platform

When I create a version of a machine learning model (whether it is my own model or the ML Engine census example) using the command:
$ gcloud ml-engine versions create v1 --model $MODEL_NAME --origin $MODEL_BINARIES --runtime-version 1.10
I get an error saying: ERROR: (gcloud.ml-engine.versions.create) FAILED_PRECONDITION: Framework can not be identified from model path. Please make sure your model file name is correct.

Got the same problem; JOB_ID was empty in my case. I fixed it by adding
JOB_ID=census_211004_181920
before the OUTPUT_PATH declaration. You can check your JOB_ID in the Storage Browser.
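For reference, a minimal sketch of how those variables fit together in the census example (variable names follow this answer and may differ in your version of the tutorial; the job folder name is just the one quoted above):
JOB_ID=census_211004_181920            # find the real job folder name in the Storage Browser
OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_ID  # derived from JOB_ID, so JOB_ID must not be empty
gsutil ls -r $OUTPUT_PATH              # locate the exported SavedModel directory to pass to --origin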

Make sure that MODEL_BINARIES is a folder that contains the saved_model.pb file.
When I followed the Google documentation,
gsutil cp -r SavedModel/saved_model ${YOUR_GCS_BUCKET}/model_dir_tmp/
it just copied the file saved_model.pb into ${YOUR_GCS_BUCKET}/model_dir_tmp, instead of creating ${YOUR_GCS_BUCKET}/model_dir_tmp/saved_model.
Later, when I passed ${YOUR_GCS_BUCKET}/model_dir_tmp/saved_model to --origin, I received the complaint that the framework can not be identified from the model path.
I manually went to the Cloud Console webpage, created a folder saved_model, and moved the file saved_model.pb into it.
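If you prefer to stay in the shell instead of the console, a rough sketch of checking and fixing the layout with gsutil (bucket and folder names are the same placeholders as above):
gsutil ls ${YOUR_GCS_BUCKET}/model_dir_tmp/                       # see where saved_model.pb actually landed
gsutil mv ${YOUR_GCS_BUCKET}/model_dir_tmp/saved_model.pb \
    ${YOUR_GCS_BUCKET}/model_dir_tmp/saved_model/saved_model.pb   # move it into a saved_model/ folder
gsutil ls ${YOUR_GCS_BUCKET}/model_dir_tmp/saved_model/           # --origin should point at this folder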

Related

How to copy artifacts between projects in Google Artifact Registry

Previously, using Container Registry, one could copy a container between projects using this method.
However, I am unable to get this working using Artifact Registry. If I try
gcloud artifacts docker tags add \
us-east4-docker.pkg.dev/source-proj/my-repo/my-image:latest \
us-east4-docker.pkg.dev/dest-proj/my-repo/my-image:latest
It gives the error
ERROR: (gcloud.artifacts.docker.tags.add) Image us-east4-docker.pkg.dev/source-proj/my-repo/my-image
does not match image us-east4-docker.pkg.dev/dest-proj/my-repo/my-image
I have searched and cannot find any examples or documentation on how to do this.
You can use the gcrane tool to copy images in Artifact Registry.
For example, the following command copies image my-image:latest from the repository my-repo in the project source-proj to the repository my-repo in another project called dest-proj.
gcrane cp \
us-east4-docker.pkg.dev/source-proj/my-repo/my-image:latest \
us-east4-docker.pkg.dev/dest-proj/my-repo/my-image:latest
Here is the link to the Google Cloud official documentation.
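If gcrane is not installed yet, a minimal sketch of installing it and authenticating before the copy (the install path is the go-containerregistry repository; the registry host is the one from the question):
go install github.com/google/go-containerregistry/cmd/gcrane@latest   # install gcrane (requires Go)
gcloud auth configure-docker us-east4-docker.pkg.dev                  # let gcrane reuse your gcloud credentials
gcrane cp \
us-east4-docker.pkg.dev/source-proj/my-repo/my-image:latest \
us-east4-docker.pkg.dev/dest-proj/my-repo/my-image:latest
The account you run this with also needs read access on the source repository and write access on the destination repository.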

(gcloud.compute.images.create) Could not fetch resource: Invalid value for field 'resource.rawDisk.source'

I'm trying to create a custom image for Google Compute Engine by using a file from Cloud Storage with the following command:
gcloud compute images create my-custom-image-name --source-uri gs://my-storage-bucket-name/gce-demo-tar.gz
Output:
ERROR: (gcloud.compute.images.create) Could not fetch resource:
- Invalid value for field 'resource.rawDisk.source': 'https://storage.googleapis.com/storage/v1/b/my-storage-bucket-name/o/gce-demo-tar.gz'.
The provided source is not a supported file.
The file in question is from a virtual machine exported in RAW format using the following command:
VBoxManage clonehd -format RAW ~/VirtualBox\ VMs/SLES12sp5/SLES12sp5.qcow ~/disk.raw
Then archived with the following command:
gtar -cSzf gce-demo-tar.gz disk.raw
However, I'm not sure whether the problem is related to the file itself (I get exactly the same error if I try to import an OVA file) or whether it is related to storage permissions or configuration.
Thank you!
When specifying your --source-uri flag, try gs://my-storage-bucket-name/gce-demo.tar.gz and make sure the file is uploaded with that exact name.
The error might be occurring because of the file name you used: it ends in -tar.gz, while Compute Engine expects a .tar.gz extension.
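For completeness, a rough sketch of packaging and importing the RAW disk the way Compute Engine expects (the file inside the archive must be named disk.raw; bucket and image names are the ones from the question):
tar --format=oldgnu -Sczf gce-demo.tar.gz disk.raw              # the archive must contain a file named disk.raw
gsutil cp gce-demo.tar.gz gs://my-storage-bucket-name/          # upload the archive to Cloud Storage
gcloud compute images create my-custom-image-name \
    --source-uri gs://my-storage-bucket-name/gce-demo.tar.gz    # import the custom image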

Cloud Machine Learning Engine fails to deploy model

I have trained both my own model and the one from the official tutorial.
I'm now at the step of deploying the model to support prediction. However, it keeps giving me an error saying:
"create version failed. internal error happened"
when I attempt to deploy the models by running:
gcloud ml-engine versions create v1 \
--model $MODEL_NAME \
--origin $MODEL_BINARIES \
--python-version 3.5 \
--runtime-version 1.13
The model binaries should be correct, as I pointed --origin to the folder containing model.pb and the variables folder, e.g. MODEL_BINARIES=gs://$BUCKET_NAME/results/20190404_020134/saved_model/1554343466.
I have also tried to change the region setting for the model as well, but this doesn't help.
Turns out your GCS bucket and the trained model need to be in the same region. This was not explained well in the Cloud ML tutorial, where it only says:
Note: Use the same region where you plan on running Cloud ML Engine jobs. The example uses us-central1 because that is the region used in the getting-started instructions.
Also note that a lot of regions cannot be used for both the bucket and model training (e.g. asia-east1).
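A quick sketch of checking where the bucket lives and creating the model in a matching region (reusing the placeholders from the question):
gsutil ls -L -b gs://$BUCKET_NAME | grep -i location               # shows the bucket's location constraint
gcloud ml-engine models create $MODEL_NAME --regions us-central1   # create the model in the same region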

gcloud job can't access my files, either they are in GCS or in my cloud shell

I'm trying to run my machine learning code for images using TensorFlow in Google Cloud ML. However, it seems the submitted job can't access my files, whether they are in my Cloud Shell or in GCS. Even though it works fine on my local machine, I get the following error once I submit my job using the gcloud command from the Cloud Shell:
ERROR 2017-12-19 13:52:28 +0100 service IOError: [Errno 2] No such file or directory: '/home/user/pores-project-googleML/trainer/train.txt'
This file definitely exists in the Cloud Shell, and I can check it when I type:
ls /home/user/pores-project-googleML/trainer/train.txt
I tried putting my file train.txt in GCS and accessing it from my code (by specifying the path gs://my_bucket/my_path), but once the job was submitted, I got a 'No such file or directory' error with the corresponding path.
To check where the job I submitted using gcloud is running, I added print(os.getcwd()) at the beginning of my Python code trainer/task.py, which printed /user_dir in the logs. I couldn't find this path using the Cloud Shell, not even in GCS. So my question is: how can I know on which machine my job is running? If it's in a certain container somewhere, how can I access my files from it using the Cloud Shell and in GCS?
Before doing all of this, I successfully completed the 'Image Classification using Flowers Dataset' tutorial.
The command I used to submit my job is:
gcloud ml-engine jobs submit training $JOB_NAME --job-dir $JOB_DIR --packages trainer-0.1.tar.gz --module-name $MAIN_TRAINER_MODULE --region us-central1
where:
TRAINER_PACKAGE_PATH=/home/use/pores-project-googleML/trainer
MAIN_TRAINER_MODULE="trainer.task"
JOB_DIR="gs://pores/AlexNet_CloudML/job_dir/"
JOB_NAME="census$(date +"%Y%m%d_%H%M%S")"
The regular Python IO library is not able to access files on GCS. Instead, you need to use the GCS Python client or the gsutil CLI to access GCS files.
Note that TensorFlow itself has native support of GCS (i.e., it can read GCS files directly).
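As a concrete illustration, a rough sketch of staging the training file in GCS and handing its gs:// path to the trainer as a user argument (the --train-file flag is a hypothetical argument that trainer/task.py would have to parse; the bucket paths reuse the ones from the question):
gsutil cp /home/user/pores-project-googleML/trainer/train.txt \
    gs://pores/AlexNet_CloudML/data/train.txt                   # stage the file in GCS
gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $JOB_DIR \
    --packages trainer-0.1.tar.gz \
    --module-name $MAIN_TRAINER_MODULE \
    --region us-central1 \
    -- \
    --train-file gs://pores/AlexNet_CloudML/data/train.txt      # everything after -- is passed to the trainer
Inside the trainer, read that gs:// path with TensorFlow's GCS-aware file IO or the GCS client rather than the built-in open().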

GCloud Error: Source code size exceeds the limit

I'm following the basic fulfillment and conversation setup of the api.ai tutorial to make a chat bot, and when I try to deploy the function with the command:
gcloud beta functions deploy --stage-bucket venky-bb7c4.appspot.com --trigger-http
(where 'venky-bb7c4.appspot.com' is the bucket name)
it returns the following error message:
ERROR: (gcloud.beta.functions.deploy) OperationError: code=3, message=Source code size exceeds the limit
I've searched but haven't found any answer, and I don't know where the error is.
This is the JS file that appears in the tutorial:
/**
 * HTTP Cloud Function.
 * @param {Object} req Cloud Function request context.
 * @param {Object} res Cloud Function response context.
 */
exports.helloHttp = function helloHttp (req, res) {
  const response = "This is a sample response from your webhook!"; // Default response from the webhook to show it's working
  res.setHeader('Content-Type', 'application/json'); // Requires application/json MIME type
  res.send(JSON.stringify({
    "speech": response, "displayText": response
    // "speech" is the spoken version of the response, "displayText" is the visual version
  }));
};
Neither of these worked for me. The way I was able to fix this was to make sure I was running the deploy from my project directory (the directory containing index.js).
The command creates a zip with the whole content of your current directory (except the node_modules subdirectory), not just the JS file, because your function may use other resources.
The error you see is because the size of the (uncompressed) files in the directory is bigger than 512MB.
The easiest way to solve this is to move the .js file to its own directory and deploy from there (you can use --local-path to point to the directory containing the source file if you want your working directory to be different from the directory with the function source).
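A minimal sketch of that approach, assuming the tutorial's function lives in index.js and reusing the stage bucket from the question (the temporary directory name is arbitrary):
mkdir /tmp/helloHttp && cp index.js /tmp/helloHttp/   # copy only the function source
cd /tmp/helloHttp
gcloud beta functions deploy helloHttp \
    --stage-bucket venky-bb7c4.appspot.com \
    --trigger-http                                    # deploy from the small directory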
I tried the --source option and deploying from the folder containing index.js, and still ran into a problem.
This error usually happens if the code being uploaded is large. In my tests I found that more than 100MB led to the mentioned error.
To resolve this there are two solutions:
1. Update .gcloudignore to ignore the folders which aren't required for your function.
2. If option 1 doesn't resolve it, create a bucket in Cloud Storage and pass it with the --stage-bucket option.
Create a new bucket for deployment (one time):
gsutil mb gs://my-cloud-functions-deployment-bucket
The bucket name you choose needs to be globally unique, otherwise it throws an 'already exists' error.
Deploy:
gcloud functions deploy subscribers-firestoreDatabaseChange \
    --trigger-topic firestore-database-change \
    --region us-central1 \
    --runtime nodejs10 \
    --update-env-vars "REDIS_HOST=10.128.0.2" \
    --stage-bucket my-cloud-functions-deployment-bucket
I had similar problems while deploying Cloud Functions. What worked for me was specifying the source folder containing the JS files:
gcloud functions deploy functionName --trigger-http --source path_to_project_root_folder
Also be sure to list all unnecessary folders in .gcloudignore so they are excluded from the upload.
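For reference, a rough sketch of creating such a .gcloudignore in the project directory (the entries are illustrative; recent gcloud versions generate a similar file automatically):
cat > .gcloudignore <<'EOF'
.gcloudignore
.git
.gitignore
node_modules
EOF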
Ensure the package folder has a .gitignore file (one that excludes node_modules).
The most recent version of gcloud requires it in order not to upload node_modules. My code size went from 119MB to 17KB.
Once I added the .gitignore file, the log also printed:
created .gcloudignore file. See `gcloud topic gcloudignore` for details.