Azure ML Online Endpoint deployment DriverFileNotFound Error - endpoint

The Azure ML online endpoint commands work when I run them locally, but when I try to deploy to Azure I get this error.
Command - az ml online-deployment create --name blue --endpoint "unique-name" -f endpoints/online/managed/sample/blue-deployment.yml --all-traffic
{
  "status": "Failed",
  "error": {
    "code": "DriverFileNotFound",
    "message": "Driver file with name score.py not found in provided dependencies. Please check the name of your file.",
    "details": [
      {
        "code": "DriverFileNotFound",
        "message": "Driver file with name score.py not found in provided dependencies. Please check the name of your file.\nThe build log is available in the workspace blob store \"coloraiamlsa\" under the path \"/azureml/ImageLogs/1673692e-e30b-4306-ab81-2eed9dfd4020/build.log\"",
        "details": [],
        "additionalInfo": []
      }
    ]
  }
}
This is the deployment YAML, taken straight from the azureml-examples repo:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  local_path: ../../model-1/model/sklearn_regression_model.pkl
code_configuration:
  code:
    local_path: ../../model-1/onlinescoring/
  scoring_script: score.py
environment:
  conda_file: ../../model-1/environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
instance_type: Standard_F2s_v2
instance_count: 1

Finally, after a lot of head banging, I have been able to consistently reproduce this bug in another Azure ML workspace.
I tried deploying the same sample in a brand new Azure ML workspace and it went smoothly.
At this point I remembered that I had upgraded the Storage Account of my previous AML Workspace to DataLake Gen2.
So I did the same upgrade in this new workspace’s storage account. After the upgrade, when I try to deploy the same endpoint, I get the same DriverFileNotFoundError!
It seems Azure ML does not support storage accounts with Data Lake Gen2 capabilities enabled, although the support page (https://learn.microsoft.com/en-us/azure/machine-learning/how-to-access-data#supported-data-storage-service-types) says otherwise.
At this point my only option is to recreate a new workspace and deploy my code there. Hope Azure team fixes this soon.
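If it helps anyone else check for the same condition, one way to see whether a workspace's storage account has the Data Lake Gen2 hierarchical namespace enabled is via the Azure CLI; this is a minimal sketch with placeholder account and resource-group names:
# Placeholders: substitute your workspace's storage account and resource group.
az storage account show \
  --name <storage-account-name> \
  --resource-group <resource-group> \
  --query isHnsEnabled \
  --output tsv
# "true" means the hierarchical namespace (Data Lake Gen2) is enabled on the account.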

Related

Cloud Build Failed to trigger build: generic::permission_denied: Permission denied

I'm trying to use Cloud Build for my Cloud Run project. I have this cloudbuild.json:
{
  "steps": [
    {
      "name": "gcr.io/cloud-builders/docker",
      "args": ["build", "-t", "eu.gcr.io/$PROJECT_ID/keysafe", "."]
    },
    {
      "name": "gcr.io/cloud-builders/docker",
      "args": [
        "push",
        "us-central1-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/myimage"
      ]
    }
  ],
  "options": {
    "logging": "CLOUD_LOGGING_ONLY"
  }
}
And I keep getting a permission denied error. I've tried running it without a service account, using my own permissions (I'm the project owner), and with a service account, even one with the Owner role.
It was originally working, but after my project transitioned from Container Registry to Artifact Registry I started getting this error:
generic::invalid_argument: generic::invalid_argument: if 'build.service_account' is specified, the build must either (a) specify 'build.logs_bucket' (b) use the CLOUD_LOGGING_ONLY logging option, or (c) use the NONE logging option
That error persisted through both my account and the service account, which is why I switched to building from a cloudbuild.json file instead of just my Dockerfile alone.
All the other Stack Overflow answers I've found suggest permissions to assign, but the service account and I both have the Owner role, and even adding the suggested permissions on top of Owner did not help.
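For reference, one way to double-check which roles the build service account actually holds is the command below; this is only a diagnostic sketch, and the project ID and service-account address are placeholders:
# List the roles bound to the Cloud Build service account (placeholders: project and account).
gcloud projects get-iam-policy my-project-id \
  --flatten="bindings[].members" \
  --format="table(bindings.role)" \
  --filter="bindings.members:123456789012@cloudbuild.gserviceaccount.com"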
Here are the permissions of the service account:
Here is the trigger configuration:
If anyone ends up in my position, this is how I fixed it.
I ended up deleting the Cloud Run service and the Cloud Build trigger and then recreating them. This gave me a pre-made cloudbuild.yaml, to which I added the option logging: CLOUD_LOGGING_ONLY, still using the same service account. I'm not sure why this fixed it, but it does seem to be working.
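In case it helps, here is a minimal sketch of what that regenerated config looks like with the logging option added and then submitted; the Artifact Registry path is the one from my cloudbuild.json above, and I tag and push the same path:
# Sketch: write a cloudbuild.yaml with the logging option, then submit it.
cat > cloudbuild.yaml <<'EOF'
steps:
  - name: gcr.io/cloud-builders/docker
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/my-docker-repo/myimage', '.']
  - name: gcr.io/cloud-builders/docker
    args: ['push', 'us-central1-docker.pkg.dev/$PROJECT_ID/my-docker-repo/myimage']
options:
  logging: CLOUD_LOGGING_ONLY
EOF
gcloud builds submit --config cloudbuild.yaml .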

gcloud builds submit of Django website results in error "does not have storage.objects.get access"

I'm trying to deploy my Django website with Cloud Run, as described in Google Cloud Platform's documentation, but I get the error Error 403: 934957811880@cloudbuild.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object., forbidden when running the command gcloud builds submit --config cloudmigrate.yaml --substitutions _INSTANCE_NAME=trouwfeestwebsite-db,_REGION=europe-west6.
The full output of the command is: (the error is at the bottom)
Creating temporary tarball archive of 119 file(s) totalling 23.2 MiB before compression.
Some files were not included in the source upload.
Check the gcloud log [C:\Users\Sander\AppData\Roaming\gcloud\logs\2021.10.23\20.53.18.638301.log] to see which files and the contents of the default gcloudignore file used (see `$ gcloud topic gcloudignore` to learn more).
Uploading tarball of [.] to [gs://trouwfeestwebsite_cloudbuild/source/1635015198.74424-eca822c138ec48878f292b9403f99e83.tgz]
ERROR: (gcloud.builds.submit) INVALID_ARGUMENT: could not resolve source: googleapi: Error 403: 934957811880@cloudbuild.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object., forbidden
On the level of my storage bucket, I granted 934957811880@cloudbuild.gserviceaccount.com the permission Storage Object Viewer, as I see on https://cloud.google.com/storage/docs/access-control/iam-roles that this covers storage.objects.get access.
I also tried by granting Storage Object Admin and Storage Admin.
I also added the "Viewer" role on IAM level (https://console.cloud.google.com/iam-admin/iam) for 934957811880@cloudbuild.gserviceaccount.com, as suggested in https://stackoverflow.com/a/68303613/5433896 and https://github.com/google-github-actions/setup-gcloud/issues/105, but it seems fishy to me to give the account such a broad role.
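For what it's worth, the bucket-level grant described above can also be done from the CLI; this is a sketch using the staging bucket name from the log output (adjust the member and bucket to your project):
gsutil iam ch \
  serviceAccount:934957811880@cloudbuild.gserviceaccount.com:roles/storage.objectViewer \
  gs://trouwfeestwebsite_cloudbuild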
I enabled Cloud Run in the Cloud Build permissions tab: https://console.cloud.google.com/cloud-build/settings/service-account?project=trouwfeestwebsite
With these changes, I still get the same error when running the gcloud builds submit command.
I don't understand what I could be doing wrong in terms of credentials/authentication (https://stackoverflow.com/a/68293734/5433896). I haven't changed my Google account password nor revoked that account's permissions for the Google Cloud SDK since I initialized the SDK.
Do you see what I'm missing?
The content of my cloudmigrate.yaml is:
steps:
  - id: "build image"
    name: "gcr.io/cloud-builders/docker"
    args: ["build", "-t", "gcr.io/${PROJECT_ID}/${_SERVICE_NAME}", "."]

  - id: "push image"
    name: "gcr.io/cloud-builders/docker"
    args: ["push", "gcr.io/${PROJECT_ID}/${_SERVICE_NAME}"]

  - id: "apply migrations"
    name: "gcr.io/google-appengine/exec-wrapper"
    args:
      [
        "-i",
        "gcr.io/$PROJECT_ID/${_SERVICE_NAME}",
        "-s",
        "${PROJECT_ID}:${_REGION}:${_INSTANCE_NAME}",
        "-e",
        "SETTINGS_NAME=${_SECRET_SETTINGS_NAME}",
        "--",
        "python",
        "manage.py",
        "migrate",
      ]

  - id: "collect static"
    name: "gcr.io/google-appengine/exec-wrapper"
    args:
      [
        "-i",
        "gcr.io/$PROJECT_ID/${_SERVICE_NAME}",
        "-s",
        "${PROJECT_ID}:${_REGION}:${_INSTANCE_NAME}",
        "-e",
        "SETTINGS_NAME=${_SECRET_SETTINGS_NAME}",
        "--",
        "python",
        "manage.py",
        "collectstatic",
        "--verbosity",
        "2",
        "--no-input",
      ]

substitutions:
  _INSTANCE_NAME: trouwfeestwebsite-db
  _REGION: europe-west6
  _SERVICE_NAME: invites-service
  _SECRET_SETTINGS_NAME: django_settings

images:
  - "gcr.io/${PROJECT_ID}/${_SERVICE_NAME}"
Thank you very much for any help.
The following solved my problem.
DazWilkin was right in saying:
it's incorrectly|unable to reference the bucket
(comment upvote for that, thanks!!). In my secret (configured in Secret Manager; alternatively you can put this in a .env file at the project root, making sure you don't exclude that file from deployment in a .gcloudignore file), I now have set:
GS_BUCKET_NAME=trouwfeestwebsite_sasa-trouw-bucket (project ID + underscore + storage bucket ID)
instead of
GS_BUCKET_NAME=sasa-trouw-bucket
The tutorial in fact stated I had to set the former, but I had set the latter because I found the underscore splitting weird; nowhere in the tutorial had I seen anything similar, so I thought it was an error in the tutorial.
Adapting the GS_BUCKET_NAME changed the error of gcloud builds submit to:
Creating temporary tarball archive of 412 file(s) totalling 41.6 MiB before compression.
Uploading tarball of [.] to [gs://trouwfeestwebsite_cloudbuild/source/1635063996.982304-d33fef2af77a4744a3bb45f02da8476b.tgz]
ERROR: (gcloud.builds.submit) PERMISSION_DENIED: service account "934957811880@cloudbuild.gserviceaccount.com" has insufficient permission to execute the build on project "trouwfeestwebsite"
That would mean that at least now the bucket is found and only a permission is missing.
Edit (a few hours later): I noticed that this GS_BUCKET_NAME=trouwfeestwebsite_sasa-trouw-bucket (project ID + underscore + storage bucket ID) setting then caused trouble in a later stage of the deployment, when deploying the static files (the last step of the cloudmigrate.yaml). The following settings seemed to work for both stages (notice that the project ID is no longer in the GS_BUCKET_NAME, but in its own environment variable):
DATABASE_URL=postgres://myuser:mypassword@//cloudsql/mywebsite:europe-west6:mywebsite-db/mydb
GS_PROJECT_ID=trouwfeestwebsite
GS_BUCKET_NAME=sasa-trouw-bucket
SECRET_KEY=my123Very456Long789Secret0Key
Then, it seemed that there also really was a permissions problem:
For the sake of completeness: afterwards, I tried adding the permissions as stated in https://stackoverflow.com/a/55635575/5433896, but that didn't prevent the error I reported in my question.
This answer, however, helped me: https://stackoverflow.com/a/33923292/5433896.
Setting the Editor role on the Cloud Build service account allowed the gcloud builds submit command to continue further without throwing the permissions error.
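In CLI form, that grant looks roughly like this; the project ID and service account are taken from my question, and Editor is admittedly a very broad role:
gcloud projects add-iam-policy-binding trouwfeestwebsite \
  --member="serviceAccount:934957811880@cloudbuild.gserviceaccount.com" \
  --role="roles/editor"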
If you have the same problem, I think a few things mentioned in my question can also help you - for example, I think doing this may also have been important:
I enabled Cloud Run in the Cloud Build permissions tab:
https://console.cloud.google.com/cloud-build/settings/service-account?project=trouwfeestwebsite

Google Cloud Build error no project active

I am trying to set up Google Cloud Build with a really simple project hosted on Firebase, but every time it reaches the deploy stage it tells me:
Error: No project active, but project aliases are available.
Step #2: Run firebase use <alias> with one of these options:
ERROR: build step 2 "gcr.io/host-test-xxxxx/firebase" failed: step exited with non-zero status: 1
I have set the alias to production and my .firebaserc is:
{
  "projects": {
    "default": "host-test-xxxxx",
    "production": "host-test-xxxxx"
  }
}
I have the Firebase Admin and API Keys Admin roles on my Cloud Build service account, and since I also want to encrypt I have Cloud KMS CryptoKey Decrypter as well.
I do
firebase login:ci
to generate a token in my terminal and paste it into my .env variable, then create an alias called production and do
firebase use production
My yaml is:
steps:
# Install
- name: 'gcr.io/cloud-builders/npm'
args: ['install']
# Build
- name: 'gcr.io/cloud-builders/npm'
args: ['run', 'build']
# Deploy
- name: 'gcr.io/host-test-xxxxx/firebase'
args: ['deploy']
and install and build work fine. What is happening here?
Rerunning firebase init does not seem to help.
Update:
building locally then doing firebase deploy does not help either.
OK, the thing that worked was changing the .firebaserc file to:
{
  "projects": {
    "default": "host-test-xxxxx"
  }
}
and
firebase use --add
and adding an alias called default.
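For anyone reproducing this, the sequence I ended up with looks roughly like the sketch below (the project ID is the placeholder from my question, and firebase use --add is interactive):
firebase use --add       # select host-test-xxxxx and name the alias "default"
firebase use default     # make that alias the active project
cat .firebaserc          # should now list only the "default" alias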

How to solve insufficient authentication scopes when use Pubsub on GCP

I'm trying to build 2 microservices (in Java Spring Boot) to communicate with each other using GCP Pub/Sub.
First, I tested the programs (in Eclipse) working as expected on my local laptop (http://localhost), i.e. one microservice published the message and the other received it successfully, using the topic/subscriber created in GCP (as well as the credential private key: mypubsub.json).
Then, I deployed the same programs to run on GCP and got the following errors:
- 2020-03-21 15:53:16.831 WARN 1 --- [bsub-publisher2] o.s.c.g.p.c.p.PubSubPublisherTemplate : Publishing to json-payload-sample-topic topic failed
- com.google.api.gax.rpc.PermissionDeniedException: io.grpc.StatusRuntimeException: PERMISSION_DENIED: Request had insufficient authentication scopes. at com.google.api.gax.rpc.ApiExceptionFactory
What I did to deploy the programs (in containers) to run on GCP/Kubernetes Engine:
Login the Cloud Shell after switch to my project for the Pubsub testing
Git clone my programs which being tested in Eclipse
Move the mypubsub.json file to under /home/my_user_id
export GOOGLE_APPLICATION_CREDENTIALS="/home/my_user_id/mp6key.json"
Run 'mvn clean package' to build the miscroservice programs
Run 'docker build' to create the image files
Run 'docker push' to push the image files to gcr.io repo
Run 'kubectl create' to create the deployments and expose the services
Once the 2 microservices were deployed and exposed, I tried to access them in a browser. The one that publishes a message worked fine retrieving data from the database and processing it, then failed with the above errors when trying to access the GCP Pub/Sub API to publish the message.
Could anyone provide a hint for what to check to solve the issue?
The issue has been resolved by following the guide:
https://cloud.google.com/kubernetes-engine/docs/tutorials/authenticating-to-cloud-platform
Briefly, the solution is to add the following lines to the deployment.yaml to load the credential key (creating the referenced secret itself is sketched after the snippet):
volumes:
- name: google-cloud-key
  secret:
    secretName: pubsub-key
containers:
- name: my_container
  image: gcr.io/my_image_file
  volumeMounts:
  - name: google-cloud-key
    mountPath: /var/secrets/google
  env:
  - name: GOOGLE_APPLICATION_CREDENTIALS
    value: /var/secrets/google/key.json
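For completeness, the pubsub-key secret referenced above has to exist in the cluster first; a minimal sketch of creating it from the service-account key file (the path is the one from the deployment steps above):
kubectl create secret generic pubsub-key \
  --from-file=key.json=/home/my_user_id/mp6key.json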
Try explicitly providing a CredentialsProvider to your Publisher; I faced the same authentication issue and this approach worked for me:
import com.google.api.gax.core.CredentialsProvider;
import com.google.api.gax.core.FixedCredentialsProvider;
import com.google.auth.oauth2.ServiceAccountCredentials;
import com.google.cloud.pubsub.v1.Publisher;
// Load the service-account key from the classpath and hand it to the Publisher builder.
CredentialsProvider credentialsProvider = FixedCredentialsProvider.create(
    ServiceAccountCredentials.fromStream(
        PubSubUtil.class.getClassLoader().getResourceAsStream("key.json")));
Publisher publisher = Publisher.newBuilder(topicName)
    .setCredentialsProvider(credentialsProvider)
    .build();

AWS Device Farm fail to upload apk

I'm having a problem using AWS Device Farm, but the problem is that Amazon is not very specific about what goes wrong.
After I create a new run and try to upload my apk file, it shows this message before the upload finishes:
There was a problem uploading your file. Please try again.
There are no error codes. I have already tried several times using an app signed for debug and for release, but neither of them finishes the upload. Is this a temporary problem in the Amazon cloud, or is it a known error?
I work for the AWS Device Farm team.
Sorry to hear that you are running in to issues.
1. If it is the app that is giving you an error, check whether you are able to run the app locally on a real device. If yes, then it should work on Device Farm. At times, app builds for emulators/simulators are uploaded and can cause this error.
2. If it is the test apk that you are uploading, then the same thing as point 1 should be confirmed.
3. If both of the points above are true and you are still getting an error, please start a thread on the AWS Device Farm forums and we can take a closer look at your runs, or you can share your run URL here and we can take a look.
Would it be possible to try uploading this file using the CLI [1]? The create-upload command does the same thing the web console is doing, and it can return more information than the web console.
aws devicefarm create-upload --project-arn <yourProjectsArn> --name <nameOfFile> --type <typeOfAppItIs> --region us-west-2
This will return an upload-arn which you will need to use later, so keep it handy. If you need more verbosity on any of the CLI commands listed here, you can use the --debug option.
The create-upload command will also return a presigned URL which you can upload the file to with an HTTP PUT, for example with curl:
curl -T someAppFileWithSameNameAsSpecifiedBefore "presigned-url"
Once you have uploaded the file, you can run the get-upload command to see the status of the upload; if there are any problems, this will show why.
aws devicefarm get-upload --arn <uploadArnReturnToYouFromPreviousCommand> --region us-west-2
My output looks like this:
{
  "upload": {
    "status": "SUCCEEDED",
    "name": "app-debug.apk",
    "created": 1500080938.105,
    "type": "ANDROID_APP",
    "arn": "arn:aws:devicefarm:us-west-2:<accountNum>:upload:<uploadArn>",
    "metadata": "{\"device_admin\":false,\"activity_name\":\"com.xamarin.simplecreditcardvalidator.MainActivity\",\"version_name\":\"1.1\",\"screens\":[\"small\",\"normal\",\"large\",\"xlarge\"],\"error_type\":null,\"sdk_version\":\"21\",\"package_name\":\"com.xamarin.simplecreditcardvalidator\",\"version_code\":\"2\",\"native_code\":[],\"target_sdk_version\":\"25\"}"
  }
}
Please let me know what this returns and I look forward to your response.
Best Regards
James
[1] http://docs.aws.amazon.com/cli/latest/reference/devicefarm/create-upload.html
Also used this article to learn how to do most of this:
https://aws.amazon.com/blogs/mobile/get-started-with-the-aws-device-farm-cli-and-calabash-part-1-creating-a-device-farm-run-for-android-calabash-test-scripts/
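Putting the commands above together, an end-to-end sketch; the project ARN and file name are placeholders, and it assumes jq is installed to parse the CLI's JSON output:
# Create the upload slot and capture its ARN and presigned URL.
UPLOAD_JSON=$(aws devicefarm create-upload \
  --project-arn "arn:aws:devicefarm:us-west-2:111122223333:project:EXAMPLE" \
  --name app-debug.apk --type ANDROID_APP --region us-west-2)
UPLOAD_ARN=$(echo "$UPLOAD_JSON" | jq -r '.upload.arn')
PRESIGNED_URL=$(echo "$UPLOAD_JSON" | jq -r '.upload.url')
# PUT the apk to the presigned URL, then check the processing status.
curl -T app-debug.apk "$PRESIGNED_URL"
aws devicefarm get-upload --arn "$UPLOAD_ARN" --region us-west-2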