Kubeflow pipeline fails in GCP - using cluster with Kubeflow pipeline integration - google-cloud-platform

I am using the Kubeflow Pipelines (kfp) v2 compiler to compile my script, and I upload the resulting YAML file to Kubeflow. The runs do not succeed and fail with the error below:
FileNotFoundError: [Errno 2] No such file or directory: '/gcs/my_bucket/tfx_taxi_simple/7e62cf81-31a1-42bd-b145-47c3d1f24758/pipeline/test-updated2/899ccb6a-0f39-4cc8-8448-ef8c614009cc/get-dataframe/df_path.csv'
F0202 13:03:30.504261 16 main.go:50] Failed to execute component: exit status 1
time="2023-02-02T13:03:30.507Z" level=error msg="cannot save artifact /tmp/outputs/test_df_path/data" argo=true error="stat /tmp/outputs/test_df_path/data: no such file or directory"
time="2023-02-02T13:03:30.507Z" level=error msg="cannot save artifact /tmp/outputs/train_df_path/data" argo=true error="stat /tmp/outputs/train_df_path/data: no such file or directory"
Error: exit status 1
The sample Kubeflow pipelines, on the other hand, run fine. I have noticed that the YAML file I upload uses a different template than the sample ones.
I am not sure whether this is a version issue or something else.
Can anyone help me resolve this?
I tried running the scripts on Colab and generating the YAML for both pipelines; the two YAML files differ.
I expect my Kubeflow pipeline, whose YAML is generated with the kfp v2 compiler, to run.
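One way to make the "different template" observation concrete is to check which format each compiled file is in. The sketch below is only a rough heuristic and rests on assumptions not stated in the question: that a v1-style compile produces an Argo Workflow manifest (top-level `apiVersion: argoproj.io/...` and `kind: Workflow`), and that a v2-style compile produces a pipeline spec with top-level `pipelineInfo` and `root` keys. The function name `detect_pipeline_format` is hypothetical, and the check does not by itself establish the cause of the failure.

```python
def detect_pipeline_format(yaml_text: str) -> str:
    """Guess whether a compiled pipeline file is an Argo Workflow or a v2 pipeline spec."""
    # Collect top-level keys: unindented, non-comment lines of the form "key: value".
    top_level = {}
    for line in yaml_text.splitlines():
        if not line or line[0] in " \t#-" or ":" not in line:
            continue
        key, _, value = line.partition(":")
        top_level[key.strip()] = value.strip()

    # Assumption: v1-style output is an Argo Workflow manifest.
    if top_level.get("kind") == "Workflow" and top_level.get(
        "apiVersion", ""
    ).startswith("argoproj.io/"):
        return "argo-workflow"
    # Assumption: v2-style output carries pipelineInfo and root at the top level.
    if "pipelineInfo" in top_level and "root" in top_level:
        return "kfp-v2-ir"
    return "unknown"
```

Running this on both the sample pipeline's YAML and the uploaded YAML would show whether the two files target different backends, which is one possible explanation for a version mismatch.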

Related

Why GCloud Builds submit failing after creating image?

I am learning to deploy a Pub/Sub service on Cloud Run by following the guidelines given here.
The steps I followed are:
Created a new project folder "myProject" on my local machine
Added the files below:
app.js, index.js, Dockerfile
Executed the command below to ship the code:
gcloud builds submit --tag gcr.io/Project-ID/pubsub
It's mentioned in the tutorial document that
Upon success, you should see a SUCCESS message containing the ID, creation time, and image name. The image is stored in Container Registry and can be re-used if desired.
But in my case it returns an error (ref: screenshot).
I checked the build logs, and they show success.
So I decided to ignore this error and proceed to the next step, deploying the app with this command:
gcloud run deploy sks-pubsub-cloudrun --image gcr.io/Project-ID/pubsub --no-allow-unauthenticated
When I run this command, it immediately asks me to specify the region from the list (I chose 26).
It then fails with this error:
Deploying container to Cloud Run service [sks-pubsub-cloudrun] in project [Project-ID] region [us-central1]
Deploying new service... Cloud Run error: The user-provided container failed to start and listen on the port defined provided by the PORT=8080 environment variable.
Logs for this revision might contain more information.
As I am new to GCP and to Dockerizing services, I do not understand this issue and cannot fix it. I have read many blogs and articles but found no proper solution for this error.
Any help will be appreciated.
I tried to run the container locally, and it fails with an error as well.
I'm using the VS Code IDE and "Cloud Code: Debug on Cloud Run Emulator" to debug the code.
Starting to debug the app using configuration `Cloud Run: Run/Debug Locally` from .vscode/launch.json
To view more detailed logs, go to Output channel : "Cloud Run: Run/Debug Locally - Detailed"
Dependency check started
Dependency check succeeded
Unpausing minikube
The minikube profile 'cloud-run-dev-internal' has been scheduled to stop automatically after exiting Cloud Code. To disable this on future deployments, set autoStop to false in your launch configuration d:\POC\promo_run_pubsub\.vscode\launch.json
Configuring minikube gcp-auth addon
Using GCP project 'Project-Id' with minikube gcp-auth
Failed to configure minikube gcp-auth addon. Your app might not be able to authenticate Google or GCP APIs it calls. The addon has been disabled. More details can be found in the detailed logs.
Update initiated
Deploy started
Deploy completed
Status check started
Resource pod/promo-run-pubsub-5d4cd64bf9-8pf4q status updated to In Progress
Resource deployment/promo-run-pubsub status updated to In Progress
Resource pod/promo-run-pubsub-5d4cd64bf9-8pf4q status updated to In Progress
Resource deployment/promo-run-pubsub status failed with waiting for rollout to finish: 0 of 1 updated replicas are available...
Status check failed
Update failed with error code STATUSCHECK_CONTAINER_TERMINATED
1/1 deployment(s) failed
Skaffold exited with code 1.
Cleaning up...
Finished clean up.
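The Cloud Run error above states the contract the container has to meet: it must start a server that listens on the port given by the PORT environment variable. The question's app is written in Node.js, so the following Python sketch only illustrates that contract rather than the asker's code. The names `resolve_port`, `HealthHandler`, and `build_server` are hypothetical, and the fallback to 8080 is an assumption taken from the value shown in the error message.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_port(env) -> int:
    """Read the port from the PORT variable; fall back to 8080 if it is unset."""
    return int(env.get("PORT", "8080"))


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET request with 200."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def build_server(env=os.environ) -> HTTPServer:
    # Bind to 0.0.0.0 (not 127.0.0.1) so the port is reachable from outside the container.
    return HTTPServer(("0.0.0.0", resolve_port(env)), HealthHandler)
```

The container's entry point would then call `build_server().serve_forever()`. If the process exits, or listens on a hard-coded port that differs from PORT, a failure like the one above is the expected result.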

Unable to deploy Google Cloud Function due to mysterious --production=false flag

For some reason, I can no longer deploy existing google functions from my local machine or from github actions. Whenever I deploy using the gcloud functions deploy command, I get the following error in the console: ERROR: (gcloud.functions.deploy) OperationError: code=3, message=Build failed: Unknown Syntax Error: Invalid option name ("--production=false"). I am not using a --production=false option in my gcloud deploy command, so I don't really understand where that is coming from.
The build logs always fail on:
Step #1 - "build": Unable to delete previous cache image: DELETE https://us.gcr.io/v2/{{projectId}}/gcf/{{region}}/{{guid}}/cache/manifests/sha256:{{imageId}}: GOOGLE_MANIFEST_DANGLING_TAG: Manifest is still referenced by tag: latest.
Deploy command:
gcloud functions deploy --runtime=nodejs16 --region=us-central1 {{function_name}} --entry-point={{node_function}} --trigger-topic={{topic_name}}
Attempted with the following gcloud versions and got the same result each time:
370, 371, 369, 360
I am not sure where this is coming from. I did not have this problem when I deployed just yesterday and it is not specific to my local machine.
This was due to a regression issue on Google's part. They released a fix for it today and deploys are working again now.
Issue: https://github.com/GoogleCloudPlatform/buildpacks/issues/175#issuecomment-1030519240

Template launch failed while running a streaming dataflow job using flex-template

I'm trying to automate provisioning of a streaming job using Cloud Build. For the POC I tried https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/dataflow/flex-templates/streaming_beam
It worked as expected when I manually ran the commands.
When I add the commands to the cloudbuild.yaml file, the build is created successfully, but the Dataflow job fails each time with the error below:
Error occurred in the launcher container: Template launch failed. See console logs
This is the only error log I get. I tried adding extra permissions to the Cloud Build service account, but that didn't help either.
Since the log file contains no other information, I find it hard to debug.

AWS: ERROR: Pre-processing of application version xxx has failed and Some application versions failed to process. Unable to continue deployment

Hi, I am trying to deploy a Node application from Cloud9 to Elastic Beanstalk, but I keep getting the error below.
Starting environment deployment via CodeCommit
--- Waiting for Application Versions to be pre-processed ---
ERROR: Pre-processing of application version app-491a-200623_151654 has failed.
ERROR: Some application versions failed to process. Unable to continue deployment.
I have attached an image of the IAM roles that I have. Any solutions?
Go to your console and open the Elastic Beanstalk console. Go to both Applications and Environments and delete them. Then, in your terminal, run:
eb init #Follow instructions
eb create --single ##Follow instructions.
This should fix the error, which is caused by application versions left in a failed state. To check those, run:
aws elasticbeanstalk describe-application-versions
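The output of that command can be long, so it may help to filter it for failed versions. The sketch below assumes the command's JSON output has been captured as a string, and that each entry under `ApplicationVersions` carries `VersionLabel` and `Status` fields. The function name `failed_versions` is hypothetical, and the status comparison is case-insensitive because the exact casing is an assumption.

```python
import json


def failed_versions(describe_output: str) -> list:
    """Return the labels of application versions whose Status is FAILED."""
    data = json.loads(describe_output)
    return [
        version.get("VersionLabel", "<no label>")
        for version in data.get("ApplicationVersions", [])
        # Compare case-insensitively; the exact casing of Status is an assumption.
        if str(version.get("Status", "")).upper() == "FAILED"
    ]
```

For example, the output could be captured with `aws elasticbeanstalk describe-application-versions > versions.json` and the file contents passed to `failed_versions`.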
I was searching for this answer as a result of watching a YouTube tutorial for how to pass the AWS Certified Developer Associate exam. If anyone else gets this error as a result of that tutorial, delete the 002_node_command.config file created in the tutorial and commit that change, as that is causing the error to occur.
A failure in the pre-processing phase may be caused by an invalid manifest, configuration, or .ebextensions file.
If you deploy an invalid application version using eb deploy with the preprocess option enabled, the details of the error are not revealed.
You can remove the --process flag and enable the verbose option to improve the error output.
In my case, I deploy using this command:
eb deploy -l "XXX" -p
It can return a failure when I mess around with .ebextensions:
ERROR: Pre-processing of application version xxx has failed.
ERROR: Some application versions failed to process. Unable to continue deployment.
With that result I can't figure out what is wrong,
but deploying without -p (or --process) and adding the -v (verbose) flag:
eb deploy -l "$deployname" -v
It returns something more useful:
Uploading: [##################################################] 100% Done...
INFO: Creating AppVersion xxx
ERROR: InvalidParameterValueError - The configuration file .ebextensions/16-my_custom_config_file.config in application version xxx contains invalid YAML or JSON.
YAML exception: Invalid Yaml: while scanning a simple key
in 'reader', line 6, column 1:
(... details of the error ...)
, JSON exception: Invalid JSON: Unexpected character (#) at position 0.. Update the configuration file.
Now I can fix the problem.

AWS Code Deploy Deployment Failed for shell scripts

I am trying to create a CodeDeploy deployment group using a CloudFormation stack. Every time I run the stack, I get script errors such as bad interpreter, rm/ll command not found, and \r\n errors. I tried converting the shell script files with dos2unix, zipping them, and uploading them to CodeDeploy, but with no success.
The following is the error I get in the logs:
2018-09-01 10:41:45 INFO [codedeploy-agent(2681)]: [Aws::CodeDeployCommand::Client 200 0.037239 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":4,\"script_name\":\"BeforeInstall.sh\",\"message\":\"Script at specified location: BeforeInstall.sh run as user root failed with exit code 127\",\"log\":\"LifecycleEvent - BeforeInstall\\nScript - BeforeInstall.sh\\n[stderr]/usr/bin/env: bash\\r: No such file or directory\\n\"}"},host_command_identifier:"WyJjb20uYW1hem9uLmFwb2xsby5kZXBsb3ljb250cm9sLmRvbWFpbi5Ib3N0Q29tbWFuZElkZW50aWZpZXIiLHsiZGVwbG95bWVudElkIjoiQ29kZURlcGxveS91cy1lYXN0LTEvUHJvZC9hcm46YXdzOnNkczp1cy1lYXN0LTE6OTkzNzM1NTM2Nzc4OmRlcGxveW1lbnQvZC05V0kzWk5DNlYiLCJob3N0SWQiOiJhcm46YXdzOmVjMjp1cy1lYXN0LTE6OTkzNzM1NTM2Nzc4Omluc3RhbmNlL2ktMDk1NGJlNjk4OTMzMzY5MjgiLCJjb21tYW5kTmFtZSI6IkJlZm9yZUluc3RhbGwiLCJjb21tYW5kUG9zaXRpb24iOjMsImNvbW1hbmRBdHRlbXB0IjoxfV0=")
2018-09-01 10:41:45 ERROR [codedeploy-agent(2681)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Error during perform: InstanceAgent::Plugins::CodeDeployPlugin::ScriptError - Script at specified location: BeforeInstall.sh run as user root failed with exit code 127 - /opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/hook_executor.rb:173:in `execute_script'
......
......
2018-09-01 10:41:45 INFO [codedeploy-agent(2681)]: [Aws::CodeDeployCommand::Client 200 0.018288 0 retries] put_host_command_complete(command_status:"Failed",diagnostics:{format:"JSON",payload:"{\"error_code\":5,\"script_name\":\"\",\"message\":\"Script at specified location: BeforeInstall.sh run as user root failed with exit code 127\",\"log\":\"\"}"},host_command_identifier:"WyJjb20uYW1hem9uLmFwb2xsby5kZXBsb3ljb250cm9sLmRvbWFpbi5Ib3N0Q29tbWFuZElkZW50aWZpZXIiLHsiZGVwbG95bWVudElkIjoiQ29kZURlcGxveS91cy1lYXN0LTEvUHJvZC9hcm46YXdzOnNkczp1cy1lYXN0LTE6OTkzNzM1NTM2Nzc4OmRlcGxveW1lbnQvZC05V0kzWk5DNlYiLCJob3N0SWQiOiJhcm46YXdzOmVjMjp1cy1lYXN0LTE6OTkzNzM1NTM2Nzc4Omluc3RhbmNlL2ktMDk1NGJlNjk4OTMzMzY5MjgiLCJjb21tYW5kTmFtZSI6IkJlZm9yZUluc3RhbGwiLCJjb21tYW5kUG9zaXRpb24iOjMsImNvbW1hbmRBdHRlbXB0IjoxfV0=")
What could be the reason for the failure?
The logs indicate that there is some problem with your scripts, specifically BeforeInstall.sh. Something in that script is failing with an exit code of 127. I would recommend adding logs to that script to see where it's actually failing. Once you identify the command that's failing, you can see what exit code 127 means for that particular command.
If you want help debugging that particular script, you should open another question and provide the script, including the logs from when it runs.
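One detail in the posted log is worth checking first: the stderr line reads `/usr/bin/env: bash\r: No such file or directory`, which suggests the shebang line of BeforeInstall.sh still ended with a carriage return when it ran on the instance, that is, the script arrived with Windows (CRLF) line endings. The sketch below assumes the revision is uploaded as a zip archive and checks the files as they are actually packed. The function names are hypothetical, and the conversion only mirrors what dos2unix does for CRLF.

```python
import zipfile


def to_unix_line_endings(script: bytes) -> bytes:
    """Convert CRLF line endings to LF, as dos2unix does."""
    return script.replace(b"\r\n", b"\n")


def crlf_scripts_in_bundle(zip_path: str) -> list:
    """List the .sh entries in a zip archive that still contain CRLF line endings."""
    with zipfile.ZipFile(zip_path) as bundle:
        return [
            name
            for name in bundle.namelist()
            if name.endswith(".sh") and b"\r\n" in bundle.read(name)
        ]
```

If `crlf_scripts_in_bundle` returns any names for the archive being uploaded, the conversion did not survive packaging, and those files would need converting again before the archive is rebuilt.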
A note on CodeDeploy lifecycle hooks
In your case, your BeforeInstall script is failing, which is the script that gets deployed with your application. However, if it had been your ApplicationStop script that was failing, it's important to understand that ApplicationStop uses scripts from the last successful deployment, so if the last successful deployment had a faulty script, it can cause future deployments to fail until these steps are followed.