Template launch failed while running a streaming dataflow job using flex-template

I'm trying to automate the provisioning of a streaming job using Cloud Build. For the POC I tried https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/dataflow/flex-templates/streaming_beam
It worked as expected when I manually ran the commands.
When I add the commands to the cloudbuild.yaml file, the build gets created successfully, but the Dataflow job fails each time with the error below:
Error occurred in the launcher container: Template launch failed. See console logs
This is the only error log that I get. I tried adding extra permissions to the Cloud Build service account, but that didn't help either.
Since there's no other info in the log file, I find it hard to debug.
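For reference, a minimal cloudbuild.yaml that launches the flex template might look like the sketch below (written as a shell heredoc so it can be pasted directly). This is an illustration only: the bucket, project, subscription, and table names are placeholders, and the cloud-sdk builder image is just one common way to invoke gcloud from a build step.

cat > cloudbuild.yaml <<'EOF'
# Launch the streaming flex template from a Cloud Build step.
steps:
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: 'gcloud'
  args:
    - 'dataflow'
    - 'flex-template'
    - 'run'
    - 'streaming-beam-job'
    - '--template-file-gcs-location=gs://MY_BUCKET/streaming_beam.json'
    - '--region=us-central1'
    - '--parameters=input_subscription=projects/MY_PROJECT/subscriptions/MY_SUBSCRIPTION'
    - '--parameters=output_table=MY_PROJECT:MY_DATASET.MY_TABLE'
EOF

If a job launched this way fails where the same gcloud command succeeds manually, comparing which service account each invocation runs as is a reasonable first thing to check.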

Related

Why is gcloud builds submit failing after creating the image?

I am learning to deploy a Pub/Sub service to run under Cloud Run, following the guidelines given here.
Steps I followed are:
Created a new project folder "myProject" on my local machine
Added the below files:
app.js, index.js, Dockerfile
Executed the below command to ship the code:
gcloud builds submit --tag gcr.io/Project-ID/pubsub
It's mentioned in the tutorial document that
Upon success, you should see a SUCCESS message containing the ID, creation time, and image name. The image is stored in Container Registry and can be re-used if desired.
But in my case it's returning with an error (ref: screenshot).
I have verified the build logs, and they show success.
So I thought to ignore this error and proceed with the next step, deploying the app by running the command:
gcloud run deploy sks-pubsub-cloudrun --image gcr.io/Project-ID/pubsub --no-allow-unauthenticated
When I run this command, it immediately asks me to specify the region from a list (26, us-central1, is my choice).
Next it fails with this error:
Deploying container to Cloud Run service [sks-pubsub-cloudrun] in project [Project-ID] region [us-central1]
Deploying new service... Cloud Run error: The user-provided container failed to start and listen on the port defined provided by the PORT=8080 environment variable.
Logs for this revision might contain more information.
As I am new to GCP and to Dockerizing services, I don't understand this issue and am unable to fix it. I have researched many blogs and articles, yet found no proper solution for this error.
Any help will be appreciated.
I tried to run the container locally and it's failing with an error.
I'm using VS Code IDE, and "Cloud Code: Debug on Cloud Run Emulator" to debug the code.
Starting to debug the app using configuration `Cloud Run: Run/Debug Locally` from .vscode/launch.json
To view more detailed logs, go to Output channel : "Cloud Run: Run/Debug Locally - Detailed"
Dependency check started
Dependency check succeeded
Unpausing minikube
The minikube profile 'cloud-run-dev-internal' has been scheduled to stop automatically after exiting Cloud Code. To disable this on future deployments, set autoStop to false in your launch configuration d:\POC\promo_run_pubsub\.vscode\launch.json
Configuring minikube gcp-auth addon
Using GCP project 'Project-Id' with minikube gcp-auth
Failed to configure minikube gcp-auth addon. Your app might not be able to authenticate Google or GCP APIs it calls. The addon has been disabled. More details can be found in the detailed logs.
Update initiated
Deploy started
Deploy completed
Status check started
Resource pod/promo-run-pubsub-5d4cd64bf9-8pf4q status updated to In Progress
Resource deployment/promo-run-pubsub status updated to In Progress
Resource pod/promo-run-pubsub-5d4cd64bf9-8pf4q status updated to In Progress
Resource deployment/promo-run-pubsub status failed with waiting for rollout to finish: 0 of 1 updated replicas are available...
Status check failed
Update failed with error code STATUSCHECK_CONTAINER_TERMINATED
1/1 deployment(s) failed
Skaffold exited with code 1.
Cleaning up...
Finished clean up.
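A quick way to take Cloud Run and the emulator out of the picture is to run the image directly, the way Cloud Run would, with PORT set (a sketch; the image name is the one built earlier):

docker run --rm -e PORT=8080 -p 8080:8080 gcr.io/Project-ID/pubsub

If the process exits immediately or never listens on 8080 (check with curl localhost:8080), the container itself is the problem rather than the deployment.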

Dataflow Job failing

I have a pipeline which requires a Dataflow job to run. I was using the gcloud CLI command to start a Dataflow job, which worked fine for over a month. But for the last three days the Dataflow job has been failing within 10-20 seconds with the following error log.
Failed to start the VM, launcher-2022012621245117717885921401920990, used for launching because of status code: UNAVAILABLE, reason: One or more operations had an error: 'operation-1643261093401-5d68989bed339-a33de830-9f90d92a': [UNAVAILABLE] 'HTTP_503'..
The command I'm using is:
gcloud dataflow sql query \
  'SELECT tr.* FROM pubsub.topic.`my_project`.pubsub_topic as tr' \
  --job-name test_job \
  --region asia-south1 \
  --bigquery-write-disposition write-empty \
  --bigquery-project my_project \
  --bigquery-dataset test_dataset \
  --bigquery-table table_name \
  --max-workers 1 \
  --worker-machine-type n1-standard-1
I tried starting the job from the Cloud Console with the same parameters as well, and it failed with the same error log. I have tested the job run from the console before and it worked fine. The issue started a couple of days ago.
What could be going wrong?
Thanks.
The Google Cloud error model indicates that a 503 means the service is unavailable [1].
You may try changing the region, for example from europe-north1 to europe-west4; that should work. Additionally, you shouldn't include your job ID on Stack Overflow.
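For example, re-running the same query with only the region changed might look like this (asia-southeast1 is just an illustrative alternative; the single quotes assume a POSIX shell so the backticks in the SQL are not interpreted):

gcloud dataflow sql query \
  'SELECT tr.* FROM pubsub.topic.`my_project`.pubsub_topic as tr' \
  --job-name test_job \
  --region asia-southeast1 \
  --bigquery-write-disposition write-empty \
  --bigquery-project my_project \
  --bigquery-dataset test_dataset \
  --bigquery-table table_name \
  --max-workers 1 \
  --worker-machine-type n1-standard-1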
[1] https://cloud.google.com/apis/design/errors#handling_errors

How to increase the Cloud Build timeout when using gcloud run deploy?

When attempting to deploy to Cloud Run using gcloud run deploy, I am hitting the 10m Cloud Build timeout limit. gcloud run deploy works well as long as the build step does not exceed 10m. When the build step exceeds 10m, the build fails with the "Timed out" status, as shown in the screenshot below. AFAIK there are no arguments to gcloud run deploy that can set the Cloud Build timeout limit. The gcloud run deploy docs are here: https://cloud.google.com/sdk/gcloud/reference/run/deploy
I've attempted to increase the Cloud Build timeout limit using gcloud config set builds/timeout 20m and gcloud config set container/build_timeout 20m, but these settings are not reflected in the execution details of the Cloud Build process when using gcloud run deploy.
In the GUI, this is the setting I want to change:
Is it possible to increase the Cloud Build timeout limit using gcloud run deploy?
How about splitting the command into (more easily configured) constituents?
[I've not tried this]
Build the container image, specifying the timeout:
gcloud builds submit --source=.... --timeout=...
Then reference the image that results when you gcloud run deploy:
gcloud run deploy ... --image=...
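Concretely, the two-step version might look like this (the image name, service name, region, and timeout are placeholders chosen for illustration):

gcloud builds submit . --tag=gcr.io/PROJECT_ID/my-app --timeout=20m
gcloud run deploy my-service --image=gcr.io/PROJECT_ID/my-app --region=us-central1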
I know this is answered and confirmed, but @DazWikin's solution is a harder way to solve this problem than @SimonKarman's.
For those who, like myself, do not have a cloudbuild.yaml file, this solution is still valid; you just need to edit the one created by Google itself. You can find it under Builds > Triggers > (desired trigger) > Edit.
Then, when you open the editor, you can apply the timeout. If you want other changes to the YAML file, you can also check out the schema here:
https://cloud.google.com/build/docs/build-config-file-schema#yaml
Note: I am using Cloud Run and this worked for me, so I am not 100% sure it works with all builds generated by Google.
Hope it will be helpful for someone else in the future :)
If you're using a --source such as a cloudbuild.yaml, you can add the following property to alter the timeout in seconds:
...
timeout: "1800s"
...
You can find this in the documentation.
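In context, a minimal cloudbuild.yaml with the top-level timeout might look like this sketch (as a shell heredoc; the Docker build step and image name are illustrative):

cat > cloudbuild.yaml <<'EOF'
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/PROJECT_ID/my-app', '.']
images: ['gcr.io/PROJECT_ID/my-app']
# Top-level build timeout; the default is 600s (10 minutes).
timeout: "1800s"
EOF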

AWS: ERROR: Pre-processing of application version xxx has failed and Some application versions failed to process. Unable to continue deployment

Hi, I am trying to deploy a Node application from Cloud9 to Elastic Beanstalk, but I keep getting the below error.
Starting environment deployment via CodeCommit
--- Waiting for Application Versions to be pre-processed ---
ERROR: Pre-processing of application version app-491a-200623_151654 has failed.
ERROR: Some application versions failed to process. Unable to continue deployment.
I have attached an image of the IAM roles that I have. Any solutions?
Go to your Elastic Beanstalk console, go to both applications and environments, and delete them. Then in your terminal run:
eb init              # follow the instructions
eb create --single   # follow the instructions
This should fix the error, which is caused by application versions stuck in a failed state. If you want to check those, run:
aws elasticbeanstalk describe-application-versions
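If you would rather clear only the failed versions than delete the whole application and environment, something like the following should work (my-app is a placeholder for your application name, and the version label is the one from the error above):

aws elasticbeanstalk describe-application-versions --application-name my-app
aws elasticbeanstalk delete-application-version --application-name my-app --version-label app-491a-200623_151654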
I was searching for this answer as a result of watching a YouTube tutorial for how to pass the AWS Certified Developer Associate exam. If anyone else gets this error as a result of that tutorial, delete the 002_node_command.config file created in the tutorial and commit that change, as that is causing the error to occur.
A failure within the pre-processing phase may be caused by an invalid manifest, configuration, or .ebextensions file.
If you deploy an (invalid) application version using eb deploy with the preprocess option enabled, the details of the error will not be revealed.
You can remove the --process flag and enable the verbose option to improve error output.
In my case I deploy using this command:
eb deploy -l "XXX" -p
And it can return a failure when I mess around with .ebextensions:
ERROR: Pre-processing of application version xxx has failed.
ERROR: Some application versions failed to process. Unable to continue deployment.
With that result I can't figure out what is wrong, but deploying without -p (or --process) and adding the -v (verbose) flag:
eb deploy -l "$deployname" -v
It returns something more useful:
Uploading: [##################################################] 100% Done...
INFO: Creating AppVersion xxx
ERROR: InvalidParameterValueError - The configuration file .ebextensions/16-my_custom_config_file.config in application version xxx contains invalid YAML or JSON.
YAML exception: Invalid Yaml: while scanning a simple key
in 'reader', line 6, column 1:
(... details of the error ...)
, JSON exception: Invalid JSON: Unexpected character (#) at position 0.. Update the configuration file.
Now I can fix the problem.
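As a quick pre-flight check, you can also validate the YAML of an .ebextensions file locally before deploying. A sketch, assuming Python with PyYAML is installed (the file name is the one from the error above):

python -c 'import yaml; yaml.safe_load(open(".ebextensions/16-my_custom_config_file.config"))'

An invalid file raises a parse error immediately, without a deploy round-trip.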

AWS Device Farm - Schedule Run - Errors

I am hoping someone here has come across this issue and has an answer for me.
I have set up a project in Device Farm and have written automation tests in Appium using JS.
When I create a run manually using the console, the runs succeed without any issues and my tests get executed.
However, when I try to schedule a run using the CLI with the following command, it fails with an error:
aws devicefarm schedule-run --project-arn projectArn --app-arn appArn --device-pool-arn dpARN --name myTestRun --test type=APPIUM_NODE,testPackageArn="testPkgArn"
Error : An error occurred (ArgumentException) when calling the ScheduleRun operation: Standard Test environment is not supported for testType: APPIUM_NODE
CLI versions: aws-cli/1.17.0 Python/3.8.1 Darwin/19.2.0 botocore/1.14.0
That is expected currently for the standard environment. The command will need to use the custom environment, which the CLI can do by setting the testSpecArn value.
This ARN refers to an upload in Device Farm consisting of a .yaml file that defines how the tests are executed.
This process is discussed here:
https://docs.aws.amazon.com/devicefarm/latest/developerguide/how-to-create-test-run.html#how-to-create-test-run-cli-step6
The error in this case is caused by the fact that the APPIUM_NODE test type can only be used with the custom environment currently.
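Concretely, the fix is to upload a test spec and pass its ARN in the --test structure. A sketch reusing the placeholder ARNs from the question (the spec file name is illustrative):

aws devicefarm create-upload --project-arn projectArn --name custom_spec.yml --type APPIUM_NODE_TEST_SPEC
aws devicefarm schedule-run --project-arn projectArn --app-arn appArn --device-pool-arn dpARN --name myTestRun --test type=APPIUM_NODE,testPackageArn="testPkgArn",testSpecArn="testSpecArn"

Note that create-upload returns the new upload's ARN along with a presigned URL to which the .yaml file must be PUT before the run is scheduled.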