Unable to cancel a dataflow on Google Cloud Platform

I have a number of Google Cloud Dataflow jobs marked as "Running" in the Dataflow console, but there are no GCE instances running. I manually terminated the instances to avoid being billed. The jobs seem to be permanently stuck in the "Running" state. If I try to cancel them from the console or the gcloud utility, I receive a warning that the flow is already in a "finishing state", so the request was ignored.
I am now at the running quota of 10, so I am stuck. Is there any solution to this other than creating a new project?
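For reference, the cancel attempts mentioned above were of this general form (a sketch; the job ID and region are placeholders):

gcloud dataflow jobs list --status=active
gcloud dataflow jobs cancel <JOB_ID> --region=<REGION>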

There was an issue in the Dataflow service that caused cancel requests to become stuck. It has since been resolved.

Related

How can I prevent Google Cloud Dataproc cluster VM instances from auto-shutoff?

When I am running a cluster of VM instances/nodes, the VM instances shut off automatically after about 30 minutes or so, even while I am actively using and running things on the cluster/Dataproc. I cannot find this setting and would appreciate any help on how to disable it to prevent the shutdown, or how to configure a new cluster in a way that will prevent this from happening.
Thank you
Default Dataproc clusters do not have any kind of automatic shutdown.
If you are using the older Datalab initialization action, you are probably seeing Datalab's own non-Dataproc-aware shutdown functionality, which you can disable in one of the ways suggested here: How to keep Google Dataproc master running?
Otherwise, if you're using some kind of template or copy/paste arguments for creating your Dataproc cluster, perhaps you're accidentally setting "scheduled deletion": https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/scheduled-deletion
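As a quick way to check for that (a sketch; the cluster name and region are placeholders), the scheduled-deletion settings show up under lifecycleConfig in the cluster description:

gcloud dataproc clusters describe <CLUSTER_NAME> --region=<REGION> | grep -A 3 lifecycleConfig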
If neither of those settings explain your situation, you should visit your "activity logs" from the "Cloud Logging" interface, selecting Cloud Dataproc Cluster, and opening up the activity_log type of logs to see an audit log of who was deleting your cluster. Alternatively, if the cluster still existed in Dataproc, but the underlying VM was being shut down, visit the "Compute Engine VM" log category and also look at "activity logs" to see who was stopping your VMs. Sometimes, in a shared project, a project admin might be running some kind of script to automatically shut down VMs to save cost.
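For example, a sketch of pulling the VM-stop audit entries from the command line (these are the usual Compute Engine admin-activity fields; adjust the filter if your logs differ):

gcloud logging read 'resource.type="gce_instance" AND protoPayload.methodName:"compute.instances.stop"' --limit=20 --format=json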

Shutting down VM instances created by dataflow

I am having issues with 3 VM instances created via Dataflow.
I used a Cloud Function to launch a Dataflow template, which ran to completion,
but the VM instances generated for it are still running and I cannot delete them.
Could anyone help?
Thanks and regards
Because I kicked off the template via a Cloud Function, GCP didn't allow me to shut down the instances directly; the options were greyed out. However, it said the instances were in use by a few GCP instance groups, so once I deleted the group I was able to delete the instances.
The problem seemed to come from my job, where I had a wait_until_finish() at the end of my pipeline, which was preventing the job from completing.
Once I removed wait_until_finish, the job completed and the instances were shut down.
Thanks and regards
Marco
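For context, here is a minimal sketch of the pattern Marco describes, using the Apache Beam Python SDK; the pipeline contents and options are placeholders rather than the original job:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    # Placeholder options; on Dataflow these would include --runner=DataflowRunner,
    # --project, --region and --temp_location.
    options = PipelineOptions()
    pipeline = beam.Pipeline(options=options)
    (
        pipeline
        | "Create" >> beam.Create(["a", "b", "c"])  # placeholder source
        | "Print" >> beam.Map(print)                # placeholder transform
    )
    result = pipeline.run()
    # result.wait_until_finish()  # blocking call described above; with it in place the
    #                             # launching process never returned, the job never
    #                             # "completed", and the worker VMs stayed up

if __name__ == "__main__":
    run()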

Stackdriver stopped logging from GKE Containers

Logs from Spring Boot applications deployed to GKE stopped showing up in Stackdriver Logging after February 2nd 2020. What happened around that time is that Stackdriver moved to a new UI, more integrated with the GCP console - could that have anything to do with it?
I do have other projects in GKE, such as a Node.js based backend, where logging to Stackdriver has continued without interruption, but there is just silence from the Spring Boot apps.
If I select "Kubernetes Container" instead of "GKE Container" in the GCP console at "Stackdriver Logging -> Logs Viewer" I do see some log statements, specifically errors like:
WARNING: You do not appear to have access to project [my-project] or it does not exist.
and
Error while fetching metric descriptors for kube-proxy: Get https://monitoring.googleapis.com/v3/projects/my-project/metricDescriptors?...
and
Error while sending request to Stackdriver Post https://monitoring.googleapis.com/v3/projects/my-project/timeSeries?...
OK, so that seems to start explaining the problem, but I haven't changed any IAM permissions, and when I compare them to the ones in the project hosting the Node.js GKE deployments (which continue logging fine), they seem to be the same.
Should I be changing some permissions in the project hosting the Spring Boot GKE deployments, to get rid of those Stackdriver errors? What IAM member affects those? What roles would be required?
Turns out that the GKE cluster had Legacy Stackdriver Logging and Legacy Stackdriver Monitoring enabled, and the problem was solved by setting those attributes to Disabled and configuring the Stackdriver Kubernetes Engine Monitoring attribute instead.
But why the Stackdriver Logging continues uninterrupted for the Node.js applications, with the legacy options enabled, is still a mystery to me.
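For anyone hitting the same thing, the switch described above can also be made from the command line rather than the console (a sketch with placeholder names; flag availability depends on your gcloud version):

gcloud container clusters update <CLUSTER_NAME> --zone=<ZONE> --enable-stackdriver-kubernetes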

Debugging Google Cloud Dataflow VM Instances

I have a Google Cloud Dataflow streaming job that has a growing system lag. The lag started when I deployed new changes and has been growing gradually without subsiding. I see frequent GCs in the Stackdriver logs, which suggests an inefficiency/bug introduced by the newly deployed changes. I'd like to debug this further: what is the best way to debug the JVM on Dataflow instances?
I have tried enabling the Monitoring agent when launching the job, which gives me GC count/time, but that is not very useful for tracking down the source of the issue.
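One generic approach, not from the original question (worker name, zone and PID are placeholders, and the JDK tools must be reachable on the worker, which may mean entering the harness container first), is to SSH into a worker VM and inspect the Java process directly:

# list the worker VMs for the job, then SSH into one
gcloud compute instances list --filter="name~<JOB_NAME_PREFIX>"
gcloud compute ssh <WORKER_INSTANCE_NAME> --zone=<ZONE>

# on the worker: find the harness process and watch GC / heap activity
ps aux | grep java
sudo jstat -gcutil <PID> 5000        # GC utilization every 5 seconds
sudo jmap -histo:live <PID> | head   # rough view of what is filling the heap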

ECS service aws-cli vs. dashboard

I am currently experiencing some weird behaviour with the AWS ECS tooling.
I see 2 different behaviours when using the aws-cli and the web dashboard.
The context is that I have an ECS cluster set up, and I am writing a script that automates my deployment by (among other steps) creating or updating an ECS service.
Part of my script uses the command aws ecs describe-services
And it is here that I find different information than the dashboard (on the page of my cluster).
Indeed, when the service is created and ACTIVE, if I run:
aws ecs describe-services --services my_service --cluster my_cluster
The service will show up in the output with all the information that I need to parse. It will show up as well on the web dashboard as ACTIVE.
The problem is when I delete the service from the dashboard. As expected, it is deleted from the list and I can eventually recreate one from the dashboard with the same name.
But if, once the service is deleted, I re-run the command above, the output will show the service as INACTIVE and all the information about the previously deleted service will still appear.
If the service is deleted, shouldn't the command return the service as MISSING:
{
    "services": [],
    "failures": [
        {
            "reason": "MISSING",
            "arn": "arn:aws:ecs:<my_regions>:<my_id>:service/my_service"
        }
    ]
}
This complicates the parsing in my script, and even if I can find a workaround (maybe trying to create the service even when it is INACTIVE rather than missing), it is kind of weird that even deleted, the service is still there, somewhere, cluttering my stack.
Edit: I am using the latest version of the aws-cli.
This is the default behavior provided by AWS. Please check the documentation below:
When you delete a service, if there are still running tasks that require cleanup, the service status moves from ACTIVE to DRAINING, and the service is no longer visible in the console or in ListServices API operations. After the tasks have stopped, then the service status moves from DRAINING to INACTIVE. Services in the DRAINING or INACTIVE status can still be viewed with DescribeServices API operations. However, in the future, INACTIVE services may be cleaned up and purged from Amazon ECS record keeping, and DescribeServices API operations on those services return a ServiceNotFoundException error.
delete-service
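As a workaround for the parsing problem in the question, a JMESPath filter can make describe-services behave as if INACTIVE services were missing (a sketch reusing the question's placeholder names):

aws ecs describe-services --cluster my_cluster --services my_service --query "services[?status=='ACTIVE']" --output json

An empty list then covers both the INACTIVE and the MISSING case, so the script only has to check whether the result is non-empty.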