How to check which user has stopped the dataflow pipeline in GCP? - google-cloud-platform

I have a Dataflow pipeline running on GCP which reads messages from Pub/Sub and writes them to a GCS bucket. My Dataflow pipeline's status shows it was cancelled by some user, and I want to know who that user is.

You can view all Step Logs for a pipeline step in Stackdriver Logging by clicking the Stackdriver link on the right side of the logs pane.
Here is a summary of the different log types available for viewing from the Monitoring→Logs page:
job-message logs contain job-level messages that various components of Cloud Dataflow generate. Examples include the autoscaling configuration, when workers start up or shut down, progress on the job step, and job errors. Worker-level errors that originate from crashing user code and that are present in worker logs also propagate up to the job-message logs.
worker logs are produced by Cloud Dataflow workers. Workers do most of the pipeline work (for example, applying your ParDos to data). Worker logs contain messages logged by your code and Cloud Dataflow.
worker-startup logs are present on most Cloud Dataflow jobs and can capture messages related to the startup process. The startup process includes downloading a job's jars from Cloud Storage, then starting the workers. If there is a problem starting workers, these logs are a good place to look.
shuffler logs contain messages from workers that consolidate the results of parallel pipeline operations.
docker and kubelet logs contain messages related to these public technologies, which are used on Cloud Dataflow workers.
As mentioned in a previous comment, you should filter by the pipeline's job ID; the actor of the cancellation will be in the AuthenticationEmail entry of the matching log entry.
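As a hedged sketch (not part of the original answer), you could run that filter programmatically with the google-cloud-logging client. The project ID, job ID, log name, and payload field names below are assumptions to adjust for your environment:

    from google.cloud import logging  # pip install google-cloud-logging

    client = logging.Client(project="my-project")  # hypothetical project ID

    # Assumption: the cancellation appears in the admin-activity audit log for the
    # dataflow_step resource; substitute your own job ID.
    log_filter = (
        'resource.type="dataflow_step" '
        'resource.labels.job_id="2023-01-01_00_00_00-1234567890" '
        'logName:"cloudaudit.googleapis.com"'
    )

    for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING):
        payload = entry.payload if isinstance(entry.payload, dict) else {}
        # In audit entries the caller's email is expected under authenticationInfo.
        who = payload.get("authenticationInfo", {}).get("principalEmail")
        print(entry.timestamp, payload.get("methodName"), who)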

Related

How to make sure the Dataflow logs are written to a Storage bucket

How to make sure the Dataflow logs are written to a Storage bucket in GCP?
Can anyone please help regarding this?
The Dataflow logs are visible in the job UI page showing the Dataflow DAG.
The logs are also written to Cloud Logging.
The Dataflow logs are not directly written to a file in Cloud Storage.
If you want to write logs to a file in Cloud Storage, you have to do that explicitly with a sink.
For example, you can write error logs to a file in Cloud Storage with a dead-letter queue.
A dead-letter queue is a way of routing messages that fail in the job to a separate output ("another queue") by using side outputs.
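To illustrate the dead-letter pattern the answer refers to, here is a minimal Apache Beam (Python) sketch. The DoFn, tag name, input elements, and bucket paths are hypothetical; a real pipeline would use its own transforms:

    import apache_beam as beam
    from apache_beam import pvalue

    DEAD_LETTER_TAG = "dead_letter"  # hypothetical tag name

    class ParseOrFail(beam.DoFn):
        """Emit parsed records on the main output and failures on a side output."""
        def process(self, element):
            try:
                yield element.decode("utf-8").strip()
            except Exception as err:
                # Route the bad element (and the error) to the dead-letter output.
                yield pvalue.TaggedOutput(DEAD_LETTER_TAG, f"{element!r}: {err}")

    with beam.Pipeline() as pipeline:
        results = (
            pipeline
            | "Read" >> beam.Create([b"ok-message", b"\xff\xfe broken"])
            | "Parse" >> beam.ParDo(ParseOrFail()).with_outputs(DEAD_LETTER_TAG, main="parsed")
        )
        # Failed elements end up in a Cloud Storage file (hypothetical bucket/path).
        results[DEAD_LETTER_TAG] | "WriteErrors" >> beam.io.WriteToText("gs://my-bucket/errors/error")
        results.parsed | "WriteParsed" >> beam.io.WriteToText("gs://my-bucket/output/out")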

GCP Dataflow job to transfer data from Pub/Sub (in one project) to BigQuery in another, implemented in Terraform, doesn't read messages

I implemented a Dataflow job in Terraform, using the Google-provided Pub/Sub to BigQuery template. Pub/Sub is in one project, while Dataflow and BigQuery are in the other. The Dataflow job is created, Compute Engine scales, subscriptions get created, and the service account has all the permissions needed to run the Dataflow job, plus Pub/Sub and Service Account User permissions in the project where Pub/Sub lives. The Pipeline API is enabled. The Dataflow job has the status Running, the BigQuery tables are created, and the table schemas match the message schema. The only thing is that Dataflow doesn't read messages from Pub/Sub. Also, when I open Pipelines (within Dataflow) I see nothing, and the temp location specified in the Terraform code is not created. The service account has Cloud Storage Admin permissions, so that's another indication that the Dataflow job (pipeline) just doesn't initiate the stream. Any suggestions? Maybe somebody has had a similar issue?
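Not part of the original question, but as a hedged diagnostic sketch: pulling a few messages directly from the subscription (without acking them) can confirm whether messages are actually arriving and piling up unread. The project and subscription names below are placeholders:

    from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

    subscriber = pubsub_v1.SubscriberClient()
    # Placeholder project and subscription (the one the Dataflow template consumes from).
    subscription = subscriber.subscription_path("pubsub-project-id", "dataflow-input-sub")

    # Pull without acking, so the Dataflow job can still receive the messages later.
    response = subscriber.pull(request={"subscription": subscription, "max_messages": 5})
    print(f"Pulled {len(response.received_messages)} message(s)")
    for received in response.received_messages:
        print(received.message.data[:80])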

Confusion on AWS Cloudwatch and Application Logs

I have an on-premise app deployed in an application server (e.g. Tomcat) and it generates its own log file. If I decide to migrate this to an AWS EC2 instance, including the application server, is it possible to port my application logs to CloudWatch instead? Or is CloudWatch only capable of logging the runtime logs of my application server? Is it a lot of work to do this, or is it even possible?
I'm kind of confused about CloudWatch. It seems it can do multiple things, but is it really right to make it do all of that? It's only supposed to log metrics, right, so it can alert whatever or whoever needs to be alerted?
If you have an already-developed application that produces its own log files, you can use the CloudWatch Logs agent to ingest the logs into CloudWatch Logs:
After installation is complete, logs automatically flow from the instance to the log stream you create while installing the agent. The agent confirms that it has started and it stays running until you disable it.
Metrics such as RAM usage and disk space can also be monitored and pushed to CloudWatch through the agent.
In both cases, logs and metrics, you can set up CloudWatch Alarms to automatically detect anomalies and notify you, or perform other actions, when they are detected. For logs, this is done through metric filters:
You can search and filter the log data coming into CloudWatch Logs by creating one or more metric filters. Metric filters define the terms and patterns to look for in log data as it is sent to CloudWatch Logs. CloudWatch Logs uses these metric filters to turn log data into numerical CloudWatch metrics that you can graph or set an alarm on.
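As a hedged sketch of that metric-filter step using boto3 (the log group, filter pattern, and metric names are hypothetical):

    import boto3

    logs = boto3.client("logs")

    # Hypothetical log group and metric names; this filter counts lines containing ERROR.
    logs.put_metric_filter(
        logGroupName="/myapp/tomcat",
        filterName="ErrorCount",
        filterPattern="ERROR",
        metricTransformations=[
            {
                "metricName": "ApplicationErrors",
                "metricNamespace": "MyApp",
                "metricValue": "1",
            }
        ],
    )

    # A CloudWatch alarm can then be attached to the resulting metric.
    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        AlarmName="MyAppErrorAlarm",
        Namespace="MyApp",
        MetricName="ApplicationErrors",
        Statistic="Sum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=5,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
    )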
Update:
You can also have your application inject logs directly into CloudWatch Logs using the AWS SDK. For example, in Python, you can use put_log_events.
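A minimal sketch of that approach with boto3; the log group and stream names are hypothetical, and newer API versions no longer require a sequence token:

    import time
    import boto3

    logs = boto3.client("logs")

    group, stream = "/myapp/tomcat", "instance-1"  # hypothetical names

    # Create the group/stream once; ignore the error if they already exist.
    for create, kwargs in (
        (logs.create_log_group, {"logGroupName": group}),
        (logs.create_log_stream, {"logGroupName": group, "logStreamName": stream}),
    ):
        try:
            create(**kwargs)
        except logs.exceptions.ResourceAlreadyExistsException:
            pass

    logs.put_log_events(
        logGroupName=group,
        logStreamName=stream,
        logEvents=[{"timestamp": int(time.time() * 1000), "message": "application started"}],
    )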

How to see why a long-running AWS Step Function failed

I have an AWS Step Function with many state transitions that can run for a half hour or more.
There are only a few states, and the application loops through them until it runs out of items to process.
I have a run that failed after about half an hour. I can look at the logging under the "Execution event history". However, since this logs every transition and state, there are thousands of events. I cannot page down to show enough events (clicking the "Load More" button) without hanging my browser window.
There is no way to sort or filter this list that I can see.
How can I find the cause of the failure? Is there a way to export the Execution event history somewhere? Or send it to CloudWatch?
You can use the AWS CLI command aws stepfunctions get-execution-history with the --reverse-order flag to get the events starting from the most recent (which is where the errors will be).
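The same idea via boto3, as a hedged sketch; the execution ARN is a placeholder, and with reverseOrder=True the failure events come back first, so one page is usually enough (follow nextToken if you need more):

    import boto3

    sfn = boto3.client("stepfunctions")
    # Placeholder ARN; use the ARN of the failed execution.
    execution_arn = "arn:aws:states:us-east-1:123456789012:execution:MyStateMachine:my-run"

    history = sfn.get_execution_history(
        executionArn=execution_arn, reverseOrder=True, maxResults=200
    )

    for event in history["events"]:
        if event["type"].endswith(("Failed", "Aborted", "TimedOut")):
            print(event["id"], event["type"])
            print(event)  # the *FailedEventDetails fields hold the error and cause
            break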
How do you process your steps? Docker containers on ECS or Fargate? Give us some details on that.
Your tasks should be sending out logs to CloudWatch as they execute.
You can also look at the Docker logs themselves on the physical machine if you run Docker on a machine you can SSH to.

How can we visualize the Dataproc job status in Google Cloud Platform?

How can we visualize (via Dashboards) the Dataproc job status in Google Cloud Platform?
We want to check whether jobs are running or not, in addition to their status (running, delayed, blocked). On top of that, we want to set up alerting (Stackdriver Alerting) as well.
This page lists all the metrics available in Stackdriver:
https://cloud.google.com/monitoring/api/metrics_gcp#gcp-dataproc
You could use cluster/job/submitted_count, cluster/job/failed_count, and cluster/job/running_count to build the dashboard and its charts.
Also, you could use cluster/job/completion_time to warn about long-running jobs and cluster/job/duration to check if jobs are enqueued in PENDING status for a long time.
Note that cluster/job/completion_time is logged only after the job completes; i.e. if the job takes 7 hours to complete, the data point only appears at the 7th hour.
Similarly, cluster/job/duration reports the time spent in each state only after the state is over: if a job was in the PENDING state for 1 hour, you would only see this metric at the 60th minute.
Dataproc has an open issue to introduce more metric that would help with this active alerting use case -> https://issuetracker.google.com/issues/211910984
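As a hedged sketch of how these metrics could feed a custom check or dashboard, using the Cloud Monitoring Python client; the project ID and lookback window are assumptions, and the exact metric type and resource labels should be verified against the metrics page linked above:

    import time
    from google.cloud import monitoring_v3  # pip install google-cloud-monitoring

    client = monitoring_v3.MetricServiceClient()
    project = "projects/my-project"  # hypothetical project ID

    now = time.time()
    interval = monitoring_v3.TimeInterval(
        {
            "end_time": {"seconds": int(now)},
            "start_time": {"seconds": int(now - 3600)},  # last hour
        }
    )

    # Count of currently running Dataproc jobs, per cluster.
    results = client.list_time_series(
        request={
            "name": project,
            "filter": 'metric.type = "dataproc.googleapis.com/cluster/job/running_count"',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    for series in results:
        cluster = series.resource.labels.get("cluster_name", "unknown")
        latest = series.points[0].value.int64_value if series.points else 0
        print(cluster, latest)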