Check Dataflow errors - google-cloud-platform

I am trying to implement a data pipeline where I insert JSON into Pub/Sub and move it from there via Dataflow into BQ. I am using the template that transfers data from Pub/Sub to BQ. My Dataflow job is failing: records are going down the error flow, but I don't see where to get more details on the error. For example, is it failing due to bad encoding of the data in Pub/Sub, because of a schema mismatch, etc.? Where can I find these details? I am checking the Stackdriver errors and logs but am not able to locate any further detail.
To add to that, this is what I can see:
resource.type="dataflow_step"
resource.labels.job_id="2018-07-17_20_36_16-6729875790634111180"
logName="projects/camel-154800/logs/dataflow.googleapis.com%2Fworker"
timestamp >= "2018-07-18T03:36:17Z" severity>="INFO"
resource.labels.step_id=("WriteFailedRecords/FailedRecordToTableRow"
OR
"WriteFailedRecords/WriteFailedRecordsToBigQuery/PrepareWrite/ParDo(Anonymous)"
OR
"WriteFailedRecords/WriteFailedRecordsToBigQuery/StreamingInserts/CreateTables/ParDo(CreateTables)"
OR
"WriteFailedRecords/WriteFailedRecordsToBigQuery/StreamingInserts/StreamingWriteTables/ShardTableWrites"
OR
"WriteFailedRecords/WriteFailedRecordsToBigQuery/StreamingInserts/StreamingWriteTables/TagWithUniqueIds"
OR
"WriteFailedRecords/WriteFailedRecordsToBigQuery/StreamingInserts/StreamingWriteTables/Reshuffle/Window.Into()/Window.Assign"
OR
"WriteFailedRecords/WriteFailedRecordsToBigQuery/StreamingInserts/StreamingWriteTables/Reshuffle/GroupByKey"
OR
"WriteFailedRecords/WriteFailedRecordsToBigQuery/StreamingInserts/StreamingWriteTables/Reshuffle/ExpandIterable"
OR
"WriteFailedRecords/WriteFailedRecordsToBigQuery/StreamingInserts/StreamingWriteTables/GlobalWindow/Window.Assign"
OR
"WriteFailedRecords/WriteFailedRecordsToBigQuery/StreamingInserts/StreamingWriteTables/StreamingWrite")
It tells me it failed, but I have no clue why. Was there a schema mismatch, a data type problem, wrong encoding, or something else? How do I debug this?
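For anyone digging into this: the WriteFailedRecords steps above suggest the template routes records it cannot insert to a dead-letter BigQuery table together with an error message. Assuming the template's default dead-letter table name (<outputTableSpec>_error_records, unless outputDeadletterTable was set) and its errorMessage/payloadString columns, this is a minimal sketch of pulling the rejection reasons with the google-cloud-bigquery Java client; the dataset and table names are placeholders:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class InspectFailedRecords {
  public static void main(String[] args) throws InterruptedException {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    // Dead-letter table name is an assumption: the template's default is
    // <outputTableSpec>_error_records unless outputDeadletterTable was overridden.
    String sql =
        "SELECT timestamp, errorMessage, payloadString "
            + "FROM `camel-154800.my_dataset.my_table_error_records` "
            + "ORDER BY timestamp DESC LIMIT 20";
    TableResult result = bigquery.query(QueryJobConfiguration.newBuilder(sql).build());
    for (FieldValueList row : result.iterateAll()) {
      // errorMessage explains why the row was rejected (encoding, schema mismatch, ...).
      System.out.println(row.get("timestamp").getStringValue()
          + " -> " + row.get("errorMessage").getStringValue());
    }
  }
}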

Related

AutoML training pipeline job failed. Where can I find the logs?

I am using Vertex AI's AutoML to train a model and it fails with the error message shown below. Where can I find the logs for this job?
Training pipeline failed with error message: Job failed. See logs for details.
I had the same issue just now and raised a case with Google, who told me how to find the error logs.
In the GCP Logs Explorer, you need a filter of resource.type = "ml_job" (make sure your time range is set correctly, too!)
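If you'd rather pull those entries programmatically than through the console, here is a minimal sketch using the google-cloud-logging Java client; the severity>=ERROR clause and the page size are my additions and can be dropped:

import com.google.cloud.logging.LogEntry;
import com.google.cloud.logging.Logging;
import com.google.cloud.logging.Logging.EntryListOption;
import com.google.cloud.logging.LoggingOptions;

public class ListMlJobLogs {
  public static void main(String[] args) throws Exception {
    try (Logging logging = LoggingOptions.getDefaultInstance().getService()) {
      // Only the resource.type filter comes from the answer above; the rest is optional.
      String filter = "resource.type=\"ml_job\" AND severity>=ERROR";
      for (LogEntry entry :
          logging.listLogEntries(EntryListOption.filter(filter), EntryListOption.pageSize(50))
              .iterateAll()) {
        System.out.println(entry.getTimestamp() + " " + entry.getPayload());
      }
    }
  }
}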

BigQuery data transfer Job failed with error INTERNAL (Error: 80038528)

I have created a table in BQ and set up a Cloud Storage data transfer job (ref: https://cloud.google.com/bigquery-transfer/docs/cloud-storage-transfer), but the job is throwing the following error.
Job bqts_6091b5c4-0000-2b62-bad0-089e08e4f7e1 (table abc) failed with error INTERNAL: An internal error occurred and the request could not be completed. Error: 80038528; JobID: 742067653276:bqts_6091b5c4-0000-2b62-bad0-089e08e4f7e1
The reason for the job failure (Error: 80038528) is that not enough slots were available. With respect to resource allocation, there is no fixed resource availability guarantee when using the on-demand model for running queries in BigQuery. The only way to make sure a certain number of slots is always available is to move to a flat-rate model [1].
If slots are not the issue, then you can follow what @Sakshi Gatyan mentioned. That is the right way to get an exact explanation for a BigQuery internal error.
[1]. https://cloud.google.com/bigquery/docs/slots
Since it's an internalError, there can be multiple reasons for your job failure depending on your environment. Check the error table. Beyond that, the job needs to be inspected internally by the Google support team, so I would recommend filing a case with support.

Dataflow Pipeline - “Processing stuck in step <STEP_NAME> for at least <TIME> without outputting or completing in state finish…”

Since I'm not allowed to ask my question in the same thread where another person has the same problem (but is not using a template), I'm creating this new thread.
The problem: I'm creating a Dataflow job from a template in GCP to ingest data from Pub/Sub into BQ. This works fine until the job executes: the job gets "stuck" and does not write anything to BQ.
I can't do much because I can't choose the Beam version in the template. This is the error:
Processing stuck in step WriteSuccessfulRecords/StreamingInserts/StreamingWriteTables/StreamingWrite for at least 01h00m00s without outputting or completing in state finish
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:803)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:867)
at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:140)
at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:112)
at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn$DoFnInvoker.invokeFinishBundle(Unknown Source)
Any ideas how to get this to work?
The issue is coming from the step WriteSuccessfulRecords/StreamingInserts/StreamingWriteTables/StreamingWrite, which suggests a problem while writing the data.
Your error can be replicated (using either the Pub/Sub Subscription to BigQuery or the Pub/Sub Topic to BigQuery template) by:
- Configuring the template with a table that doesn't exist.
- Starting the template with a correct table and deleting it during job execution.
In both cases the "stuck" message appears because data is being read from Pub/Sub, but the write is waiting for the table to become available before inserting it. The error is reported every 5 minutes and resolves itself once the table is created.
To verify the table configured in your template, see the property outputTableSpec in the PipelineOptions in the Dataflow UI.
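A quick way to confirm that the table behind outputTableSpec actually exists (and to see its schema) is the google-cloud-bigquery Java client; the dataset and table names below are placeholders for whatever your outputTableSpec points at:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;

public class CheckOutputTable {
  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    // Placeholder names: use the dataset and table from the job's outputTableSpec.
    Table table = bigquery.getTable(TableId.of("my_dataset", "my_table"));
    if (table == null) {
      System.out.println("Table is missing: StreamingWrite will stay stuck until it exists.");
    } else {
      System.out.println("Table schema: " + table.getDefinition().getSchema());
    }
  }
}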
I had the same issue before. The problem was that I used NestedValueProviders to evaluate the Pub/Sub topic/subscription, and this is not supported for templated pipelines.
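For templated pipelines, the usual alternative is to expose the subscription as a plain ValueProvider option and hand it straight to PubsubIO instead of wrapping it in a NestedValueProvider. A minimal sketch with a made-up option name:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.ValueProvider;

public class TemplateOptionsSketch {
  // Hypothetical option: the subscription arrives at template run time as a ValueProvider.
  public interface Options extends PipelineOptions {
    ValueProvider<String> getInputSubscription();
    void setInputSubscription(ValueProvider<String> value);
  }

  public static void main(String[] args) {
    Options options = PipelineOptionsFactory.fromArgs(args).as(Options.class);
    Pipeline pipeline = Pipeline.create(options);
    // PubsubIO accepts the ValueProvider directly; no NestedValueProvider needed.
    pipeline.apply(PubsubIO.readStrings().fromSubscription(options.getInputSubscription()));
    pipeline.run();
  }
}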
I was getting the same error, and the reason was that I had created an empty BigQuery table without specifying a schema. Make sure to create the BQ table with a schema before you load data into it via Dataflow.
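A minimal sketch of creating that table with an explicit schema up front, using the google-cloud-bigquery Java client; the field names are placeholders and should match the JSON you are loading:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;

public class CreateOutputTable {
  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    // Placeholder fields: match them to the JSON your Pub/Sub messages carry.
    Schema schema = Schema.of(
        Field.of("id", StandardSQLTypeName.STRING),
        Field.of("payload", StandardSQLTypeName.STRING),
        Field.of("event_time", StandardSQLTypeName.TIMESTAMP));
    bigquery.create(TableInfo.of(
        TableId.of("my_dataset", "my_table"),
        StandardTableDefinition.of(schema)));
  }
}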

An internal error occurred when attempting to deliver data in AWS Firehose data stream

I am implementing an AWS Kinesis Firehose data stream and facing an issue with data delivery from S3 to Redshift. Can you please help me and let me know what is missing?
An internal error occurred when attempting to deliver data. Delivery will be retried; if the error persists, it will be reported to AWS for resolution. InternalError 2
It happened to me, and the problem was an inconsistency between the input record format and the DB table.
Check the AWS docs for the COPY command to make sure the COPY command parameters are defined properly.

PubSub resource setup failing for Dataflow job when assigning timestampLabel

After modifying my job to start using timestampLabel when reading from PubSub, resource setup seems to break every time I try to start the job with the following error:
(c8bce90672926e26): Workflow failed. Causes: (5743e5d17dd7bfb7): Step setup_resource_/subscriptions/project-name/subscription-name__streaming_dataflow_internal25: Set up of resource /subscriptions/project-name/subscription-name__streaming_dataflow_internal failed
where project-name and subscription-name represent the actual values of the project and Pub/Sub subscription I'm trying to read from. Before I tried to attach the timestampLabel to message entries, the job was working correctly, consuming messages from the specified Pub/Sub subscription, which should mean that my API/network settings are OK.
I'm also noticing two warnings with the payload
Internal Issue (119d3b54af281acf): 65177287:8503
but no more information can be found in the worker logs. For the few seconds that my job is setting up I can see the timestampLabel being set in the first step of the pipeline. Unfortunately I can't find any other cases or documentation about this error.
When using the timestampLabel feature, a second subscription is created for tracking purposes. Double-check the permission settings on your topic to make sure they match the permissions required.
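For reference, timestampLabel is the older Dataflow SDK name for this setting; in current Beam Java the equivalent is withTimestampAttribute. A minimal sketch with a placeholder subscription path and attribute name:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class TimestampAttributeRead {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // Placeholder subscription path and attribute name; event timestamps are taken
    // from the named Pub/Sub message attribute instead of the publish time.
    pipeline.apply(
        PubsubIO.readStrings()
            .fromSubscription("projects/project-name/subscriptions/subscription-name")
            .withTimestampAttribute("ts"));
    pipeline.run();
  }
}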