Dataflow job can't be created for several hours - google-cloud-platform

I submitted a job to Dataflow yesterday and today its status is still "Not started". But when I clicked into the job's title, it first gave me the message "The graph is still being analysed", and then returned an error at the top of the page that said "A job with ID "xxxxxx" doesn't exist".
What I want to do is remove the job from the list, but it seems I can't perform any actions on the job.

Problem solved.
After I enabled the Dataflow API, the job ran successfully.
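For reference, the Dataflow API can also be enabled from the command line; a minimal sketch, assuming the gcloud CLI is authenticated and YOUR_PROJECT is a placeholder for the affected project:
gcloud services enable dataflow.googleapis.com --project=YOUR_PROJECT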

Related

Getting an error in AWS Glue -> Jobs(New) Failed to update job [gluestudio-service.us-east-2.amazonaws.com] updateDag: InternalFailure: null

I've been using AWS Glue Studio for job creation. Until now I was using the legacy job editor, but recently Amazon migrated to the new version, Glue Job v_3.0, where I am trying to create a job using the Spark script editor.
Steps to be followed:
Open Region-Code/console.aws.amazon.com/glue/home?region=Region-Code#/v2/home
Click the Create Job link
Select Spark script editor
Make sure you select Create a new script with boilerplate code
Then click the Create button in the top right corner.
When I try to save the job after filling in all the required information, I get an error like the one below:
Failed to update job
[gluestudio-service.us-east-1.amazonaws.com] createJob: InternalServiceException: Failed to meet resource limits for operation
Note
I've tried the legacy job creation as well, where I got an error like the one below:
{"service":"AWSGlue","statusCode":400,"errorCode":"ResourceNumberLimitExceededException","requestId":"179c2de8-6920-4adf-8791-ece7cbbfbc63","errorMessage":"Failed to meet resource limits for operation","type":"AwsServiceError"}
Is this related to an internal configuration issue?
As I'm using an account provided by the client, I don't have permission to see the limits.
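If the account ever gets read access to Service Quotas, the Glue limits behind ResourceNumberLimitExceededException can be inspected from the CLI; a minimal sketch, assuming the AWS CLI is configured for the client's account and region:
aws service-quotas list-service-quotas --service-code glue --region us-east-1
Raising any of the listed values (for example, the maximum number of jobs per account) requires a quota increase request from someone with the appropriate permissions.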

Where can one find logs for Vertex AI Batch Prediction jobs?

I couldn't find relevant information in the Documentation. I have tried all options and links in the batch transform pages.
They can be found, but unfortunately not via any links in the Vertex AI console.
Soon after the batch prediction job fails, go to Logging -> Logs Explorer and create a query like this, replacing YOUR_PROJECT with the name of your GCP project:
logName:"projects/YOUR_PROJECT/logs/ml.googleapis.com"
First look for the same error reported by the Batch Prediction page in the Vertex AI console: "Job failed. See logs for full details."
The log line above the "Job Failed" error will likely report the real reason your batch prediction job failed.
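The same query can also be run from the command line; a minimal sketch, assuming the gcloud CLI is authenticated and YOUR_PROJECT is a placeholder:
gcloud logging read 'logName:"projects/YOUR_PROJECT/logs/ml.googleapis.com"' --project=YOUR_PROJECT --limit=50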
I have found that just going to Cloud Logging after a batch prediction job fails and clicking Run query shows the error details.

Dataflow Pipeline - “Processing stuck in step <STEP_NAME> for at least <TIME> without outputting or completing in state finish…”

Since I'm not allowed to ask my question in the same thread where another person has the same problem (but isn't using a template), I'm creating this new thread.
The problem: I'm creating a Dataflow job from a template in GCP to ingest data from Pub/Sub into BigQuery. This works fine until the job executes. The job gets "stuck" and does not write anything to BigQuery.
I can't do much because I can't choose the Beam version in the template. This is the error:
Processing stuck in step WriteSuccessfulRecords/StreamingInserts/StreamingWriteTables/StreamingWrite for at least 01h00m00s without outputting or completing in state finish
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:803)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:867)
at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:140)
at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:112)
at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn$DoFnInvoker.invokeFinishBundle(Unknown Source)
Any ideas how to get this to work?
The issue is coming from the step WriteSuccessfulRecords/StreamingInserts/StreamingWriteTables/StreamingWrite, which suggests a problem while writing the data.
Your error can be replicated with either the Pub/Sub Subscription to BigQuery or the Pub/Sub Topic to BigQuery template by:
Configuring the template with a table that doesn't exist.
Starting the template with a correct table and deleting it during job execution.
In both cases the "stuck" message appears because the data is being read from Pub/Sub, but the pipeline is waiting for the table to become available before inserting it. The error is reported every 5 minutes and is resolved once the table is created.
To verify the table configured in your template, see the outputTableSpec property in the PipelineOptions shown in the Dataflow UI.
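For reference, outputTableSpec is the same parameter that is supplied when the Google-provided template is launched; a minimal sketch, assuming the gcloud CLI and placeholder names (PROJECT, SUB, dataset.table):
gcloud dataflow jobs run my-pubsub-to-bq-job \
    --gcs-location gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery \
    --region us-central1 \
    --parameters inputSubscription=projects/PROJECT/subscriptions/SUB,outputTableSpec=PROJECT:dataset.table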
I had the same issue before. The problem was that I used NestedValueProviders to evaluate the Pub/Sub topic/subscription, and this is not supported for templated pipelines.
I was getting the same error, and the reason was that I had created an empty BigQuery table without specifying a schema. Make sure to create a BigQuery table with a schema before loading data into it via Dataflow.
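As an illustration, such a table can be created ahead of time with the bq CLI; a minimal sketch, assuming placeholder project, dataset, and column names:
bq mk --table PROJECT:dataset.table message:STRING,publish_time:TIMESTAMP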

PubSub resource setup failing for Dataflow job when assigning timestampLabel

After modifying my job to start using timestampLabel when reading from PubSub, resource setup seems to break every time I try to start the job with the following error:
(c8bce90672926e26): Workflow failed. Causes: (5743e5d17dd7bfb7): Step setup_resource_/subscriptions/project-name/subscription-name__streaming_dataflow_internal25: Set up of resource /subscriptions/project-name/subscription-name__streaming_dataflow_internal failed
where project-name and subscription-name represent the actual values of my project and the Pub/Sub subscription I'm trying to read from. Before trying to attach timestampLabel to message entries, the job was working correctly, consuming messages from the specified Pub/Sub subscription, which should mean that my API/network settings are OK.
I'm also noticing two warnings with the payload
Internal Issue (119d3b54af281acf): 65177287:8503
but no more information can be found in the worker logs. For the few seconds that my job is setting up I can see the timestampLabel being set in the first step of the pipeline. Unfortunately I can't find any other cases or documentation about this error.
When using the timestampLabel feature, a second subscription is created for tracking purposes. Double check the permission settings on your topic to make sure they match the permissions required.
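One way to grant those permissions from the command line; a minimal sketch, assuming TOPIC_NAME and the worker service account shown are placeholders (roles/pubsub.editor should cover creating and attaching the second tracking subscription, though a narrower role may suffice):
gcloud pubsub topics add-iam-policy-binding TOPIC_NAME \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/pubsub.editor"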

Can I get real-time status updates using the Graph Real-Time API?

I have an app that I currently have to run on a cron job; it gets a bunch of user status updates using https://graph.facebook.com/the-selected-user/statuses?access_token
and this works fine. I run the cron once an hour. I would like to use Real-time Updates, but I'm not sure how to do that. I have the example subscription working, but I can't make the connection on how to subscribe to what in order to get the users' status updates.
Thanks for any help
Since you have subscriptions working (the hard part), just add a subscription for topic: user and field: feed.
From the API documentation: to subscribe to a user's feed, the subscription topic should be 'user' and the subscription field should be 'feed'.
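For illustration, such a subscription is added with a POST to the app's subscriptions edge; a minimal sketch, assuming placeholder values for the app ID, callback URL, verify token, and app access token:
curl -X POST "https://graph.facebook.com/YOUR_APP_ID/subscriptions" \
    -d "object=user" \
    -d "fields=feed" \
    -d "callback_url=https://example.com/fb/realtime" \
    -d "verify_token=YOUR_VERIFY_TOKEN" \
    -d "access_token=YOUR_APP_ACCESS_TOKEN"
Facebook will then POST change notifications to the callback URL whenever a subscribed user's feed changes, removing the need for the hourly cron poll.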