Ethtx tasks can be used with webhook jobs - blockchain

I would like to know if ethtx tasks can be used with webhook jobs. I tried to run the job from the operator UI and it returns an internal server error, when I look up the logs it says that
Expected at least one task to be final pipeline/common.go:212 logger=1.2.1#168d34a stacktrace=github.com/smartcontractkit/chainlink/core/services/pipeline.TaskRunResults.FinalResult
/chainlink/core/services/pipeline/common.go:212
which I assume it means it cannot work unless ethtx is a final task such as jsonparse/multiply etc

Webhook jobs are limited in the task flow I believe, such that there is no support for webhook jobs and on-chain tx's related to that webhook job. In general, node operators utilize webhook jobs, to verify there job-spec code within the "observationSource" such that a bridge api endpoint is properly connected, and filtering to the correct path that the downstream user wants. Because there is no on-chain tx recorded, the testing can be done immediately after job-creation and no test smart contract to ping the oracle/node.
https://docs.chain.link/docs/jobs/types/webhook/

Related

How to complete a service task using camunda rest api

I am using Camunda workflows to automate various processes. I have come across a scenario where the process is not moving from a service task. Usually, we call the task/{taskid}/complete to complete the task, but since the process is stuck on a service task, I am not able to complete that task. Can anybody help me find a way to complete the service task?
You are using a service task. That basically means "a machine should do something". The "normal" implementation is to provide code (a java Delegate or a connector endpoint) that is called by the process engine to execute this task.
The alternativ is to use the "external task" pattern. Think of external tasks as "user tasks for computers". So the process waits, tells subscribed clients that a job is to be done and waits for their completion.
I suppose your process uses the second option? (you can check in the modeler under "Implementation"). So completion can be done through the external task API, see docs.
/external-task/{id}/complete
If it is a connector then you likely will see when checking the log that retries have occurred and that the transaction rolled back. After addressing the underlying issue the service task (email) should be sent without explicitly triggering the service task and the following user task (Approval) should be created.

Is there a way to be notified of status changes in Google AI Platform training jobs without polling the REST API?

Right now I monitor my submitted jobs on Google AI Platform (formerly ml engine) by polling the job REST API. I don't like this solution for a few reasons:
Awareness of status changes is often delayed or missed altogether if the interval between status changes is smaller than the monitoring polling rate
Lots of unnecessary network traffic
Lots of unnecessary function invocations
I would like to be notified as soon as my training jobs complete. It'd be great if there is some way to assign hooks or callbacks to run when the job status changes.
I've also considered adding calls to cloud functions directly within the training task python package that runs on AI Platform. However, I don't think those function calls will occur in cases where the training job is shutdown unexpectedly, such as when a job is cancelled or forced to end by GCP.
Is there a better way to go about this?
You can use a Stackdriver sink to read the logs and send it to Pub/Sub. From Pub/Sub, you can connect to a bunch of other providers:
1. Set up a Pub/Sub sink
Make sure you have access to the logs and publish rights to the topic you desire before you get started. Follow the instructions for setting up a Stackdriver -> Pub/Sub sink. You’ll want to use this query to limit the events only to Training jobs:
resource.type = "ml_job"
resource.labels.task_name = "service"
Note that Stackdriver can further limit down the query. For example, you can limit to a particular Job by adding a condition like resource.labels.job_id = "..." or to a certain event with a filter like jsonPayload.message : "..."
2. Respond to the Pub/Sub message
In order to tell what changed, the recipient of the Pub/Sub message can either query the job status from the ml.googleapis.com API or read the text of the message
Reading state from ml.googleapis.com
When you receive the message, make a call to https://ml.googleapis.com/v1/<project_id>/jobs/<job_id> to get the Job information, replacing [project_id] and [job_id] in the URL with the values of resource.label.project_id and resource.label.job_id from the Pub/Sub message, respectively.
The returned Job object contains a field state that, naturally, tells the status of the job.
Reading state from the message text
The Pub/Sub message will contain a string telling what happened to the job. You probably want behavior when the job ends. Look for these strings in jsonPayload.message:
"Job completed successfully."
"Job cancelled."
"Job failed."
I implemented a Terraform module as #htappen said. I'm happy if it would help you. But my real hope is that Google updates AI Platform with the same feature.
https://github.com/sfujiwara/terraform-google-ai-platform-notification
I think you can programmatically publish a PubSub message at the end of your training job code. Something like this:
from google.cloud import pubsub_v1
# publish job complete message
client = pubsub_v1.PublisherClient()
topic = client.topic_path(args.gcp_project_id, 'topic-name')
data = {
'ACTION': 'JOB_COMPLETE',
'SAVED_MODEL_DIR': args.job_dir
}
data_bytes = json.dumps(data).encode('utf-8')
client.publish(topic, data_bytes)
Then you can setup a cloud function to be triggered by the same pubsub topic.
You can work around the lack of a callback from the service on a custom TF training job by adding a LamdbaCallback to the fit() call. In the on_epoch method, you could then send yourself a notification on job progress and on_train_end when it finishes.
https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LambdaCallback

How to use Google Cloud PubSub and Run to handle resource-intensive long-running tasks?

I've got a Google Cloud PubSub topic which at times has thousands of messages and at times zero messages coming in. These messages represent tasks which can take upwards of an hour each. Preferably I'm able to use Cloud Run for this, as it scales really well to the demand, if a thousand messages gets published, I want 100s of Cloud Run instances to spin up. These Run instances get started by a push subscription. The problem is that PubSub has a 600 second timeout for the acknowledgement. This means in order to have Cloud Run process these messages they have to finish within 600 seconds. If they do not, PubSub times it out, and sends it again, causing the task to be restarted until the first task finally does acknowledge it (this causes the same task to be ran many times). Cloud Run acknowledges the messages by returning a 2** HTTP status code. The documentation states
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
So is it maybe possible to acknowledge a PubSub request through code and continue the processing, without having Google Cloud Run hand over the resources? Or is there a better solution I'm unaware of?
Because these processes are so code/resource-intensive, I feel Cloud Functions will not suffice. I've looked at https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks and https://cloud.google.com/blog/products/gcp/how-google-cloud-pubsub-supports-long-running-workloads. But these didn't answer my question.
I've looked at Google Cloud Tasks, which might be something? But the rest of the project has been built around PubSub/Run/Functions, so preferably I stick with that.
This project is written in Python.
So preferably I would like to write my Google Cloud Run tasks like this:
#app.route('/', methods=['POST'])
def index():
"""Endpoint for Google Cloud PubSub messages"""
pubsub_message = request.get_json()
logger.info(f'Received PubSub pubsub_message {pubsub_message}')
if message_incorrect(pubsub_message):
return "Invalid request", 400 #use normal NACK handling
# acknowledge message here without returning
# ...
# Do actual processing of the task here
# ...
So how can or should I solve this, so that the the resource-intensive tasks get properly scaled on demand ( so a push PubSub subscription ). And the tasks only get executed once.
Answers:
In short what has been answered. Cloud Run and Functions are just not suited for this problem. There is no way to have them do tasks that take longer than 9 or 15 minutes respectively. The only solution is to switch over to another Google Service and use a pull style subscription and lose out on auto-scaling of GC Run/Functions
Cloud Run on GKE can handle long process, more CPU and memory than available on managed platform. However, you have a GKE cluster always running and you loose the "pay-as-you-use" benefit.
If you want to use this solution, don't link directly PubSub push subscription to your Cloud Run on GKE. Use Cloud Task with HTTP job for this. The timeout is longer than PubSub (up to 24h instead of 10 min) and the retry policies are customizables.
Neither Cloud Functions nor Cloud Run is sufficient for arbitrarily long running operations. Cloud Functions has a hard cap of 9 minutes per invocation, and Cloud Run caps at 60. If you need more time, you're going to have to delegate the work to another product, such as Google Compute Engine. It should be possible to kick off some Compute Engine work from one of the serverless products.
Give the limits of pubsub acks, you'll probably have to find a way for a client to be able to poll or listen to some resource to find out when the work is actually done. You could use a database for that, and Cloud Firestore lets you listen to documents to find out when they change. So you could use that to track the status of your long-running work.

How to stream events with GCP platform?

I am looking into building a simple solution where producer services push events to a message queue and then have a streaming service make those available through gRPC streaming API.
Cloud Pub/Sub seems well suited for the job however scaling the streaming service means that each copy of that service would need to create its own subscription and delete it before scaling down and that seems unnecessarily complicated and not what the platform was intended for.
On the other hand Kafka seems to work well for something like this but I'd like to avoid having to manage the underlying platform itself and instead leverage the cloud infrastructure.
I should also mention that the reason for having a streaming API is to allow for streaming towards a frontend (who may not have access to the underlying infrastructure)
Is there a better way to go about doing something like this with the GCP platform without going the route of deploying and managing my own infrastructure?
If you essentially want ephemeral subscriptions, then there are a few things you can set on the Subscription object when you create a subscription:
Set the expiration_policy to a smaller duration. When a subscriber is not receiving messages for that time period, the subscription will be deleted. The tradeoff is that if your subscriber is down due to a transient issue that lasts longer than this period, then the subscription will be deleted. By default, the expiration is 31 days. You can set this as low as 1 day. For pull subscribers, the subscribers simply need to stop issuing requests to Cloud Pub/Sub for the timer on their expiration to start. For push subscriptions, the timer starts based on when no messages are successfully delivered to the endpoint. Therefore, if no messages are published or if the endpoint is returning an error for all pushed messages, the timer is in effect.
Reduce the value of message_retention_duration. This is the time period for which messages are kept in the event a subscriber is not receiving messages and acking them. By default, this is 7 days. You can set it as low as 10 minutes. The tradeoff is that if your subscriber disconnects or gets behind in processing messages by more than this duration, messages older than that will be deleted and the subscriber will not see them.
Subscribers that cleanly shut down could probably just call DeleteSubscription themselves so that the subscription goes away immediately, but for ones that shut down unexpectedly, setting these two properties will minimize the time for which the subscription continues to exist and the number of messages (that will never get delivered) that will be retained.
Keep in mind that Cloud Pub/Sub quotas limit one to 10,000 subscriptions per topic and per project. Therefore, if a lot of subscriptions are created and either active or not cleaned up (manually, or automatically after expiration_policy's ttl has passed), then new subscriptions may not be able to be created.
I think your original idea was better than ephemeral subscriptions tbh. I mean it works, but it feels totally unnatural. Depending on what your requirements are. For example, do clients only need to receive messages while they're connected or do they all need to get all messages?
Only While Connected
Your original idea was better imo. What I probably would have done is to create a gRPC stream service that clients could connect to. The implementation is essentially an observer pattern. The consumer will receive a message and then iterate through the subscribers to do a "Send" to all of them. From there, any time a client connects to the service, it just registers itself with that observer collection and unregisters when it disconnects. Horizontal scaling is passive since clients are sticky to whatever instance they've connected to.
Everyone always get the message, if eventually
The concept is similar to the above but the client doesn't implicitly un-register from the observer on disconnect. Instead, it would register and un-register explicitly (through a method/command designed to do so). Modify the 'on disconnected' logic to tell the observer list that the client has gone offline. Then the consumer's broadcast logic is slightly different. Now it iterates through the list and says "if online, then send, else queue", and send the message to a ephemeral queue (that belongs to the client). Then your 'on connect' logic will send all messages that are in queue to the client before informing the consumer that it's back online. Basically an inbox. Setting up ephemeral, self-deleting queues is really easy in most products like RabbitMQ. I think you'll have to do a bit of managing whether or not it's ok to delete a queue though. For example, never delete the queue unless the client explicitly unsubscribes or has been inactive for so long. Fail to do that, and the whole inbox idea falls apart.
The selected answer above is most similar to what I'm subscribing here in that the subscription is the queue. If I did this, then I'd probably implement it as an internal bus instead of an observer (since it would be unnecessary) - You create a consumer on demand for a connecting client that literally just forwards the message. The message consumer subscribes and unsubscribes based on whether or not the client is connected. As Kamal noted, you'll run into problems if your scale exceeds the maximum number of subscriptions allowed by pubsub. If you find yourself in that position, then you can unshackle that constraint by implementing the pattern above. It's basically the same pattern but you shift the responsibility over to your infra where the only constraint is your own resources.
gRPC makes this mechanism pretty easy. Alternatively, for web, if you're on a Microsoft stack, then SignalR makes this pretty easy too. Clients connect to the hub, and you can publish to all connected clients. The consumer pattern here remains mostly the same, but you don't have to implement the observer pattern by hand.
(note: arrows in diagram are in the direction of dependency, not data flow)

Simulating Google Appengine's Task Queue with Gearman

One of the characteristics I love most about Google's Task Queue is its simplicity. More specifically, I love that it takes a URL and some parameters and then posts to that URL when the task queue is ready to execute the task.
This structure means that the tasks are always executing the most current version of the code. Conversely, my gearman workers all run code within my django project -- so when I push a new version live, I have to kill off the old worker and run a new one so that it uses the current version of the code.
My goal is to have the task queue be independent from the code base so that I can push a new live version without restarting any workers. So, I got to thinking: why not make tasks executable by url just like the google app engine task queue?
The process would work like this:
User request comes in and triggers a few tasks that shouldn't be blocking.
Each task has a unique URL, so I enqueue a gearman task to POST to the specified URL.
The gearman server finds a worker, passes the url and post data to a worker
The worker simply posts to the url with the data, thus executing the task.
Assume the following:
Each request from a gearman worker is signed somehow so that we know it's coming from a gearman server and not a malicious request.
Tasks are limited to run in less than 10 seconds (There would be no long tasks that could timeout)
What are the potential pitfalls of such an approach? Here's one that worries me:
The server can potentially get hammered with many requests all at once that are triggered by a previous request. So one user request might entail 10 concurrent http requests. I suppose I could have a single worker with a sleep before every request to rate-limit.
Any thoughts?
As a user of both Django and Google AppEngine, I can certainly appreciate what you're getting at. At work I'm currently working on the exact same scenario using some pretty cool open source tools.
Take a look at Celery. It's a distributed task queue built with Python that exposes three concepts - a queue, a set of workers, and a result store. It's pluggable with different tools for each part.
The queue should be battle-hardened, and fast. Check out RabbitMQ for a great queue implementation in Erlang, using the AMQP protocol.
The workers ultimately can be Python functions. You can trigger workers using either queue messages, or perhaps more pertinent to what you're describing - using webhooks
Check out the Celery webhook documentation. Using all these tools you can build a production ready distributed task queue that implements your requirements above.
I should also mention that in regards to your first pitfall, celery implements rate-limiting of tasks using a Token Bucket algorithm.