Intercept Azure WebJobs queue/service bus triggers? - azure-webjobs

I'd like to be able to intercept a call to my ServiceBus triggered job, look at the message and set some thread-static context data before my job is actually triggered. Is there any way to do this with the current SDK?

No, there isn't a way to do this currently. However, we're looking at adding a new feature that will allow you to do so. We've already done this for Azure Queues (see PR https://github.com/Azure/azure-webjobs-sdk/pull/526) and we'll do something very similar for ServiceBus. Would this meet your needs?
You can see the issue in our public issues list here: https://github.com/Azure/azure-webjobs-sdk/issues/275.

Related

How do you handle duplicate messages with Google Cloud Pub/Sub?

I am currently working on a distributed crawling service. While building it, I have run into a few issues that need to be addressed.
First, let me explain how the crawler works and the problems that need to be solved.
The crawler needs to save all posts on every bulletin board of a particular site.
To do this, it automatically discovers crawling targets and publishes several messages to Pub/Sub. Each message looks like this:
{
  "boardName": "test",
  "targetDate": "2020-01-05"
}
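For illustration, here is a minimal sketch of how such a message might be published, assuming the google-cloud-pubsub Python client; the project and topic names are hypothetical:

import json
from google.cloud import pubsub_v1

# Hypothetical project and topic names -- adjust to your environment.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "crawl-targets")

message = {"boardName": "test", "targetDate": "2020-01-05"}

# Pub/Sub payloads are bytes, so serialize the JSON first.
future = publisher.publish(topic_path, json.dumps(message).encode("utf-8"))
print(future.result())  # message ID assigned by Pub/Sub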
When the corresponding message is published, the Cloud Run function is triggered, and the data described by the given JSON is crawled.
However, if a duplicate of the same message is published, duplicate data is produced because the same target is crawled again. How can I ignore subsequent copies when the same message comes in?
Also, are there Pub/Sub or other features I can refer to for a stable implementation of a distributed crawler?
Because Pub/Sub is, by default, designed to deliver messages at least once, it's better to have idempotent processing. (Exactly-once delivery is coming.)
In any case, your issue is much the same: the same message delivered twice, or two different messages with the same content, will cause the same problem. There is no magic feature in Pub/Sub for that. You need an external tool, such as a database, to store what has already been received.
Firestore/Datastore is a good serverless option for that. If you need low latency, Memorystore and its in-memory database is the fastest.
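As a rough sketch of that idempotency check, assuming the google-cloud-firestore Python client and a hypothetical collection named processed_messages keyed by the message content:

from google.api_core import exceptions
from google.cloud import firestore

db = firestore.Client()

def process_if_new(board_name, target_date):
    # Deterministic document ID derived from the message content,
    # so duplicate messages map to the same document.
    doc_ref = db.collection("processed_messages").document(f"{board_name}_{target_date}")
    try:
        # create() fails if the document already exists, so only the first
        # delivery of a given message proceeds to the actual crawl.
        doc_ref.create({"board": board_name, "date": target_date})
    except exceptions.Conflict:
        return  # already processed -- ignore the duplicate
    crawl(board_name, target_date)  # crawl() stands in for your existing crawling logic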

(GCloud SQL) Is there a way to opt in to maintenance notifications via CLI or API?

I'm looking for a way to toggle this notification via the gcloud CLI or an API call, since I need to automate it.
Is there a way of doing it? If not, is this going to be available in the future?
With a lot of environments, it is hard to keep track of all of them via the UI.
I have checked the Cloud SQL Admin API and it seems this is not possible yet. The best way to proceed in these cases is to create a Feature Request in the Public Issue Tracker. I searched for an existing one, but I didn't find any. When submitting a Feature Request, the Engineering team has more visibility into your needs and can prioritize requests by the number of users affected.

Detect Presence in AWS AppSync

I am working on an app that relies heavily on detecting when users go offline and come back online. I wanted to do this with AWS AppSync, but I can't seem to find a way to do it in the documentation. Is there a way to do it in AppSync?
Thanks for the question. Detecting presence is not currently supported out of the box, but you can likely build similar features yourself depending on the use case.
For example, a resolver on a subscription field is invoked every time a new device tries to open a subscription. You can use this resolver field to update some data source to tell the rest of your system that some user is currently subscribed. If using something like DynamoDB, you can use a TTL field to have records automatically removed after a certain amount of time and then require a user to "ping" every N minutes to specify that they are still online.
You could also have your application call a mutation when it first starts to register the user as online, then have the application call another mutation when the app closes to register it as offline. You could combine this with TTLs to prevent stale records in situations where the app crashes or something prevents the call to register as offline.
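To make the TTL idea concrete, here is a rough sketch in Python with boto3 of the kind of presence record that approach relies on (this is backend illustration only, not an AppSync resolver; the table and attribute names are hypothetical):

import time
import boto3

dynamodb = boto3.resource("dynamodb")
presence_table = dynamodb.Table("UserPresence")  # hypothetical table with TTL enabled on "expiresAt"

def register_ping(user_id, ttl_minutes=5):
    # Write/refresh a presence record; DynamoDB's TTL feature removes it
    # automatically once "expiresAt" has passed and the user stops pinging.
    presence_table.put_item(
        Item={
            "userId": user_id,
            "expiresAt": int(time.time()) + ttl_minutes * 60,
        }
    )

def is_online(user_id):
    item = presence_table.get_item(Key={"userId": user_id}).get("Item")
    # TTL deletion is eventually consistent, so also check the timestamp.
    return bool(item) and item["expiresAt"] > time.time()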
Thanks for the suggestion and hope this helps in the meantime.

Azure Scheduler Implementation

I have written a WebJob that performs multiple tasks running on different schedules (once a day, once an hour, and so on), and I achieved this by using a Timer delegate. Now I am thinking of changing that approach and creating a Scheduler job for each scenario. I was able to find some information about schedules by googling, but was never able to join the pieces into a complete flow.
I learned that we can create a job collection, and each collection can have 'n' jobs based on the pricing tier we are using. After creating a job, how can we bind the program logic that the job must execute to the corresponding job?
Also, how can I link jobs to a job collection?
Thanks
A typical workflow is that you would write a message to an Azure queue, then have an Azure Cloud Service that reads from it and does the processing.
To tie specific jobs to specific program logic, you can either embed type information in the message and have something that generically picks up the messages and turns them into specific operations/classes, or you can have behavior-specific queues: each job writes to its own queue, and a different Cloud Service reads from each queue.
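As an illustration of the first approach, here is a minimal sketch using the azure-storage-queue Python package as one concrete option; the connection string, queue name, and handlers are hypothetical:

import json
from azure.storage.queue import QueueClient

# Hypothetical connection string and queue name.
queue = QueueClient.from_connection_string(conn_str="<connection-string>",
                                           queue_name="jobs")

def send_job(job_type, payload):
    # Embed the type alongside the payload so a generic worker can dispatch on it.
    queue.send_message(json.dumps({"type": job_type, "payload": payload}))

# Hypothetical handlers for the different scheduled tasks.
HANDLERS = {
    "daily-report": lambda payload: print("running daily report", payload),
    "hourly-sync": lambda payload: print("running hourly sync", payload),
}

def process_pending():
    for msg in queue.receive_messages():
        body = json.loads(msg.content)
        HANDLERS[body["type"]](body["payload"])
        queue.delete_message(msg)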
I think this will solve my problem, using either API calls or queue processing.
Solution
If I understand your question, you have a WebJob that has multiple methods, each of which needs to be called on a different schedule. Instead of going through the hassle of setting up a Scheduler and having yet another resource to manage, mark each method that needs to be called with a TimerTriggerAttribute.

Replay events with Google Pub/Sub

I'm looking into Google Cloud; it is very appealing, especially for data-intensive applications. I'm looking into Pub/Sub + Dataflow, and I'm trying to figure out the best way to replay events that were sent via Pub/Sub in case the processing logic changes.
As far as I can tell, Pub/Sub retention has an upper bound of 7 days and is per subscription; the topic itself does not retain data. In my mind, Pub/Sub would ideally let me disable log compaction, as Kafka does, so I could replay data from the very beginning.
Now, since Dataflow promises that you can run the same jobs in batch and streaming mode, how effective would it be to simulate this desired behavior by dumping all events into Google Cloud Storage and replaying from there?
I'm also open for any other ideas.
Thank you
As you said, Cloud Pub/Sub does not currently support replays, so you need to save events somewhere to replay later and Cloud Storage sounds like a good place to do that.
Cloud Pub/Sub now has the ability to replay previously acknowledged messages. Please see the quickstart and related blog post for information on how to use the feature.
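For reference, a rough sketch of that replay flow with the google-cloud-pubsub Python client, assuming a subscription configured to retain acknowledged messages; the project and subscription names are hypothetical:

import datetime
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

# Rewind the subscription so that messages acknowledged after this timestamp
# are delivered again; this requires retain_acked_messages on the subscription
# and is bounded by its message retention window.
replay_from = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(hours=1)
subscriber.seek(request={"subscription": subscription_path, "time": replay_from})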