MassTransit and Quartz.NET - How to populate lots of scheduled jobs

I have been looking at using MassTransit's Quartz.NET (AdoJobStore) integration to schedule message send/publish for the future, and all of this works fairly smoothly.
The bit where I am stuck: as part of the production deployment, I need to set up a lot of "Scheduled Messages" to be issued at various times over the next year or so.
Is there a mechanism available to pre-populate the Quartz SQL store with Triggers/Jobs externally?

I finally figured out a way to do this; posting it here in case it helps others.
The Quartz SQL DB is nothing but simple data: objects serialised into table columns,
e.g. varbinary for JOB_DATA and ticks for the time columns. The other values are fairly simple.
I ended up creating a sample app to set up some schedules and then reverse-engineered the database rows to figure out the format.
It was all quite simple in the end, and now I have a plain SQL insert script which inserts the schedules as part of the CD pipeline.
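For reference, the "ticks" time columns correspond to .NET ticks (100-nanosecond intervals counted from 0001-01-01 UTC). Below is a minimal sketch of computing that value for such an insert script; the exact column semantics should still be verified against rows a sample app actually wrote, as the answer suggests, since they can vary across Quartz.NET versions:

```python
from datetime import datetime, timezone

# .NET ticks epoch: 0001-01-01 00:00:00 UTC
_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)

def to_dotnet_ticks(dt: datetime) -> int:
    """Convert a UTC datetime to .NET ticks (100 ns units), the format
    Quartz.NET's AdoJobStore uses for its *_FIRE_TIME columns."""
    delta = dt - _EPOCH
    # integer arithmetic avoids float precision loss on values this large
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 \
        + delta.microseconds * 10

# e.g. the NEXT_FIRE_TIME for a trigger firing 2025-06-01 09:00 UTC
print(to_dotnet_ticks(datetime(2025, 6, 1, 9, 0, tzinfo=timezone.utc)))
```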

Related

Is there a way to compute this amount of data and still serve a responsive website?

Currently I am developing a Django + React website that will (I hope) serve a decent number of users. The project demo is mostly complete, and I am starting to think about the scale required to put this thing into production.
The website essentially does three things:
Grab data from external APIs (e.g. Twitter) for 50,000 unique keywords (the keywords don't change). This process happens every 30 minutes
Run computation on all of the data, and save that computation to the database. Assume that the algorithm is as optimized as possible
When a user visits the website it should serve a pretty graph/chart of all of the computational data per keyword
The issue is that this is far too intense a task to be done by the same application that serves the website; users would be waiting decades to see their data. My current plan is to have a separate API that services the website with the data, which the website can then store in its database. This separate API would process the data without fear of affecting users, and it should be able to finish each round of computation in under 30 minutes, in time for the next round of data.
Can anyone help me understand how I can better equip my project to handle the scale? I'd love some ideas.
As a fourth-year CS student I figured it's time to put a real project out into the world, and I am very excited about it and the progress I've made so far. My main worry is that the end users will be negatively affected if I don't figure out some kind of pipeline to make this process happen.
To reiterate my idea:
Django + React - This is the forward-facing website
External API - Grabs the data off the internet, processes it, and waits for a GET request from the website
Is there a better way to do this? Or on the other hand am I severely overestimating how computationally heavy this is.
Edit: Including current research
Handling computationally intensive tasks in a Django webapp
Separation of business logic and data access in django
What you want is to have the computation executed by a different process in the "background".
The most straightforward and popular solution is to use Celery.
The Celery worker(s) - which perform the background tasks - can either run on the same machine as the web application or, when scale becomes an issue, you can change the configuration so that they run on entirely different machines.
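A minimal sketch of that setup, with a periodic fan-out so every keyword is refreshed each half hour; the app and task names, the Redis broker, and the load_keywords helper are all illustrative assumptions, not part of the question:

```python
# tasks.py -- minimal Celery sketch; names and broker are assumptions
from celery import Celery

app = Celery("myapp", broker="redis://localhost:6379/0")

def load_keywords():
    # placeholder for however the 50,000 keywords are stored
    return []

@app.task
def compute_keyword_stats(keyword):
    # fetch from the external API and run the heavy computation here,
    # inside a worker process instead of the web request cycle
    ...

@app.task
def refresh_all_keywords():
    # fan out one task per keyword so the work spreads across workers
    for keyword in load_keywords():
        compute_keyword_stats.delay(keyword)

# celery beat triggers the refresh every 30 minutes
app.conf.beat_schedule = {
    "refresh-every-30-min": {
        "task": "tasks.refresh_all_keywords",
        "schedule": 30 * 60,  # seconds
    },
}
```

The web views then only ever read precomputed results from the database, so user-facing response times are independent of how long the computation takes.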

Schedule a task inside the Hyperledger Composer ledger

I need to update some field every day.
I came up with 2 possible solutions:
Update it via the REST API - for example, a program that runs on some server, updates the field via the REST API, and then sleeps for a day. Problem: if the program stops, it no longer updates the ledger, so the network stops working correctly.
Make a smart contract that sleeps for a day and then updates the fields. Problem: from what I understand of the internals, isn't that going to cause problems with reaching consensus?
Correct on 2 - you likely won't get a deterministic result (it depends on what you're updating the field with, but it sounds like it's date-based, and it isn't clear whether that's a calendar day or an elapsed day, etc.), so the update will fail. You're better off managing the update from the client side, based on a field value. You'll need a high-availability solution and/or client-side checks to ensure the update actually takes place. The ledger is not really the place to rely on for applying an operational, schedule-based update.

Running continuous tasks alongside the Django app

I'm building a Django app which lists the hot (according to a specific algorithm) Twitter trending topics.
I'd like to run some processes indefinitely to make Twitter API calls and update the database (Postgres) with the new information. This way the hot trending topic list gets updated asynchronously.
At first it seemed to me that Celery + RabbitMQ were the solution to my problem, but from what I understand they are used within Django to launch scheduled or user-triggered tasks, not indefinitely running ones.
The solution that comes to mind is to write a .py file that continually puts trending topics in a queue, and independent, continually running .py files that pull items off the queue and save the data in the DB used by Django, via raw SQL or SQLAlchemy. I think this could work, but I'm pretty sure there is a much better way to do it.
If you just need to keep some processes running continually, supervisor is a nice solution.
You can combine it with any queuing technology you like to push things into your queues.
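For instance, the indefinitely running .py file from the question could be a plain polling loop that supervisor keeps alive and restarts if it crashes; every name below is an illustrative placeholder:

```python
# worker.py -- sketch of an indefinitely running poller managed by
# supervisor; fetch_trending/save_topic are placeholder stubs
import time

def fetch_trending():
    # placeholder for the actual Twitter API call
    return []

def save_topic(topic):
    # placeholder for the write into the Django database
    pass

def main():
    while True:
        for topic in fetch_trending():
            save_topic(topic)
        time.sleep(60)  # poll interval is an assumption

if __name__ == "__main__":
    main()
```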

Django: updating the DB asynchronously

I'm developing a Django app where lots of DB updates could/should be deferred to a later time.
What would be a good way to update the DB in a background batch job?
One way I can think of is to have a message queue that contains raw SQL statements.
The Django app would fill the queue with raw SQL whenever an update should be done asynchronously.
A simple background job, in a different, unrelated process, would just dequeue and execute the SQL statements at its own pace.
What do you think?
Celery is often used for this.
Start with these related questions: https://stackoverflow.com/questions/tagged/celery.
I've found a good review of the subject. It recommends Gearman.
It seems a lighter solution than Celery... I think I will give it a try.
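As a sketch of the raw-SQL queue the asker describes (Redis as the broker, the queue name, and the connection string are all assumptions; a task queue like Celery or Gearman is usually the safer choice than shipping raw SQL):

```python
# consumer.py -- drains raw SQL statements pushed by the Django app;
# broker, queue name, and DSN are illustrative assumptions
import psycopg2
import redis

r = redis.Redis()
conn = psycopg2.connect("dbname=myapp")  # placeholder DSN

while True:
    # BLPOP blocks until the web app LPUSHes a statement
    _, sql = r.blpop("deferred_sql")
    with conn, conn.cursor() as cur:  # commits on success
        cur.execute(sql.decode())
```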

How to best launch an asynchronous job request in a Django view?

One of my view functions is a very long-running job and clearly needs to be handled differently.
Instead of making the user wait a long time, it would be best if I could launch the processing job (which would email the results), notify the user that their request is being processed without waiting for completion, and let them browse on.
I know I can use os.fork, but I was wondering if there is a 'right way' in terms of Django. Perhaps I can return the HTTP response and then go on with the job somehow?
There are a couple of solutions to this problem, and the best one depends a bit on how heavy your workload will be.
If you have a light workload you can use the approach taken by django-mailer, which is to define a "jobs" model, save new jobs into the database, then have cron run a stand-alone script every so often to process the jobs stored in the database (deleting them when done). You can use something like django-chronograph to make managing the job scheduling easier; a bare-bones sketch of the pattern follows this answer.
If you need help understanding how to write a script to process the jobs, see James Bennett's article Standalone Django Scripts.
If you have a very high workload, meaning you'll need more than a single server to process the jobs, then you want a real distributed task queue. There is a lot of competition here, so I can't really detail all the options, but a good one to use with Django apps is Celery.
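Here is that jobs-model sketch; the model fields, app name, command name, and do_work stub are all illustrative:

```python
# models.py -- a minimal "jobs" table in the django-mailer style
from django.db import models

class Job(models.Model):
    payload = models.TextField()
    created = models.DateTimeField(auto_now_add=True)


# management/commands/process_jobs.py -- run from cron every few
# minutes: process queued jobs, deleting each when done
from django.core.management.base import BaseCommand
from myapp.models import Job  # "myapp" is a placeholder app name

def do_work(payload):
    # placeholder for the actual long-running work
    pass

class Command(BaseCommand):
    help = "Process queued jobs, deleting them when done"

    def handle(self, *args, **options):
        for job in Job.objects.order_by("created").iterator():
            do_work(job.payload)
            job.delete()
```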
Why not simply start a thread to do the processing and then go on to send the response?
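In a view, that thread-based approach might look like the sketch below. It is workable for light jobs, though the thread dies if the server process does; process_and_email and the view name are hypothetical:

```python
# views.py -- fire-and-forget thread sketch; names are illustrative
import threading
from django.http import HttpResponse

def process_and_email(address):
    # placeholder for the long job that emails its results
    pass

def start_job(request):
    t = threading.Thread(target=process_and_email,
                         args=(request.user.email,))
    t.daemon = True  # don't block interpreter shutdown
    t.start()
    return HttpResponse("Your request is being processed; "
                        "the results will be emailed to you.")
```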
Before you select a solution, you need to determine how the process will be run. That is: is it the same process for every single user, with the same data, which can be scheduled regularly? Or does each user request something different, with slightly different results?
As an example, if the computation is the same for every single user and can be run on a schedule, you could use cron.
See: http://www.b-list.org/weblog/2007/sep/22/standalone-django-scripts/
or
http://docs.djangoproject.com/en/dev/howto/custom-management-commands/
However, if the requests will be ad hoc and you need something scalable that can handle high load asynchronously, what you are actually looking for is a message queuing system. Your view will add a request to the queue, which will then get acted upon.
There are a few options to implement this in Django:
Django Queue Service is purely Django & Python and simple, though the last commit was in April and it seems the project has been abandoned.
http://code.google.com/p/django-queue-service/
The second option, if you need something that scales, is distributed, and makes use of open-source message queuing servers: Celery is what you need.
http://ask.github.com/celery/introduction.html
http://github.com/ask/celery/tree