I'm looking for a background processor in Clojure with features like Sidekiq in Ruby. Here is the list of the features I need:
logging
priority queue
persistent queues (so jobs could be persisted between deployments)
scheduled jobs
Web UI admin panel would be great
I know Immutant's scheduling delivers the bare minimum, but as far as I understand, it doesn't provide persistence and priority. Is there anything like this in the Clojure or Java ecosystem?
I'm new to Clojure/JVM, and maybe the ecosystem offers an alternative approach so that I don't need this kind of tool at all.
Anything under the #Scheduling section of https://www.clojure-toolbox.com, maybe?
at-at
Chime
Cronj
Quartzite
I'm making a web app using Django.
I'd like events to trigger 'background' tasks that run parallel to the Django application (by parallel I just mean that they don't impact the speed of the user's experience).
Types of tasks I'm talking about
a user logs in and an event is triggered to start populating that user's cache in anticipation of future requests based on their usage habits.
a user posts some data to the database, but that post triggers an API call to another website, where the returned data will be parsed, aggregated and used to supplement that user's post.
rolling updates of data used in the app through api calls to other websites
aggregating data and running general maintenance tasks.
After a few days of research I'm thinking that I should use Twisted to accomplish this, which has led me to my question:
Is twisted overkill for what I'm trying to accomplish?
Many of these tasks are far more i/o bound than cpu bound. So I'm thinking asynchronous is best.
Any advice would be appreciated.
Thank you
Yes, I think it's overkill.
Rather than fold in a full async framework such as Twisted, with all the technical overhead that brings, you might be better off using a task queue to be able to do what you want as a background process.
When your app needs to do a background task (anything that would otherwise block the request/response cycle), put the task in the queue and let a separate worker process pick things off the queue and deal with them as fast as it can. (You can always add more workers).
Two of the most popular queue libraries for Python/Django are celery and rq. They're especially good with Redis as a backend, but there are other backend options, too.
Personally, I much prefer rq over celery, in terms of its API and its clean setup, but both are used by a lot of people.
And both are definitely easier to get your head around than something like Twisted, IMO.
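As a sketch of that producer/worker pattern, here's a minimal version using only Python's standard library (no rq or celery involved, and the task and function names are purely illustrative):

```python
import queue
import threading

task_queue = queue.Queue()
results = []

def worker():
    # Worker loop: pull tasks off the queue and run them as fast as it can.
    while True:
        func, args = task_queue.get()
        if func is None:  # sentinel value: shut the worker down
            break
        func(*args)
        task_queue.task_done()

def send_welcome_email(address):
    # Stand-in for any slow, blocking task you'd push out of the
    # request/response cycle.
    results.append("emailed %s" % address)

# Start one worker; you can always add more threads/processes.
t = threading.Thread(target=worker)
t.start()

# The web request just enqueues the task and returns immediately.
task_queue.put((send_welcome_email, ("user@example.com",)))

task_queue.join()            # wait for queued work to finish
task_queue.put((None, None)) # tell the worker to stop
t.join()
print(results[0])  # -> emailed user@example.com
```

With rq or celery the shape is the same, except the queue lives in Redis (or another broker) so the worker can run as a separate process and survive restarts.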
I have these tables
exam(id, start_date, deadline, duration)
exam_answer(id, exam_id, answer, time_started, status)
exam_answer.status possible values: 0 = not yet started, 1 = started, 2 = submitted.
Is there a way to update exam_answer.status once now - exam_answer.time_started is greater than exam.duration? Or once it is already past the deadline?
I'll also mention, in case it helps: I'm building this for a Django project.
Django applications, like any other WSGI/web application, are only meant to handle request-response flows. If there aren't any requests, there is no activity and such changes will not happen.
You could write a custom management command that's executed periodically by a cron job, but you then run the risk of displaying stale data between runs. Alternatively, you have elegant means at your disposal to compute the statuses before any related views start their processing, but this might be a wasteful use of resources.
Your best bet might be to integrate a task scheduler with your application, such as Celery. Do not be discouraged by the fact that Celery normally runs as a concurrent, multiprocess service across several machines; it can be configured to run in a single thread, and it provides a clean interface for scheduling tasks that have to run at some exact point in the future.
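As a minimal sketch of the "compute the status on the fly" idea in plain Python (the function name and argument names are assumptions based on the tables in the question, not Django or Celery API):

```python
from datetime import datetime, timedelta

# Status values from the question's schema.
NOT_STARTED, STARTED, SUBMITTED = 0, 1, 2

def compute_status(answer_status, time_started, duration, deadline, now=None):
    """Derive the effective status of an exam_answer at `now`.

    A started answer counts as submitted once its allotted time runs
    out or the exam's deadline passes, whichever comes first.
    """
    now = now or datetime.utcnow()
    if answer_status == STARTED:
        if now - time_started > duration or now > deadline:
            return SUBMITTED
    return answer_status

# Example: an answer started 90 minutes ago on a 60-minute exam.
now = datetime(2024, 1, 1, 12, 0)
status = compute_status(
    STARTED,
    time_started=now - timedelta(minutes=90),
    duration=timedelta(minutes=60),
    deadline=now + timedelta(days=1),
    now=now,
)
print(status)  # -> 2
```

A view (or a periodic Celery task) could run this over the relevant rows and persist the result, so the stored status only lags by at most one scheduling interval.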
Let's say I'm making a crawler/scraper in Clojure, and I want it to run periodically (at predefined times of day).
I want to define my jobs with Quartz/Quartzite (at least, that seems to be the most robust solution).
Now, to create a daemon process with Clojure, I tried the lein-daemon plugin, but it seems to be a pretty risky endeavour, since the plugin appears more than a bit buggy (or I am making some heavy mistakes).
What is the best way for me to create this service?
I want it to be able to restart itself upon system reboot, but I want to use Clojure (Quartzite) for my jobs (loading them from the database, etc.).
What is a robust, but Clojure-y, way to create a long-running daemon process?
EDIT:
The deployment environment will be something like a single VPS or a dedicated server.
There may be a dozen jobs loading their parameters from some data store, running anywhere from 1 - 8 times a day (or maybe more).
The correct process depends a lot on your environment. I work on deployment systems for complex web/mobile infrastructure with many long running Clojure processes. For this we use Pallet to create instances with the code checked out and configured, then we have a function that generates init scripts to start the services at boot. This process is appropriate to environments where you need a repeatable build on a cloud provider so it may be too heavy for your situation.
If you are looking for simple recurring jobs you may want to look into Immutant which is an application server for Clojure with good support for recurring jobs.
I've been working on a Rails app for a couple of days now, and I need an underlying "middle layer" that connects my Rails application to the various services that make up the data.
The basic setup looks like this:
Frontend ("Rails app") -> user requests data to be aggregated -> info goes into the database, and a JSON request is sent to the "middle layer" to retrieve the data from a number of other places, process it, and send it back to the frontend, which streams it to the user's browser via websockets.
Middle layer -> uses sockets to listen for the frontend making a request. Once request is made, aggregation begins.
Base layer -> load balancing among a scalable network design.
I don't think that this is as efficient as it could be though. I feel like I'll run into concurrency problems or it just being too slow for use. I need to be able to scale it up.
My main question, though, is: which language would make this run most efficiently?
It depends. What are your data sources? Are they on the same machine? Are they databases?
My gut tells me that the language you choose won't play much of a role in how well this kind of application performs, but it is hard to say without knowing the details.
C++ is probably a bad idea for a scalable web app, though. It is very likely that you will end up with something slower than what you would have written in Ruby, because you end up worrying about irrelevant details. With regard to concurrency problems, C++ is definitely not the easiest language in which to write concurrent code.
Without knowing more, I would recommend that you stick with Ruby or some other high-level language and profile to see where the bottlenecks are. If you find out that there is some tight loop which needs to run really fast you can write that part in C, but you probably won't need that.
Stick with Ruby unless you can prove an actual need for C++. Look into something like delayed_job to handle your background tasks.
There are many activities on an application that need things like:
Send email, post to Twitter
Thumbnail an image into several sizes
Run a cron job to find connected relationships
A good way to do these tasks is to write into an asynchronous queue on which operations are performed.
What Django application can be used to implement such functionality locally, like the one Amazon Simple Queue Service offers?
I have come across Celery. Is that the right thing? Does anything else like this exist?
Beanstalkd can also do what you want, and I've used it (though not from Python) to do some similar things (resizing images, and running other background tasks). There are a couple of Python client libraries to interface with it.
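To illustrate the put/reserve/delete cycle that beanstalkd workers follow, here is a sketch that fakes the server with an in-process queue; the class and method names mirror the general shape of a client like beanstalkc, but this is not the real API and needs no running beanstalkd:

```python
import json
import queue

class FakeBeanstalk:
    """In-process stand-in for a beanstalkd connection.

    Real clients expose a similar put/reserve/delete cycle, but talk
    to a beanstalkd server over a socket.
    """
    def __init__(self):
        self._q = queue.Queue()

    def put(self, body):
        # Enqueue a job body (beanstalkd jobs are opaque strings).
        self._q.put(body)

    def reserve(self, timeout=None):
        # Block until a job is available, then hand it to this worker.
        return self._q.get(timeout=timeout)

    def delete(self, job):
        # A real server forgets the finished job; nothing to do here.
        pass

conn = FakeBeanstalk()

# Producer side: enqueue a resize job as JSON.
conn.put(json.dumps({"task": "resize", "image": "photo.jpg", "width": 120}))

# Worker side: reserve a job, act on it, then delete it.
job = conn.reserve()
payload = json.loads(job)
print(payload["task"], payload["image"])  # -> resize photo.jpg
conn.delete(job)
```

The worker loop would normally wrap the reserve/process/delete steps in a `while True:` and run as its own process, exactly like the image-resizing workers described above.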