Workflow frameworks for Django [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I've been looking for a framework to simplify the development of reasonably complex workflows in Django applications. I'd like to be able to use the framework to automate the state transitions, permissioning, and perhaps some extras like audit logging and notifications.
I've seen some older information on the same topic, but not too much in the last 2-3 years. The major choices I've heard of are GoFlow (not updated since 2/2009) and django-workflow (seems more active).
Has anyone used these packages? Are they mature and/or compatible with modern (1.3) Django? Are there other options out there worth considering that might be better or better supported?

Let me give a few notes here, as I'm the author of django-fsm and django-viewflow, two projects that could be called "workflow libraries".
The word "workflow" itself is a bit overloaded. Different kinds of libraries and software call themselves "workflow" while offering very different functionality.
The common thread is that a workflow connects the steps of some process into a whole.
General classification
As I see, workflow implementation approaches can be classified as follows:
Single/Multiple users - Whether the workflow library automates only single-user tasks or also offers permission checking and task assignment.
Sequential/Parallel - A sequential workflow is just a state machine pattern implementation and allows a single active state at a time. Parallel workflows allow several active tasks at once and usually provide some sort of parallel split/join synchronization.
Explicit/Implicit - Whether the workflow is represented as a separate, external entity, or is woven into some other class whose main responsibility is different.
Static/Dynamic - Static workflows are implemented once in Python code and then executed; dynamic workflows can typically be reconfigured by changing the contents of workflow database tables. Static workflows are usually better integrated with the rest of the Django infrastructure (views, forms, and templates) and support customization through the usual Python constructs, such as class inheritance. Dynamic workflows assume a generic interface that can adapt to workflow changes at runtime.
Of these, the first two are gradual differences, but the other two are fundamental.
Specific packages
Here is a brief description of what we have nowadays in Django, on Django Packages, and in the awesome-django project list under the workflow section:
django.contrib.WizardView - implicit, single-user, sequential, static: the simplest workflow implementation we could have. It stores intermediate state in hidden form POST data.
django-flows - explicit, single-user, sequential, static workflow that keeps flow state in external storage, so the user can close the page or reopen it in another tab and continue working.
django-fsm - implicit, multi-user, sequential, static workflow: the most compact and lightweight state machine library. State change events are represented simply as Python method calls on the model class. It has rudimentary support for flow inheritance and overrides, provides slots for associating permissions with state transitions, and allows optimistic locking to prevent concurrent state updates (a minimal sketch follows this list).
django-states - explicit, multi-user, sequential, static workflow with a separate class for the state machine and its transitions. Transitions are made by passing the transition's string name to a make_transition method. It provides a way to associate permissions with state transitions and has a simple generic REST endpoint for changing model states via AJAX calls. State machine inheritance is not mentioned in the documentation, but the class-based state definition makes it possible with few or no core library modifications.
django_xworkflows - explicit, sequential, static workflow with no support for user permission checking, and a separate class for the state machine. It uses tuples for state and transition definitions, which makes workflow inheritance hard.
django-workflows - explicit, multi-user, sequential, dynamic workflow that stores state in library-provided Django models. It has a way to attach permissions to workflow transitions, and that's basically all.
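To make the django-fsm entry concrete, here is a minimal sketch in its documented style (the model, state names, and permission string are invented for illustration):

from django.db import models
from django_fsm import FSMField, transition

class Article(models.Model):
    state = FSMField(default='new')  # holds the current workflow state

    @transition(field=state, source='new', target='published',
                permission='myapp.can_publish_article')
    def publish(self):
        # side effects of the transition (notifications, audit logging, ...)
        pass

# usage: article.publish() raises TransitionNotAllowed if the current state
# does not match `source`; the new state is persisted on article.save()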
None of these Django state machine libraries supports parallel workflows, which limits their scope of application a lot. But there are two that do:
django-viewflow - explicit, multi-user, parallel, static workflow, with support for parallel task execution and complex split and join semantics. It provides helpers to integrate with Django function-based and class-based views, different background task execution queues, and various pessimistic and optimistic locking strategies to prevent concurrent updates (sketch below).
GoFlow, mentioned in the question, aims to be an explicit, multi-user, parallel, dynamic workflow, but it has been abandoned by its author for years.
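For a feel of django-viewflow's static, explicit style, here is a sketch adapted from its quick-start (assuming viewflow 1.x; parallel branches are declared the same way with flow.Split and flow.Join):

from django.db import models
from viewflow import flow
from viewflow.base import this, Flow
from viewflow.flow.views import CreateProcessView, UpdateProcessView
from viewflow.models import Process

class HelloWorldProcess(Process):
    text = models.CharField(max_length=150)
    approved = models.BooleanField(default=False)

class HelloWorldFlow(Flow):
    process_class = HelloWorldProcess

    # each node is declared explicitly and wired to the next one
    start = (
        flow.Start(CreateProcessView, fields=['text'])
        .Permission(auto_create=True)
        .Next(this.approve)
    )
    approve = (
        flow.View(UpdateProcessView, fields=['approved'])
        .Permission(auto_create=True)
        .Next(this.send)
    )
    send = flow.Handler(this.send_hello_world_request).Next(this.end)
    end = flow.End()

    def send_hello_world_request(self, activation):
        print(activation.process.text)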
I see a way to implement dynamic workflow construction on top of django-viewflow. As soon as that is completed, it will close the last and most sophisticated case for workflow implementation in the Django world.
I hope that anyone who was able to read this far now understands the term "workflow" better and can make a conscious choice of workflow library for their project.

Are there other options out there worth considering that might be better or better supported?
Yes.
Python.
You don't need a workflow product to automate the state transitions, permissioning, and perhaps some extras like audit logging and notifications.
There's a reason why there aren't many projects doing this.
The State design pattern is pretty easy to implement.
The authorization rules ("permissioning") are already a first-class part of Django.
Logging is already a first-class part of Python (and has been added to Django). Using this for audit logging means either an audit table or another logger (or both).
The message framework ("notifications") is already part of Django.
What more do you need? You already have it all.
Using class definitions for the State design pattern, and decorators for authorization and logging works out so well that you don't need anything above and beyond what you already have.
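As a concrete illustration of that claim, here is a minimal sketch of the State pattern plus a permission decorator and stdlib logging (all names are invented; only user.has_perm is Django's built-in check):

import logging
from functools import wraps

audit = logging.getLogger("shop.audit")  # route to an audit table or file via handlers

def requires_perm(perm):
    """Decorator: reject the transition unless the acting user has `perm`."""
    def decorator(method):
        @wraps(method)
        def wrapper(self, user, order):
            if not user.has_perm(perm):  # Django's built-in authorization
                raise PermissionError(perm)
            return method(self, user, order)
        return wrapper
    return decorator

class Draft:
    @requires_perm("shop.submit_order")
    def submit(self, user, order):
        audit.info("order %s submitted by %s", order.pk, user)
        order.state = Submitted()

class Submitted:
    @requires_perm("shop.approve_order")
    def approve(self, user, order):
        audit.info("order %s approved by %s", order.pk, user)
        order.state = Approved()

class Approved:
    """Terminal state: no transitions defined."""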
Read this related question: Implementing a "rules engine" in Python

It's funny, because I would have agreed with S.Lott about just using Python as-is for a rules engine. Having actually done it, I have a COMPLETELY different perspective now.
If you want a full rules engine, it needs quite a few moving parts. We built a full Python/Django rules engine, and you would be surprised what needs to be built in to get a great rules engine up and running. I will explain further, but first, the website is http://nebrios.com.
A rules engine should at least have:
Access Control Lists - do you want everyone seeing everything?
Key/value pair API - KVPs store the state, and all the rules react to changed states.
Debug mode - being able to see every changed state, what changed it, and why. Paramount.
Interaction through web forms and email - being able to quickly script a web form is a huge plus, along with parsing incoming emails consistently.
Process IDs - these track a "thread" of business value. Otherwise processes would continually overlap.
Sooo much more!
So try out Nebri, or the others I list below to see if they meet your needs.
(The original answer included screenshots of the debug mode and an auto-generated form.)
A sample workflow rule:
class task_sender(NebriOS):
    # send a task to the person it got assigned to
    listens_to = ['created_date']

    def check(self):
        return (self.created_date is not None) and (self.creator_status != "complete") and (self.assigned is not None)

    def action(self):
        send_email(self.assigned, """
The "{{task_title}}" task was just sent your way!
Once you finish, send this email back to log the following in the system:
i_am_finished := true
It will get assigned back to the task creator to look over.
Thank you!! - The Nebbs
""", subject="{{task_title}}")
So, no, it's not simple to build a rules-based, event-based workflow engine in Python alone. We have been at it for over a year! I would recommend using tools like:
http://nebrios.com
http://pyke.sourceforge.net (It's Python also!)
http://decisions.com
http://clipsrules.sourceforge.net

A package written by an associate of mine, django-fsm, seems to work; it's both fairly lightweight and sufficiently featureful to be useful.

I can add one more library, which, unlike its equivalents, supports on-the-fly changes to workflow components.
Look at django-river.
It now ships with a pretty admin called River Admin.

ActivFlow: a generic, lightweight, and extensible workflow engine for agile development and automation of complex business process operations.
You can have an entire workflow modeled in no time!
Step 1: Workflow App Registration
WORKFLOW_APPS = ['leave_request']
Step 2: Activity Configuration
from django.db.models import CharField, DateField, TextField

from activflow.core.models import AbstractActivity, AbstractInitialActivity
from activflow.leave_request.validators import validate_initial_cap


class RequestInitiation(AbstractInitialActivity):
    """Leave request details"""
    employee_name = CharField(
        "Employee", max_length=200, validators=[validate_initial_cap])
    from_date = DateField("From Date")  # "from" is a reserved word in Python
    to_date = DateField("To Date")
    reason = TextField("Purpose of Leave", blank=True)

    def clean(self):
        """Custom validation logic should go here"""
        pass


class ManagementApproval(AbstractActivity):
    """Management approval"""
    approval_status = CharField(verbose_name="Status", max_length=3, choices=(
        ('APP', 'Approved'), ('REJ', 'Rejected')))
    remarks = TextField("Remarks")

    def clean(self):
        """Custom validation logic should go here"""
        pass
Step 3: Flow Definition
FLOW = {
    'initiate_request': {
        'name': 'Leave Request Initiation',
        'model': RequestInitiation,
        'role': 'Submitter',
        'transitions': {
            'management_approval': validate_request,
        },
    },
    'management_approval': {
        'name': 'Management Approval',
        'model': ManagementApproval,
        'role': 'Approver',
        'transitions': None,
    },
}
Step 4: Business Rules
def validate_request(self):
    return self.reason == 'Emergency'

I migrated django-goflow from Django 1.x / Python 2.x to Django 2.x / Python 3.x; the project is at django2-goflow.

Related

Should I use an internal API in a django project to communicate between apps?

I'm building/managing a django project, with multiple apps inside of it. One stores survey data, and another stores classifiers, that are used to add features to the survey data. For example, Is this survey answer sad? 0/1. This feature will get stored along with the survey data.
We're trying to decide how and where in the app to actually perform this featurization, and I'm being recommended a number of approaches that don't make ANY sense to me, but I'm also not very familiar with django, or more-than-hobby-scale web development, so I wanted to get another opinion.
The data app obviously needs access to the classifiers app, to be able to run the classifiers on the data, and then reinsert the featurized data, but how to get access to the classifiers has become contentious. The obvious approach, to me, is to just import them directly, a la
# from inside the Survey App
from ClassifierModels.models import Classifier

cls = Classifier.objects.filter(name='Sad').first()  # Django ORM (I'm used to Flask-style .where())
data = Survey.objects.filter(question='How do you feel?').first()
labels = cls(data.responses)
# etc.
However, one of my engineers is saying that this is bad practice, because apps should not import one another's models. And that instead, these two should only communicate via internal APIs, i.e. posting all the data to
http://our_website.com/classifiers/sad
and getting it back that way.
So, what feels to me like the most pressing question: why in god's name would anybody do it this way? It seems to me like strictly more code (building and handling requests) and strictly less intuitive code: more to build, harder to work with, and bafflingly indirect, like mailing a letter to your own house rather than talking to the person who lives there with you.
But perhaps in easier to answer chunks,
1) Is there REALLY anything the matter with the first, direct, import-other-apps'-models approach? (The only answers I've found say "No!", but again, this is being pushed by my dev, who does have more industrial experience, so I want to be certain.)
2) What is the actual benefit of doing it via internal APIs? (I've asked, of course, but only get what feel like theoretical answers that don't address the concrete concern: more, and more complicated, code for no obvious benefit.)
3) How much do the size of our app, and team, factor into which decision is best? We have about 1.75 developers, and only, even if we're VERY ambitious, FOUR users. (This app is being used internally, to support a consulting business.) So to me, any questions of Best Practices etc. have to factor in that we have tiny teams on both sides, and need something stable, functional, and lean, not something that handles big loads, or is externally secure, or fast, or easily worked on by big teams, etc.
4) What IS the best approach, if NEITHER of these is right?
It's simply not true that apps should not import other apps' models. For a trivial refutation, think about the apps in django.contrib which contain models such as User and ContentType, which are meant to be imported and used by other apps.
That's not to say there aren't good use cases for an internal API. I'm in the planning process of building one myself. But they're really only appropriate if you intend to split the apps up some day into separate services. An internal API on its own doesn't make much sense if you're not in a service-based architecture.
I can't see any reason why you should not import one app's models from another. Django itself uses several applications and their models internally (like auth and admin). Reading the applications section of the documentation, we can see that the framework has all the tools to manage multiple applications and their models inside a project.
However, it seems quite obvious to me that sending requests to your own application's API would make your code really messy and slow.
Without context it's hard to understand why your engineer considers this bad practice. He was maybe referring to database isolation (if so, see "Multiple databases" in the documentation) or proper code isolation for testing.
It is right to think about decoupling your apps, but I do not think an internal REST API is a good way to do it.
Directly importing models and running queries and updates in another app is not a good approach either. Every time you use a model from another app, you should be careful. I suggest you separate communication between apps into a simple service layer. Then your Survey app does not have to know the model structure of the Classifier app:
# from inside the Survey App
from ClassifierModels.services import get_classifier_cls

cls = get_classifier_cls('Sad')
data = Survey.objects.filter(question='How do you feel?').first()
labels = cls(data.responses)
# etc.
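For illustration, the service module backing that call might look like this (a hypothetical sketch; the point is that the Survey app depends only on this narrow interface, not on the Classifier model's fields):

# ClassifierModels/services.py
from .models import Classifier

def get_classifier_cls(name):
    """Return a callable classifier by name, hiding the model layout.

    Assumes Classifier instances are callable on a list of responses;
    adapt the return value to however your classifiers are invoked.
    """
    return Classifier.objects.get(name=name)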
For more information, you should read this thread: Separation of business logic and data access in django.
More generally, you should create smaller, testable components. Nowadays I am interested in the "functional core and imperative shell" paradigm. Try Gary Bernhardt's lectures: https://gist.github.com/kbilsted/abdc017858cad68c3e7926b03646554e

Separation of Runtime and History

I would like to use separate databases for runtime and history data without implementing a custom HistoryEventHandler. Does someone know how this is possible?
I read the Camunda user guides, but they did not help much because they only hint at the custom implementation route.
Currently, every time I query history data (about 2 million activity entries), the performance of the system drops, as it kind of blocks the runtime too. I'd like to avoid this without losing the ability to query historic data.
That would be a really cool feature, but it is currently not supported. You will have to disable the default history and implement a custom handler.
Camunda BPM offers Optimize, which pulls history data from the engine into an Elasticsearch database. If you are using the Enterprise version, that may be a way to solve it.
(Based on your comments to other answers, it appears that you're interested in learning more about custom HistoryEventHandler implementations. Thus, I'm adding this answer in the hope that it will help.)
Implementing a custom History Event Handler isn't difficult, but there are a few important points to keep in mind:
Unless you want to skip the storage of history information in the standard Camunda history tables, you'll want to use their CompositeHistoryEventHandler. This simply gives you the ability to use multiple HistoryEventHandler implementations.
Any HistoryEventHandler implementations will complete in the same threads as the ones executing process instances; thus, you will want to be cognizant of the performance impacts your custom HistoryEventHandler will have.
You may want to consider publishing your history events through a message bus or messaging system to allow for reliable delivery without impacting Camunda workflow instance performance.
Finally, it may make sense to use your custom HistoryEventHandler along with Camunda's default HistoryEventHandler and their functionality for deleting process instances after a period of time. This would allow you to use their querying capabilities for some period of time without having the history stack up (and thus slowing down your system).

How to handle per object permission in Django nowadays?

I was about to use django-guardian until I came across the following in the official documentation:
https://docs.djangoproject.com/en/stable/topics/auth/customizing/#handling-authorization-in-custom-backends
Permissions can be set not only per type of object, but also per specific object instance. By using the has_add_permission(), has_change_permission() and has_delete_permission() methods provided by the ModelAdmin class, it is possible to customize permissions for different object instances of the same type.
Does that mean django-guardian is no longer needed with newer versions of Django?
Please clarify.
Indeed, while reading the docs, I got excited that Django would cater for per-object permission checking out of the box, especially in the admin, and that it would only be a matter of time before I understood how to activate it.
However, this does not always seem to be the case.
Django undoubtedly strives to provide the grounds (the API) for such an implementation, but the implementation itself can demand good coding skills and a solid understanding of Django.
It is the developer who puts these tools together by creating the app that suits their needs. This can be either easy or ... not so easy!
This contradictory information formed the basis for my web crawling, which focused on finding a solution to the per-object permissions issue that is effective for my project's needs and scale, and of course for my own coding skills and understanding of Django so far.
django-guardian seems to be the most robust, full-fledged, full-blown application for this purpose, but it also has a three-year-old open issue regarding its admin integration.
There are also other, more lightweight Django applications that address specific needs and are production-stable as well.
While trying to make ends meet in this somewhat tricky quest, I am leaning towards django-rules for its simple functioning, focused on my needs.
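For reference, a minimal django-rules setup looks roughly like this (model and permission names are invented; it also requires adding rules.permissions.ObjectPermissionBackend to AUTHENTICATION_BACKENDS):

# rules.py in your app
import rules

@rules.predicate
def is_owner(user, obj):
    return obj is not None and obj.owner == user

# staff can always edit; otherwise only the owner can
rules.add_perm('myapp.change_document', is_owner | rules.is_staff)

# anywhere Django checks permissions, the object is now taken into account:
# user.has_perm('myapp.change_document', document)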

Tastypie-nonrel, django, mongodb: too many nestings

I am developing a web application with django, backbone.js, tastypie and mongodb. In order to adapt tastypie and django to mongodb I am using django-mongodb-engine and tastypie-nonrel. This application has a model Project, which has a list of Tasks. So it looks like this:
class Project(models.Model):
    user = models.ForeignKey(User)
    tasks = ListField(EmbeddedModelField('Task'), null=True, blank=True)

class Task(models.Model):
    title = models.CharField(max_length=200)
Thanks to tastypie-nonrel, getting the list of tasks of a project is done in a simple way with a GET request to /api/v1/project/:id:/tasks/
Now I want to extend this Task model with a list of comments:
class Task(models.Model):
    title = models.CharField(max_length=200)
    comments = ListField(EmbeddedModelField('Comment'), null=True, blank=True)

class Comment(models.Model):
    text = models.CharField(max_length=1000)
    owner = models.ForeignKey(User)
The problem with this implementation is that tastypie-nonrel does not support another level of nesting, so it is not possible to simply POST a comment to /api/v1/project/:id:/task/:id:/comments/.
The alternative is to make a PUT request of a whole Task to /api/v1/project/:id:/task/, but this would create problems if two users decided to add a comment to the same Task at the same time, as the last PUT would override the previous one.
The last option (aside from changing tastypie-nonrel) is to not embed Comment inside Task and instead just hold a ForeignKey, so the request would go to /api/v1/Comment/. My question is whether this forfeits the benefits of using MongoDB (since cross-collection queries would then be needed). Is there any better way of doing it?
I have little experience with any of the technologies in the stack, so I may not be framing the problem well. Any suggestions are welcome.
It seems like you are nesting too much. That said, you can create custom methods/URL mappings for tastypie and run your own logic instead of relying on "auto-magic" tastypie. If you are worried about comments overriding each other, you need transactions anyway; your code should then be robust enough to handle the behavior of a failed transaction, for example by retrying. This would greatly throttle your writes if you are constantly locking on a large object with many writers, however, which points to a design issue as well.
One way you can mitigate this a bit is to write to an intermediate store such as a task queue or redis, then dump the comments in as needed. It just depends on how reliable/durable your solution needs to be. A task queue would at least handle retries for failed transactions; with redis you can do something with pub/sub.
You should consider a few things about your design IMO regarding MongoDB.
Avoid creating overly large monolithic objects. Although embedding is a benefit of Mongo, it depends on your usage. If you are, for instance, always returning your project as a top-level object, then as the tasks and comments grow, the network traffic alone will kill performance.
Imagine a very contrived example in which the project-specific data is 10k, each task alone is 5k, and each comment alone is 2k. If you have a project with 5 tasks and 10 comments per task, you are talking about 10k + 5*5k + 50*2k = 135k per fetch. For a very active project with lots of comments, this will be heavy to send across the network. You can do slice/projection queries to work around this issue, but with some limitations and implications.
A corollary to the above: structure your objects to your use cases. If you don't need to bring things back together, they can live in different collections. Just because you "think" you need to bring them back together doesn't mean they need to be retrieved in the same fetch (although this is normally ideal).
Even if you do need everything in one use case/screen, another solution that may be possible in some designs is to load things in parallel, or even deferred via JavaScript after the page loads, using AJAX. For example, you could load the task info at the top, and then make an async call to load the comments separately, similar to how Disqus or Livefyre work as integrations in other sites. This could help resolve your nesting issue somewhat as you'd get rid of the task/project levels and simply store some IDs on each comment/record to be able to query between collections.
Keep in mind you may not want to retrieve all comments at once, and if you have a lot of comments, you may hit the size limit for a single document (16MB in recent versions of MongoDB). It usually doesn't make sense anyway to have a single record with a large amount of data, going back to the first item above.
My recommendations are:
Use transactions if you're concerned about losing comments.
Add a task queue/redis/something durable if you're worried about competing writes and losing things as a result of #1. If not, ignore it. Is it the end of the world if you lose a comment?
Consider restructuring, particularly moving comments into a separate collection, to ease your tastypie issues; a sketch follows below. Load things deferred or in parallel if needed.
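A minimal sketch of that third recommendation (field names are illustrative): keeping comments in their own collection means a comment POST never rewrites the whole Project document, and tastypie can expose /api/v1/comment/ directly.

from django.contrib.auth.models import User
from django.db import models

class Comment(models.Model):
    task_id = models.CharField(max_length=24, db_index=True)  # the Task's ObjectId
    owner = models.ForeignKey(User)
    text = models.CharField(max_length=1000)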

Customizable Web Applications

At my company we develop prefabricated web applications. While our applications work as-is in many cases, we often receive complex customization requests, and we are having a problem performing these in a structured way. Generic functionality should not be influenced by customizations. At the moment we are looking into Spring Web Flow, and it looks like it can handle part of what we need.
For example, we have an Online Shopping application, and a client requests that at the moment of checking out the shopping basket, the order has to be written to a proprietary logging system.
With SWF, it is possible to inherit from our Generic Checkout Flow with a ClientX Checkout Flow and extend it with the states necessary to perform the custom log write. This scenario seems to be handled well: we can keep our Generic Checkout Flow as-is and extend it with custom functionality, according to the Open/Closed Principle. Our team can add functionality to the Generic Checkout Flow over time, and this can be distributed to a client without modifying the extension.
However, sometimes clients request that our pages be customized. For example, in our Online Shopping app, a client requests a multiple-currencies feature. In this case, you need to modify the View as well as the Flow (Controller). Is there a technology that would let me extend the Generic View without modifying it? So far, the only two solutions with the majority of template-based views (JSP, Struts, Velocity, etc.) seem to be:
to have a specific version of the view for each client, which obviously leads to an implementation explosion
to make the application configurable depending on a parameter (if multipleCurrency then ...), which leads to a code explosion: a number of configuration conditions that have to be checked in each page
What would be the best solution in this case? There are probably other customization cases I am not able to recall. Is there maybe a component-based view technology that would let me extend a certain base view, and does that make sense?
What are typical solutions to the problem of configurable web applications?
Each customization point implies some level of conditionality.
Where possible, folks tend to use style sheets to control some aspects; the display of a currency selector could perhaps be done like that.
Another thought for the currency example: one is the limiting case of many. So the model provides the list of currencies, and the view displays a selector if there are many and a fixed field if there is only one. That is quite well-defined behaviour: easy to test, and reusable for other scenarios (see the sketch below).
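In Python pseudo-terms, the idea might look like this (all names invented for illustration; the point is that the view branches on the list length rather than on a multipleCurrency flag):

def currency_field(currencies):
    """The model always supplies a list; the view derives the widget from it."""
    if len(currencies) > 1:
        # many currencies: render a selector
        return {"widget": "select", "choices": list(currencies)}
    # single currency: render a fixed, non-editable field
    return {"widget": "fixed", "value": currencies[0]}

print(currency_field(["EUR"]))         # {'widget': 'fixed', 'value': 'EUR'}
print(currency_field(["EUR", "USD"]))  # {'widget': 'select', 'choices': ['EUR', 'USD']}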