Django-fsm, what am I missing?

I'm starting to wonder what the point of django-fsm is.
I am working on a production management system. As an example, consider a transition from state INCEPTED (details being entered) to state IN_PRODUCTION (being manufactured) or RESOURCE_WAIT (some necessary input entity is not yet available). Establishing the details involves querying a considerable number of different models, and might come to involve asking questions of the user.
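For reference, the kind of declaration django-fsm expects looks roughly like this (only a sketch; the model, related manager and method names are illustrative, not my actual code):

from django.db import models
from django_fsm import FSMField, transition

class ProductionOrder(models.Model):
    state = FSMField(default='INCEPTED')

    def inputs_available(self):
        # would need to query a number of other models here
        return not self.missing_inputs.exists()  # hypothetical related manager

    @transition(field=state, source='INCEPTED', target='IN_PRODUCTION',
                conditions=[inputs_available])
    def start_production(self):
        # side effects of the transition go here
        pass

    @transition(field=state, source='INCEPTED', target='RESOURCE_WAIT')
    def wait_for_resources(self):
        pass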
It seems unnatural to attempt to put querysets on other models into the model containing the state field. (It's causing me a circular import problem as well, which I don't know how to resolve).
So I have written this transition as a view instead, which also means that I can display a list of the checks which were made, and their success/fail status. Making sure that the transition is either fully committed or not committed at all is easily handled with transaction.atomic(), so if anything goes wrong, nothing is committed to the DB.
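Roughly what the view looks like now (simplified; the check helper and template name are placeholders):

from django.db import transaction
from django.shortcuts import get_object_or_404, render

def start_production(request, pk):
    order = get_object_or_404(ProductionOrder, pk=pk)
    checks = run_readiness_checks(order)  # placeholder: queries the other models, returns pass/fail results
    if all(check.passed for check in checks):
        with transaction.atomic():
            order.state = 'IN_PRODUCTION'
            order.save()
    return render(request, 'production/transition_checks.html', {'checks': checks})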
Which leaves me wondering what I am missing. Why does django-fsm exist? It doesn't seem to fit into what I am trying to accomplish. Too low-level, or ....

Related

Django best practices to validate data in other tables - taking complexity out of the view file?

I was wondering about best practices in Django for validating table content.
I am creating Sales Orders, and my SO should check the availability of the items I have in stock; if they are not in stock, it will trigger manufacturing orders and purchase orders.
I don't want to make a very complex view, and I am looking for a way to decouple the logic from it; I also predict performance issues.
What are the best practices or ready-made solutions I can use in the Django framework to address view complexity?
I see different possibilities, but I am wondering which will be the best fit in my case:
managers
celery - just to run a job occasionally; I want the app to be real-time, so I don't like this option
using signals (pre_save/post_save)
model validation
creating extra layer like services.py file
Since I am new to Django I am a bit puzzled which route to take.
Not sure if this is the answer you are looking for.
Signals are for doing things automatically when events happen. They are most commonly used to do things before and after model operations. So if you need to do something every time you save a record, create a new record or delete one, that is where you use signals.
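For example, a minimal post_save hook might look like this (the model and helper names are only illustrative):

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=SalesOrder)
def sales_order_saved(sender, instance, created, **kwargs):
    # runs automatically every time a SalesOrder is saved
    if created:
        check_stock_for_order(instance)  # hypothetical helper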
Managers are used to manage record retrieval and manipulation. If you want some clever way of retrieving data, you can define a custom manager and add custom methods to it. If you want to override some default behaviour of querysets, you would also do that with a custom manager.
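A small sketch of a custom manager (the field names and relations are assumptions):

from django.db import models

class SalesOrderManager(models.Manager):
    def awaiting_stock(self):
        # encapsulates a non-trivial retrieval instead of repeating it in views
        return self.filter(lines__item__stock__lt=models.F('lines__quantity')).distinct()

class SalesOrder(models.Model):
    # ... fields and relations omitted ...
    objects = SalesOrderManager()

Then SalesOrder.objects.awaiting_stock() reads naturally wherever you need it.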
Celery is for running things asynchronously. If you are worried that some processing you are doing might take a long time, that is where you might consider offloading it to Celery. A friendly warning though: doing things asynchronously raises the complexity of your code quite a bit, since you need to add some mechanism to pass the data back from the Celery tasks into your Django app and to your users.
The services.py link that you posted seems to do what you want; it just provides a place where you can put logic that is not specific to a particular view.
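In other words, something along these lines (a sketch only; the models, field names and status value are assumptions based on your description):

# services.py
from django.db import transaction
from .models import ManufacturingOrder  # assumed app models

@transaction.atomic
def confirm_sales_order(order):
    # check stock for each line and trigger manufacturing orders where needed
    for line in order.lines.select_related('item'):
        shortfall = line.quantity - line.item.stock
        if shortfall > 0:
            ManufacturingOrder.objects.create(item=line.item, quantity=shortfall)
    order.status = 'CONFIRMED'
    order.save()

The view then only calls confirm_sales_order(order) and renders the result.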
Here on Stack Overflow, I got advice from some experienced developers that premature optimization is the root of all evil.
What I suggest is: keep it simple. Making the view a little more complex is actually better than adding one more layer of complexity. I would suggest that you try to put most of your logic in models and whatever remains after that in views.
Also, unnecessarily using multiple packages would not solve much of your problem, so use them only when necessary. Otherwise, try to write the minimal logic yourself so that you do not have to depend on many apps.
As everybody says, signals and the other things are not as great as they may seem, however promising. Just try to keep things simple.
One more point from my side: as you are just starting out, go through class-based views and try to use them once you are familiar with them. That will simplify your views the most. Plus, since you are new to Django, read a little code; https://github.com/vitorfs/bootcamp might help you get started.

SQL Query minimizing/caching in a C++ application

I'm writing a project in C++/Qt, and it is able to connect to any type of SQL database supported by QtSQL (http://doc.qt.nokia.com/latest/qtsql.html). This includes local servers and external ones.
However, when the database in question is external, the speed of the queries starts to become a problem (slow UI, ...). The reason: Every object that is stored in the database is lazy-loaded and as such will issue a query every time an attribute is needed. On average about 20 of these objects are to be displayed on screen, each of them showing about 5 attributes. This means that for every screen that I show about 100 queries get executed. The queries execute quite fast on the database server itself, but the overhead of the actual query running over the network is considerable (measured in seconds for an entire screen).
I've been thinking about a few ways to solve the issue, the most important approaches seem to be (according to me):
Make fewer queries
Make queries faster
Tackling (1)
I could find some sort of way to delay the actual fetching of the attribute (start a transaction), and then when the programmer writes endTransaction() the database tries to fetch everything in one go (with SQL UNION or a loop...). This would probably require quite a bit of modification to the way the lazy objects work but if people comment that it is a decent solution I think it could be worked out elegantly. If this solution speeds up everything enough then an elaborate caching scheme might not even be necessary, saving a lot of headaches
I could try pre-loading attribute data by fetching it all in one query for all the objects that are requested, effectively making them non-lazy. Of course in that case I will have to worry about stale data. How would I detect stale data without at least sending one query to the external db? (Note: sending a query to check for stale data on every attribute check would provide no performance gain in the best case and a worst-case 2x performance decrease when the data is actually found to be stale.)
Tackling (2)
Queries could for example be made faster by keeping a local synchronized copy of the database running. However, I don't really have many possibilities on the client machines to run, for example, exactly the same database type as the one on the server. So the local copy would, for example, be an SQLite database. This would also mean that I couldn't use a db-vendor-specific solution. What are my options here? What has worked well for people in these kinds of situations?
Worries
My primary worries are:
Stale data: there are plenty of queries imaginable that change the db in such a way that it prohibits an action that would seem possible to a user with stale data.
Maintainability: How loosely can I couple in this new layer? It would obviously be preferable if it didn't have to know everything about my internal lazy object system and about every object and possible query
Final question
What would be a good way to minimize the cost of making a query? Good meaning some combination of: maintainable, easy to implement, not too application-specific. If it comes down to picking any two, then so be it. I'd like to hear people talk about their experiences and what they did to solve it.
As you can see, I've thought of some problems and ways of handling them, but I'm at a loss for what would constitute a sensible approach. Since it will probably involve quite a lot of work and intensive changes to many layers of the program (hopefully as few as possible), I thought about asking all the experts here before making a final decision on the matter. It is also possible I'm just overlooking a very simple solution, in which case a pointer to it would be much appreciated!
Assuming all relevant server-side tuning has been done (for example: MySQL cache, best possible indexes, ...)
(Note: I've checked questions from users with similar problems that didn't entirely answer mine, for example "Suggestion on a replication scheme for my use-case?" and "Best practice for a local database cache?")
If any additional information is necessary to provide an answer, please let me know and I will duly update my question. Apologies for any spelling/grammar errors; English is not my native language.
Note about "lazy"
A small example of what my code looks like (simplified of course):
QList<MyObject> myObjects = database->getObjects(20, 40); // fetch and construct object 20 to 40 from the db
// ...some time later
// screen filling time!
foreach (const MyObject& o, myObjects) {
    o.getInt("status", 0); // == db request
    o.getString("comment", "no comment!"); // == db request
    // about 3 more of these
}
At first glance it looks like you have two conflicting goals: query speed, but always using up-to-date data. Thus you should probably fall back on your actual needs to help decide here.
1) Your database is nearly static compared to use of the application. In this case use your option 1b and preload all the data. If there's a slim chance that the data may change underneath, just give the user an option to refresh the cache (fully or for a particular subset of data). This way the slow access is in the hands of the user.
2) The database is changing fairly frequently. In this case "perhaps" an SQL database isn't right for your needs. You may need a higher-performance dynamic database that pushes updates rather than requiring a pull. That way your application would get notified when the underlying data changed and you would be able to respond quickly. If that doesn't work, however, you want to construct your queries to minimize the number of DB library and I/O calls. For example, if you execute a sequence of select statements, your results should have all the appropriate data in the order you requested it; you just have to keep track of what the corresponding select statements were. Alternatively, if you can use looser query criteria so that a single query returns more than one row, that ought to help performance as well.

Optimisation tips when migrating data into Sitecore CMS

I am currently faced with the task of importing around 200K items from a custom CMS implementation into Sitecore. I have created a simple import page which connects to an external SQL database using Entity Framework and I have created all the required data templates.
During a test import of about 5K items I realized that I needed to find a way to make the import run a lot faster, so I set about finding some information about optimizing Sitecore for this purpose. I have concluded that there is not much specific information out there, so I'd like to share what I've found and open the floor for others to contribute further optimizations. My aim is to create some kind of maintenance mode for Sitecore that can be used when importing large volumes of data.
The most useful information I found was in Mark Cassidy's blog post http://intothecore.cassidy.dk/2009/04/migrating-data-into-sitecore.html. At the bottom of this post he provides a few tips for when you are running an import.
If migrating large quantities of data, try and disable as many Sitecore event handlers and whatever else you can get away with.
Use BulkUpdateContext()
Don't forget your target language
If you can, make the fields shared and unversioned. This should help migration execution speed.
The first thing I noticed in this list was the BulkUpdateContext class, as I had never heard of it. I quickly understood why, as a search on the SDN forum and in the PDF documentation returned no hits. So imagine my surprise when I actually tested it out and found that it improves item creation/deletion speed by at least tenfold!
The next thing I looked at was the first point, where he basically suggests creating a version of the web.config that has only the bare essentials needed to perform the import. So far I have removed all events related to creating, saving and deleting items and versions. I have also removed the history engine and system index declarations from the master database element in the web.config, as well as any custom events, schedules and search configurations. I expect that there are a lot of other things I could look to remove/disable in order to increase performance. Pipelines? Schedules?
What optimization tips do you have?
Incidentally, BulkUpdateContext() is a very misleading name - as it really improves item creation speed, not item updating speed. But as you also point out, it improves your import speed massively :-)
Since I wrote that post, I've added a few new things to my normal routines when doing imports.
Regularly shrink your databases. They tend to grow large and bulky. To do this, first go to Sitecore Control Panel -> Database and select "Clean Up Database". After this, do a regular ShrinkDB on your SQL server.
Disable indexes, especially if importing into the "master" database. For reference, see http://intothecore.cassidy.dk/2010/09/disabling-lucene-indexes.html
Try not to import into "master", however; you will usually find that imports into "web" are a lot faster, mostly because this database isn't (by default) connected to the HistoryManager or other gadgets.
And if you're really adventurous, there's a thing you could try that I'd been considering trying out myself but never got around to. It might work, but I can't guarantee that it will :-)
Try removing all your field types from App_Config/FieldTypes.config. The theory here is that this should essentially disable all of Sitecore's special handling of the content of these fields (like updating the LinkDatabase and so on). You would need to manually trigger a rebuild of the LinkDatabase when done with the import, but that's a relatively small price to pay.
Hope this helps a bit :-)
I'm guessing you've already hit this, but putting the code inside a SecurityDisabler() block may speed things up also.
I'd be a lot more worried about how Sitecore performs with this much data... assuming you only do the import once, who cares how long that process takes. Is this going to be a regular occurrence?

Best way to keep the user-interface up-to-date?

This question is a refinement of my question Different ways of observing data changes.
I still have a lot of classes in my C++ application, which are updated (or could be updated) frequently in complex mathematical routines and in complex pieces of business logic.
If I go for the 'observer' approach, and send out notifications every time a value of an instance is changed, I have 2 big risks:
sending out the notifications itself may slow down the applications seriously
if user-interface elements need to be updated by the change, they are updated with every change, resulting in e.g. screens being updated thousands of times while some piece of business logic is executing
Some problems may be solved by adding buffering mechanisms (where you send out notifications when you are going to start an algorithm, and again when the algorithm is finished), but since the business logic may be executed in many places in the software, we end up adding buffering almost everywhere, after every possible action chosen in the menu.
Instead of the 'observer' approach, I could also use the 'mark-dirty' approach, only marking the instances that have been altered, and at the end of the action telling the user interface that it should update itself.
Again, business logic may be executed from everywhere within the application, so in practice we may have to add an extra call (telling all windows they should update themselves) after almost every action executed by the user.
Both approaches seem to have similar, but opposite disadvantages:
With the 'observer' approach we have the risk of updating the user-interface too many times
With the 'mark-dirty' approach we have the risk of not updating the user-interface at all
Both disadvantages could be solved by embedding every application action within additional logic (for observers: sending out start-end notifications, for mark-dirty: sending out update-yourself notifications).
Notice that in non-windowing applications this is probably not a problem. You could e.g. use the mark-dirty approach and only if some calculation needs the data, it may need to do some extra processing in case the data is dirty (this is a kind of caching approach).
However, for windowing applications, there is no signal that the user is 'looking at your screen' and that the windows should be updated. So there is no real good moment where you have to look at the dirty-data (although you could do some tricks with focus-events).
What is a good solution to solve this problem? And how have you solved problems like this in your application?
Notice that I don't want to introduce windowing techniques in the calculation/datamodel part of my application. If windowing techniques are needed to solve this problem, it must only be used in the user-interface part of my application.
Any idea?
An approach I used with a large Windows app a few years back was WM_KICKIDLE. All things that are updateable utilise an abstract base class called IdleTarget. An IdleTargetManager then intercepts the KICKIDLE messages and calls the update on a list of registered clients. In your case you could create a list of specific targets to update, but I found the list of registered clients enough.
The only gotcha I hit was with a realtime graph. Using just the kick-idle message, it would spike the CPU to 100% due to constant updating of the graph. Using a timer to sleep until the next refresh solved that problem.
If you need more assistance - I am available at reasonable rates...:-)
Another point I was thinking about.
If you are overwhelmed by the number of events generated, and possibly by the extra work it is causing, you may take a two-phase approach:
Do the work
Commit
where notifications are only sent on commit.
It does have the disadvantage of forcing you to rewrite some code...
You could use the observer pattern with coalescing. It might be a little ugly to implement in C++, though. It would look something like this:
m_observerList.beginCoalescing();
m_observerList.notify();
m_observerList.notify();
m_observerList.notify();
m_observerList.endCoalescing(); //observers are notified here, only once
So even though you call notify three times, the observers aren't actually notified until endCoalescing, and then only once.
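The mechanics are language-agnostic; here is a short sketch of the idea (in Python for brevity, with method names mirroring the pseudo-code above), which should translate to C++ without much trouble:

class ObserverList:
    def __init__(self):
        self._observers = []
        self._coalescing = False
        self._pending = False

    def add(self, callback):
        self._observers.append(callback)

    def beginCoalescing(self):
        self._coalescing = True

    def notify(self):
        if self._coalescing:
            self._pending = True      # just remember that a notification is due
        else:
            self._dispatch()

    def endCoalescing(self):
        self._coalescing = False
        if self._pending:
            self._pending = False
            self._dispatch()          # observers hear about it exactly once

    def _dispatch(self):
        for callback in self._observers:
            callback()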

Is there a database implementation that has notifications and revisions?

I am looking for a database library that can be used within an editor to replace a custom document format. In my case the document would contain a functional program.
I want application data to be persistent even while editing, so that when the program crashes, no data is lost. I know that all databases offer that.
On top of that, I want to access and edit the document from multiple threads, processes, possibly even multiple computers.
Format: a simple key/value database would totally suffice. SQL usually needs to be wrapped, and if I can avoid pulling in a heavy ORM dependency, that would be splendid.
Revisions: I want to be able to roll back changes up to the first change to the document that has ever been made, not only in one session, but also between sessions/program runs.
I need notifications: each process must be able to be notified of changes to the document so it can update its view accordingly.
I see these requirements as rather basic, a foundation to solve the usual tough problems of an editing application: undo/redo, multiple views on the same data. Thus, the database system should be lightweight and undemanding.
Thank you for your insights in advance :)
Berkeley DB is an undemanding, light-weight key-value database that supports locking and transactions. There are bindings for it in a lot of programming languages, including C++ and python. You'll have to implement revisions and notifications yourself, but that's actually not all that difficult.
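For example, with the Python bindings (the bsddb3 package; paths and key names are placeholders), basic transactional key/value access looks roughly like this:

from bsddb3 import db

env = db.DBEnv()
env.open('/path/to/envdir',
         db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_LOCK |
         db.DB_INIT_LOG | db.DB_INIT_TXN)

store = db.DB(env)
store.open('document.db', None, db.DB_BTREE, db.DB_CREATE | db.DB_AUTO_COMMIT)

txn = env.txn_begin()
store.put(b'node/42/title', b'my function', txn=txn)
txn.commit()                      # or txn.abort() to roll the change back

print(store.get(b'node/42/title'))

store.close()
env.close()

Revisions and change notifications would be layered on top of this, e.g. by writing each change under a new sequence key and letting the other processes poll or subscribe via your own IPC.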
It might be a bit more than what you ask for, but you should definitely look at CouchDB.
It is a document database with "document" being defined as a JSON record.
It stores all the changes to the documents as revisions, so you instantly get revisions.
It has a powerful JavaScript-based view engine to aggregate all the data you need from the database.
All the commits to the database are written to the end of the repository file and the writes are atomic, meaning that unsuccessful writes do not corrupt the database.
Another nice bonus you'll get is easy and flexible replication of your database.
See the full feature list on their homepage
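To make the revision handling concrete: every write returns a _rev, and updates must supply the current _rev or they are rejected as conflicts. A minimal sketch over the HTTP API using Python and requests (a local, unauthenticated CouchDB is assumed; database and document names are placeholders):

import requests

base = 'http://localhost:5984/editor_docs'
requests.put(base)  # create the database if it does not exist yet

resp = requests.put(base + '/program-1', json={'source': '(define x 1)'})
rev1 = resp.json()['rev']  # e.g. '1-...'

# an update without the current _rev would get a 409 Conflict
requests.put(base + '/program-1', json={'source': '(define x 2)', '_rev': rev1})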
On the minus side (depending on your point of view) is the fact that it is written in Erlang and (as far as I know) runs as an external process...
I don't know anything about notifications though - it seems that if you are working with replicated databases, the changes are instantly replicated/synchronized between databases. Other than that I suppose you should be able to roll your own notification schema...
Check out ZODB. It doesn't have notifications built in, so you would need a messaging system there (since you may use separate computers). But it has transactions, you can roll back forever (unless you pack the database, which removes earlier revisions), you can access it directly as an integrated part of the application, or it can run as client/server (with multiple clients of course), you can have automatic persistency, there is no ORM, etc.
It's pretty much Python-only though (it's based on Pickles).
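A minimal sketch of the ZODB side (the file name is a placeholder):

import transaction
from ZODB import DB, FileStorage
from persistent.mapping import PersistentMapping

storage = FileStorage.FileStorage('document.fs')  # append-only; old revisions stay until you pack
db = DB(storage)
conn = db.open()
root = conn.root()

if 'document' not in root:
    root['document'] = PersistentMapping()
root['document']['title'] = 'my program'
transaction.commit()   # durable point; earlier states can be rolled back via db.undo(...)

conn.close()
db.close()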
http://en.wikipedia.org/wiki/Zope_Object_Database
http://pypi.python.org/pypi/ZODB3
http://wiki.zope.org/ZODB/guide/index.html
http://wiki.zope.org/ZODB/Documentation