Separation of Runtime and History - camunda

I whould like to use separate databases for runtime and history data without implementing a custom HistoryEventHandler. Does someone know how this is possible?
I read the camunda user guides but this did not help much because it only hints the custom implementation way.
Currently, everytime I query history data (about 2mil activity entries) the performance of the system drops as it kind of blocks the runtime, too. I'd like to avoid this without loosing the ability to query historic data.

That would be a really cool feature, but it is currently not supported. You will have to disable the default history and implement a custom handler.

Camunda BPM offers Optimize, which pulls the history data from the Engine to an Elastic Search database. If you are using the Enterprise version, it may be a way to solve it.

(Based on your comments to other answers, it appears that you're interested in learning more about custom HistoryEventHandler implementations. Thus, I'm adding this answer in the hope that it will help.)
Implementing a custom History Event Handler isn't difficult, but there are a few important points to keep in mind:
Unless you want to skip the storage of history information in the standard Camunda history tables, you'll want to use their CompositeHistoryEventHandler. This simply gives you the ability to use multiple HistoryEventHandler implementations.
Any HistoryEventHandler implementations will complete in the same threads as the ones executing process instances; thus, you will want to be cognizant of the performance impacts your custom HistoryEventHandler will have.
You may want to consider publishing your history events through a message bus or messaging system to allow for reliable delivery without impacting Camunda workflow instance performance.
Finally, it may make sense to use your custom HistoryEventHandler along with Camunda's default HistoryEventHandler and their functionality for deleting process instances after a period of time. This would allow you to use their querying capabilities for some period of time without having the history stack up (and thus slowing down your system).

Related

How to profile an openedge database?

Is there a Progress profiling tool that allows me to see the queries executing against an OpenEdge database?
We're doing a migration from an OpenEdge database into a SQL database. In order to map the data correctly we'd like to run certain application reports on the OpenEdge database and see what database queries are being executed to retrieve the data.
Is this possible with some kind of Progress profiling tool (a la SQL Server Profiling)? Preferably free...
Progress is record oriented, not set oriented like SQL, so your reports aren't a single query or a set of queries, it is more likely a lot of record lookups combined with what you'd consider query-like operations.
Depending on the version you're running, there is a way to send a signal to the client to see what it is currently doing, however doing so will almost certainly not give you enough information to discern what's going on "under the hood."
Long story short, your options are to get a Dataserver product so you can attach the Progress client to an SQL database - this will enable you to use an SQL database w/out losing the Progress functionality. The second option is to get a copy of the program's source code to find out how the reports are structured.
Tim is quite right -- without the source code, looking at the queries is unlikely to provide you with much insight.
None the less there are some tools and capabilities that will provide information about queries. Probably the most useful for your purpose would be to specify something similar to:
-logentrytypes QryInfo -logginglevel 3 -clientlog "mylog.log"
at session startup.
You can use session triggers to identify almost anything done by any program, without modifying or having access to the source of those programs. Setting this up may be more work than is worth it for your purpose. We have a testing system built around this idea. One big flaw: triggers cannot be fired for CAN-FIND.

C++ Usage Statistics

I have made an application in C++ and would like to know how to go about implementing a usage statistics system so that I may gather some data regarding how users use the program.
Eg. IP Address, Number of hours spent in application, and OS used.
In theory I know I can code this myself if I must, but I was wondering if there is a framework available to make this easier to do. Unfortunately I was unable to find anything on google.
Though there is no any kind of such framework, you could reduce the work you have to do (in order to retrieve all these information) by using some approaches and techniques, which I tried describe below. Please, anybody feel free to correct me.
Let's summarise, what groups of information do we need to complete the task:
User Environment Information. I suggest you to look at Microsoft's WMI infrastructure, in particular to WMI classes: Desktop, File System, Networking, etc. Using this classes in your application can help you retrieve almost all kind of system information. But if you don't satisfy with this, see #2.
Application and System Performance. Under these terms I mean overall system performance, processor's count, processes running in OS, etc. To retrieve these data you can use the NtQuerySystemInformation function. With its help, you will get an access to detailed SystemProcessInformation, SystemProcessorPerformanceInformation (retrieves info about each processor) information, and much more.
User Related Information. It's hard to find a framework to do such things, so I suggest you simply start writing code, having in mind your requirements:
counting how many times each button was pressed, each text field was changes, etc.
measuring delay time between consecutive actions in some kind of predefined sequences (for example, if you have a settings gui form and you expect from the user to fill very fast all required text fields, so using a time delay measuments can give you an information if the user acted as we expected from him or delayed after TextBox2 for a 5 minutes).
anything that could be interested to you.
So, how you could implement the last item (User Related Information) requirements? As for me, I'd do something like folowing (some may seem very hard to implement or too pointless):
- creating a kind of base Counter class and derive from it some controls (buttons, edits, etc).
- using a windows hooks for mouse or keybord while getting a child handle (to recognize a control, for example).
- using Callback class, which can do all "dirty" work (counting, measuring, performing additional actions).
You could store all this information either in a textfile or an SQLite database or there wherever you prefer.
I would recommend taking a look at DeskMetrics. This StackOverflow post summarizes the issue.
Building your own framework could take you months of development (apart from maintenance). With something like Trackerbird Software Analytics you can integrate a DLL with your app and start tracking in 30 minutes and you get all the cool real-time visualizations.
Disclaimer: I am affiliated with company.

How do you model a business workflow in ColdFusion?

Since there's no complete BPM framework/solution in ColdFusion as of yet, how would you model a workflow into a ColdFusion app that can be easily extensible and maintainable?
A business workflow is more then a flowchart that maps nicely into a programming language. For example:
How do you model a task X that follows by multiple tasks Y0,Y1,Y2 that happen in parallel, where Y0 is a human process (need to wait for inputs) and Y1 is a web service that might go wrong and might need auto retry, and Y2 is an automated process; follows by a task Z that only should be carried out when all Y's are completed?
My thoughts...
Seems like I need to do a whole lot of storing / managing / keeping
track of states, and frequent checking with cfscheuler.
cfthread ain't going to help much since some tasks can take days
(e.g. wait for user's confirmation).
I can already image the flow is going to be spread around in multiple UDFs,
DB, and CFCs
any opensource workflow engine in other language that maybe we can port over to CF?
Thank you for your brain power. :)
Study the Java Process Definition Language specification where JBoss has an execution engine for it. Using this Java based engine may be your easiest solution, and it solves many of the problems you've outlined.
If you intend to write your own, you will probably end up modelling states and transitions, vertices and edges in a directed graph. And this as Ciaran Archer wrote are the components of a State Machine. The best persistence approach IMO is capturing versions of whatever data is being sent through workflow via serialization, capturing the current state, and a history of transitions between states and changes to that data. The mechanism probably needs a way to keep track of who or what has responsibility for taking the next action against that workflow.
Based on your question, one thing to consider is whether or not you really need to represent parallel tasks in your solution. Where instead it might be possible to en-queue a set of messages and then specify a wait state for all of those to complete. Representing actual parallelism implies you are moving data simultaneously through several different processes. In which case when they join again you need an algorithm to resolve deltas, which is very much a non trivial task.
In the context of ColdFusion and what you're trying to accomplish, a scheduled task may be necessary if the system you're writing needs to poll other systems. Consider WDDX as a serialization format. JSON, while seductively simple, I recall has some edge cases around numbers and dates that can cause you grief.
Finally see my answer to this question for some additional thoughts.
Off the top of my head I'm thinking about the State design pattern with state persisted to a database. Check out the Head First Design Patterns's Gumball Machine example.
Generally this will work if you have something (like a client / order / etc.) going through a number of changes of state.
Different things will happen to your object depending on what state you are in, and that might mean sitting in a database table waiting for a flag to be updated by a user manually.
In terms of other languages I know Grails has a workflow module available. I don't know if you would be better off porting to CF or jumping ship to Grails (right tool for the job and all that).
It's just a thought, hope it helps.

At what point is it worth using a database?

I have a question relating to databases and at what point is worth diving into one. I am primarily an embedded engineer, but I am writing an application using Qt to interface with our controller.
We are at an odd point where we have enough data that it would be feasible to implement a database (around 700+ items and growing) to manage everything, but I am not sure it is worth the time right now to deal with. I have no problems implementing the GUI with files generated from excel and parsed in, but it gets tedious and hard to track even with VBA scripts. I have been playing around with converting our data into something more manageable for the application side with Microsoft Access and that seems to be working well. If that works out I am only a step (or several) away from using an SQL database and using the Qt library to access and modify it.
I don't have much experience managing data at this level and am curious what may be the best way to approach this. So what are some of the real benefits of using a database if any in this case? I realize much of this can be very application specific, but some general ideas and suggestions on how to straddle the embedded/application programming line would be helpful.
This is not about putting a database in an embedded project. It is also not a business type application where larger databases are commonly used. I am designing a GUI for a single user on a desktop to interface with a micro-controller for monitoring and configuration purposes.
I decided to go with SQLite. You can do some very interesting things with data that I didn't really consider an option when first starting this project.
A database is worthwhile when:
Your application evolves to some
form of data driven execution.
You're spending time designing and
developing external data storage
structures.
Sharing data between applications or
organizations (including individual
people)
The data is no longer short and
simple.
Data Duplication
Evolution to Data Driven Execution
When the data is changing but the execution is not, this is a sign of a data driven program or parts of the program are data driven. A set of configuration options is a sign of a data driven function, but the whole application may not be data driven. In any case, a database can help manage the data. (The database library or application does not have to be huge like Oracle, but can be lean and mean like SQLite).
Design & Development of External Data Structures
Posting questions to Stack Overflow about serialization or converting trees and lists to use files is a good indication your program has graduated to using a database. Also, if you are spending any amount of time designing algorithms to store data in a file or designing the data in a file is a good time to research the usage of a database.
Sharing Data
Whether your application is sharing data with another application, another organization or another person, a database can assist. By using a database, data consistency is easier to achieve. One of the big issues in problem investigation is that teams are not using the same data. The customer may use one set of data; the validation team another and development using a different set of data. A database makes versioning the data easier and allows entities to use the same data.
Complex Data
Programs start out using small tables of hard coded data. This evolves into using dynamic data with maps, trees and lists. Sometimes the data expands from simple two columns to 8 or more. Database theory and databases can ease the complexity of organizing data. Let the database worry about managing the data and free up your application and your development time. After all, how the data is managed is not as important as to the quality of the data and it's accessibility.
Data Duplication
Often times, when data grows, there is an ever growing attraction for duplicate data. Databases and database theory can minimize the duplication of data. Databases can be configured to warn against duplications.
Moving to using a database has many factors to be considered. Some include but are not limited to: data complexity, data duplication (including parts of the data), project deadlines, development costs and licensing issues. If your program can run more efficiently with a database, then do so. A database may also save development time (and money). There are other tasks that you and your application can be performing than managing data. Leave data management up to the experts.
What you are describing doesn't sound like a typical business application, and many of the answers already posted here assume that this is the kind of application you are talking about, so let me offer a different perspective.
Whether or not you use a database for 700 items is going to depend greatly on the nature of the data.
I would say that, about 90% of the time at this scale, you will benefit from a light-weight database like SQLite, provided that:
The data may potentially grow substantially larger than what you are describing,
The data may be shared by more than one user,
You may need to run queries against the data (which I don't think you're doing right now), and
The data can easily be described in table form.
The other 10% of the time, your data will be highly structured, hierarchical, object-based, and doesn't neatly fit into the table model of a database or Excel table. If this is the case, consider using XML files.
I know developers instinctively like to throw databases at problems like this, but if you are currently using Excel data to design user interfaces (or display configuration settings), rather than display a customer record, XML may be a better fit. XML is more expressive than either Excel or database tables, and can be easily manipulated with a simple text editor.
XML parsers and data binders for C++ are easy to find.
I recommend you to introduce a Database in your app, your application will gain flexibility and will be easier to maintain and to improve with new features in the future.
I would start with a lightweight file based db like Sqlite.
With a well designed db you'll have:
Reduced data redundancy
Greater data integrity
Improved data security
Last but not least, using a database will save you from the Excel import/update/export Hell!
Reasons for using a database:
Concurrent writes. It's easy to achieve concurrency in databases
Easy querying. SQL queries tend to be much concise than procedural code to search data. UPDATEs, INSERT INTOs can also do lots of stuff with very little code
Integrity. Constraints are very easy to define and are enforced without writing code. If you have a non-null constraint, you can rest assured that the value won't be null, no need to write checks anywhere. If you have a foreign key constraint in place, you won't have "dangling references".
Performance over large datasets. Indexing is very simple to add to an SQL database
Reasons for not using a database:
It tends to be an extra dependency (although there exist very lightweight databases- I like H2 for Java, for instance)
Data not well suited to a relational schema. Things that are basically key/value maps. XML (although databases often support XPath, etc.).
Sometimes files are more convenient. They can be diff'ed, merged, edited with a plain text editor, etc. Sometimes spreadsheets can be more practical (you don't have to build an editor- you can use a spreadsheet program)
Your data is already somewhere else
When you have a lot of data that you're not sure how they will be exploited in the future.
For example you might want to add an SQLite database in an embedded application that need to register statistics that you're not sure how will be used. Later you send the full database for injection in a bigger one running on a central server and those data can easily be exploited, using requests.
In fact, if your application's purpose is to "gather data" then having a database is a must have.
I see quite a few requirements that well met by databases:
1). Ad hoc queries. Find me all the {X} that meet criteria Y
2). Data with structure that can benefit from normalisation - factoring out common values into separate "tables". You can save space and reduce the possibility of inconsistency this way. Once you've done this then those ad-hoc queries start to be really useful.
3). Large data volumes. Professional database are very good at making good use of resoruces, clever query optmisations and paging strategies. Trying to write this stuff yourself is a real challenge.
You're clearly not needing that last one, but the other two, maybe do apply to you.
Don't forget that the appropriate database can be quite different depending on your requirements (and don't forget that a text file could be used as a database if you're requirements are simple enough - for example, config files are just a specific kind of database). Such parameters might be:
number of records
size of data items
does the database need to be shared with other devices? Concurrently?
how complex are the relationships between the various pieces of data
is the database read only (created at build time and not changed, for example)?
does the database need to be updated by multiple entities concurrently?
do you need to support complex queries?
For a database with 700 entries, an in-memory sorted array loaded from a text file might well be appropriate. But I could also see the need for an embedded SQL database or maybe having the controller request data from the database over a network connection depending on what the various requirements (and resource limitations) are.
There isn't a specific point at which a database is worthwhile. Instead I usually ask the following questions:
Is the amount of data the application uses/creates growing?
Is the upper limit of this data growth unknown (or unclear)?
Will the application need to aggregate or filter this data?
Could there be future uses of the data that may not be obvious right now?
Is performance of data retrieval and/or storage important?
Are there (or could there be) multiple users of the application who share data?
If I answer 'Yes' to most of these questions I almost always choose a database (as opposed to other options such as XML/ini/CSV/Excel/text files or the filesystem).
Also, if the application will have many users who could be accessing the data concurrently, I'll lean towards a full database server (MySQL, SQl Server, Oracle etc).
But often in a single user (or small concurrency) situation, a local database such as SQLite cannot be beaten for portability and ease of deployment.
To add a negative: not suitable for real-time processing, due to non-deterministic latency. However, It would be quite ample for looking up and setting operating parameters, for instance during startup. I would not put database accesses on critical time paths.
You don't need a database if you have a few thousand rows in one or two tables to handle in a single user app (for the embedded point).
If it is for multiple users (concurrent access, locking) or the need of transactions you definitly should consider a database.
Handling complex datastructures in normalized tables and maintain integrety, or a huge amount of data would be another indication you should use a database.
It sounds like your application is running on a desktop computer and simply communicating to the embedded device.
As such using a database is much more feasible. Using one on an embedded platform is a much more complex issue.
On the desktop front I use a database when there is the need to store new information continuously and the need to extract that information in a relational way. What i don't use databases for is storing static information, information i read once at load and thats it. The exception is when the application has many users and there is the need to storage this information on a per user basis.
It sounds be to me like your collecting information from your embedded device, storing it somehow, then using it later to display via a GUI.
This is a good case for using a database, especially if you can architect the system such that there is a data collection daemon that manages the continuous communication with the embedded device. This app can then just write the data into the database. When the GUI is launched it can extract the data for display.
Using the database will also ease your GUI develop if you need to display different views, such as "show me all the entries between 2 dates". With a database you just ask it for the correct values to display with a proper SQL query and the GUI displays whatever comes back allowing you to decouple much of the "business logic" code from the GUI.
We are also facing a similar situation. We have set of data coming from different test setups and it is currently being dumped into excel sheets, processed using Perl or VBA.
We found out this method had lot of problems:
i. Managing data using excel sheets is quite cumbersome. After some time you have a whole lot of excel sheets and there is no easy way to retrieve required data from it.
ii. People start sending the excel sheets to and fro for comments and review through e-mails. E-Mail becomes the primary mode of managing the comments related to the data. These comments are lost at a later point of time and there is no way of retrieving it back.
iii. Multiple copies of the files get created and changes in one copy are not reflected in the other - there is no versioning.
This is for the same reasons we have decided to move to a database based solution and are currently working on it. Let me summaries what we are trying to do:
i. The database is in a central server accessible by PC in all the test setups.
ii. All the data goes into a temporary location (local hard disk in files) as soon as it is generated. From the files, it is pushed into database by a process running in the background (so even if there is a network problem, data will be present in the local files system).
iii. We have a web based application which allows users to log in and access data in the format they want. The portal will allow them to add comment, generate different kind of reports, share it with other users after review etc. It will also have the ability to export data into excel sheet, just in case you need to take it with you.
Let know if this can be better implemented.
"At what point is it worth using a database?"
If and when you've got data to manage ?

Is there a database implementation that has notifications and revisions?

I am looking for a database library that can be used within an editor to replace a custom document format. In my case the document would contain a functional program.
I want application data to be persistent even while editing, so that when the program crashes, no data is lost. I know that all databases offer that.
On top of that, I want to access and edit the document from multiple threads, processes, possibly even multiple computers.
Format: a simple key/value database would totally suffice. SQL usually needs to be wrapped, and if I can avoid pulling in a heavy ORM dependency, that would be splendid.
Revisions: I want to be able to roll back changes up to the first change to the document that has ever been made, not only in one session, but also between sessions/program runs.
I need notifications: each process must be able to be notified of changes to the document so it can update its view accordingly.
I see these requirements as rather basic, a foundation to solve the usual tough problems of an editing application: undo/redo, multiple views on the same data. Thus, the database system should be lightweight and undemanding.
Thank you for your insights in advance :)
Berkeley DB is an undemanding, light-weight key-value database that supports locking and transactions. There are bindings for it in a lot of programming languages, including C++ and python. You'll have to implement revisions and notifications yourself, but that's actually not all that difficult.
It might be a bit more power than what you ask for, but You should definitely look at CouchDB.
It is a document database with "document" being defined as a JSON record.
It stores all the changes to the documents as revisions, so you instantly get revisions.
It has powerful javascript based view engine to aggregate all the data you need from the database.
All the commits to the database are written to the end of the repository file and the writes are atomic, meaning that unsuccessful writes do not corrupt the database.
Another nice bonus You'll get is easy and flexible replication and of your database.
See the full feature list on their homepage
On the minus side (depending on Your point of view) is the fact that it is written in Erlang and (as far as I know) runs as an external process...
I don't know anything about notifications though - it seems that if you are working with replicated databases, the changes are instantly replicated/synchronized between databases. Other than that I suppose you should be able to roll your own notification schema...
Check out ZODB. It doesn't have notifications built in, so you would need a messaging system there (since you may use separate computers). But it has transactions, you can roll back forever (unless you pack the database, which removes earlier revisions), you can access it directly as an integrated part of the application, or it can run as client/server (with multiple clients of course), you can have automatic persistency, there is no ORM, etc.
It's pretty much Python-only though (it's based on Pickles).
http://en.wikipedia.org/wiki/Zope_Object_Database
http://pypi.python.org/pypi/ZODB3
http://wiki.zope.org/ZODB/guide/index.html
http://wiki.zope.org/ZODB/Documentation