Is Process Mining used just to infer business process models? - data-mining

I have been reading about mining event logs (Process Mining). I wonder if there are other uses besides inferring the process model (e.g. improving the process). So far I haven't found any other practical application. Can someone recommend authors or publications about it (if there are other applications), or suggest keywords that I can search for to find them? Thank you!

Please take a look at this Twitter thread: https://twitter.com/JorgeMunozGama/status/1236967153825275904
It lists many interesting applications, from soccer analysis to wind turbine monitoring.

I would suggest having a look at this wonderful book: https://www.springer.com/gp/book/9783662498507
It gives a detailed understanding of process mining and its applications.

Three alternative uses of process mining other than creating business models are:
Discovering patient pathways (a patient moving between different healthcare providers or different departments in a hospital). This information may also be relevant for parties that are not healthcare providers themselves, for example the insurer. This can also help with fraud detection. For example, if the process map shows that procedure X (for example an x-ray) is usually followed by procedure Y (for example a knee operation), and the insurer finds that in certain cases procedure Y is done but X is missing, that may be indicative of a type of fraud. In this example, process mining can easily identify all the cases that had a knee operation but never showed up for an x-ray.
Discovering networks (who refers work to whom and at what intensity). In this case you do not use the product for the 'activity' column in the event log, but instead use the name of the provider as the 'activity' column. This is also known as a 'work hand over' map. It is slightly different from a regular process map because it does not visualize activities anymore, but instead visualizes the flow between key players.
Process mining allows for exact quantification of throughput times and bottlenecks, which regular BPMN models cannot do.
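
As a rough illustration of the three uses above, here is a minimal Python sketch on a tiny, made-up event log. The column layout (case id, activity, provider, timestamp), the activity and provider names, and all helper functions are assumptions for illustration only; a real process mining tool would read these from an actual log.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log: (case_id, activity, provider, timestamp).
EVENT_LOG = [
    ("c1", "x-ray", "clinic A", "2020-01-02 09:00"),
    ("c1", "knee operation", "hospital B", "2020-01-10 14:00"),
    ("c2", "knee operation", "hospital B", "2020-01-05 11:00"),
    ("c3", "x-ray", "clinic A", "2020-01-03 10:00"),
    ("c3", "physiotherapy", "practice C", "2020-01-04 10:00"),
]

def traces(log, column):
    """Group events per case, ordered by time, keeping one column of each event."""
    per_case = defaultdict(list)
    for event in sorted(log, key=lambda e: (e[0], e[3])):
        per_case[event[0]].append(event[column])
    return dict(per_case)

def cases_missing_predecessor(log, first, second):
    """Cases where `second` occurs without an earlier `first`."""
    flagged = []
    for case_id, activities in traces(log, 1).items():
        if second in activities and first not in activities[:activities.index(second)]:
            flagged.append(case_id)
    return flagged

def handover_counts(log):
    """Count direct hand-overs between providers (provider used as the 'activity')."""
    counts = Counter()
    for providers in traces(log, 2).values():
        counts.update(zip(providers, providers[1:]))
    return counts

def throughput_hours(log):
    """Elapsed hours between the first and last event of each case."""
    fmt = "%Y-%m-%d %H:%M"
    result = {}
    for case_id, stamps in traces(log, 3).items():
        start = datetime.strptime(stamps[0], fmt)
        end = datetime.strptime(stamps[-1], fmt)
        result[case_id] = (end - start).total_seconds() / 3600
    return result

print(cases_missing_predecessor(EVENT_LOG, "x-ray", "knee operation"))
print(handover_counts(EVENT_LOG))
print(throughput_hours(EVENT_LOG))
```

Here case c2 is flagged because it contains a knee operation with no preceding x-ray, the hand-over counts show which provider passes work to which, and the throughput figures are exact per-case durations rather than estimates.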

Process mining can be used to obtain process models, even if there are no event logs to be mined.
See the BPMN Sketch Miner for how to do so.

Related

Is there a tool & tip & method for existing system analysis

I have inherited legacy code from a previous project at my company, with no documentation or description left behind. The only part of the code I can recognize is Jetty, used for the API. I can't even find the database it uses yet 😛
Unfortunately, I will probably need to modify this system.
Are there methods and tools for figuring out the components of this system and the relationships between them? I mean something like modeling a causal loop diagram by monitoring the data exchanged between running processes, IPC, etc.
So far I have used my existing knowledge of regular web markup (HTML) to locate the particular pieces of code to modify,
but there should be a more general methodology and tools for analyzing the dynamics of cooperating processes.

Business Process Model Versus Process Flow Chart

What are the differences between designing a process using a Business Process Model and using a Process Flow Chart?
Are these two design tools similar?
The Process Flow Chart originated in 1921 and is pretty simple with 5 symbols:
Operation: is to change the physical or chemical characteristics of the material.
Inspection: is to check the quality or the quantity of the material.
Move: is transporting the material from one place to another.
Delay: is when material cannot go to the next activity.
Storage: is when the material is kept in a safe location.
https://en.wikipedia.org/wiki/Flow_process_chart
A Business Process Model is very similar, and a little more specific. If you want something very structured, look at Business Process Model and Notation (BPMN).
The BPMN 1.0 standard was established in 2004 (https://en.wikipedia.org/wiki/Business_Process_Model_and_Notation)
The BPMN 2.0 standard was released in 2011 (http://www.omg.org/spec/BPMN/2.0/)
I'd recommend grabbing a cheat sheet for the BPMN symbols and rules. It's my preferred system when mapping a process involving more than one user.

How do you model a business workflow in ColdFusion?

Since there's no complete BPM framework/solution in ColdFusion as of yet, how would you model a workflow in a ColdFusion app so that it is easily extensible and maintainable?
A business workflow is more than a flowchart that maps nicely into a programming language. For example:
How do you model a task X that is followed by multiple tasks Y0, Y1, Y2 that happen in parallel, where Y0 is a human process (needs to wait for inputs), Y1 is a web service that might go wrong and might need auto retry, and Y2 is an automated process; followed by a task Z that should only be carried out when all Y's are completed?
My thoughts...
Seems like I need to do a whole lot of storing / managing / keeping
track of states, and frequent checking with cfschedule.
cfthread ain't going to help much since some tasks can take days
(e.g. waiting for a user's confirmation).
I can already imagine the flow is going to be spread across multiple UDFs,
DB, and CFCs.
Is there any open-source workflow engine in another language that we could maybe port over to CF?
Thank you for your brain power. :)
Study the Java Process Definition Language specification, for which JBoss has an execution engine. Using this Java-based engine may be your easiest solution, and it solves many of the problems you've outlined.
If you intend to write your own, you will probably end up modelling states and transitions as vertices and edges in a directed graph. These, as Ciaran Archer wrote, are the components of a state machine. The best persistence approach IMO is to capture versions of whatever data is being sent through the workflow via serialization, along with the current state and a history of the transitions between states and the changes to that data. The mechanism probably also needs a way to keep track of who or what has responsibility for taking the next action against that workflow.
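
To make the state-and-transition idea a little more concrete, here is a minimal sketch in Python rather than ColdFusion. The state names, the `Workflow` class, and its fields are all hypothetical, and JSON stands in for whatever serialization format you choose; in a real system the history entries would be rows in a database rather than an in-memory list.

```python
import json
from dataclasses import dataclass, field

# Hypothetical states and allowed transitions (edges of a directed graph).
TRANSITIONS = {
    "draft": {"submitted"},
    "submitted": {"approved", "rejected"},
    "rejected": {"draft"},
    "approved": set(),
}

@dataclass
class Workflow:
    data: dict
    state: str = "draft"
    assignee: str = "author"  # who or what must take the next action
    history: list = field(default_factory=list)

    def transition(self, new_state, actor, next_assignee, data_changes=None):
        """Move along an allowed edge and record what changed."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.data = {**self.data, **(data_changes or {})}
        self.history.append({
            "from": self.state,
            "to": new_state,
            "actor": actor,
            # Serialized snapshot of the data after this transition.
            "data": json.dumps(self.data, sort_keys=True),
        })
        self.state = new_state
        self.assignee = next_assignee

wf = Workflow(data={"amount": 100})
wf.transition("submitted", actor="author", next_assignee="manager")
wf.transition("rejected", actor="manager", next_assignee="author")
wf.transition("draft", actor="author", next_assignee="author",
              data_changes={"amount": 80})
print(wf.state, wf.assignee, len(wf.history))
```

Keeping the allowed transitions in one table means a new state is a data change rather than a code change, which is one way to keep the flow from spreading across many functions.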
Based on your question, one thing to consider is whether or not you really need to represent parallel tasks in your solution. It might instead be possible to enqueue a set of messages and then specify a wait state for all of those to complete. Representing actual parallelism implies you are moving data simultaneously through several different processes, in which case you need an algorithm to resolve deltas when they join again, which is very much a non-trivial task.
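
The wait-state idea can be sketched roughly as follows, again in Python for brevity. The `JoinStep` class, the retry limit, and the task names Y0/Y1/Y2 (borrowed from the question) are assumptions for illustration; since some tasks can take days, a real implementation would persist this object between messages rather than hold it in memory.

```python
from dataclasses import dataclass, field

@dataclass
class JoinStep:
    """Wait state: the next task may run only when every pending task has succeeded."""
    pending: set
    max_retries: int = 3
    attempts: dict = field(default_factory=dict)
    failed: set = field(default_factory=set)

    def report(self, task, ok):
        """Record a completion message for one task; return True when the join is satisfied."""
        if task not in self.pending:
            return self.ready()  # ignore duplicate or unknown messages
        if ok:
            self.pending.discard(task)
        else:
            self.attempts[task] = self.attempts.get(task, 0) + 1
            if self.attempts[task] >= self.max_retries:
                # Give up: the task stops being retried and blocks the join for good.
                self.pending.discard(task)
                self.failed.add(task)
        return self.ready()

    def ready(self):
        return not self.pending and not self.failed

join = JoinStep(pending={"Y0", "Y1", "Y2"})
join.report("Y2", ok=True)    # automated process finishes quickly
join.report("Y1", ok=False)   # web service fails once and stays pending for a retry
join.report("Y1", ok=True)    # retry succeeds
print(join.ready())           # Y0 (human input) is still outstanding
print(join.report("Y0", ok=True))
```

A scheduled task or a message handler would call `report` whenever a result arrives and start task Z the first time it returns True.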
In the context of ColdFusion and what you're trying to accomplish, a scheduled task may be necessary if the system you're writing needs to poll other systems. Consider WDDX as a serialization format. JSON, while seductively simple, has (as I recall) some edge cases around numbers and dates that can cause you grief.
Finally see my answer to this question for some additional thoughts.
Off the top of my head I'm thinking about the State design pattern with state persisted to a database. Check out the Gumball Machine example in Head First Design Patterns.
Generally this will work if you have something (like a client / order / etc.) going through a number of changes of state.
Different things will happen to your object depending on what state you are in, and that might mean sitting in a database table waiting for a flag to be updated by a user manually.
In terms of other languages I know Grails has a workflow module available. I don't know if you would be better off porting to CF or jumping ship to Grails (right tool for the job and all that).
It's just a thought, hope it helps.

What is data mining from a developer's perspective?

I can find the technical explanation of what data mining is in a book or on Wikipedia, but I'm wondering what sort of development it actually involves. Is it more about using tools or more about writing tools? Is it really much different from other domains when it comes to R&D?
Data Mining is the process of discovering interesting patterns in large amounts of data. It is not querying data, which is just what user Treb describes (sorry Treb).
To understand DM from a developer's perspective, you should read the book Programming Collective Intelligence by Toby Segaran.
In my experience (I'm a former data miner :-)), it's a mixture of using tools and writing tools. A lot of the time, the tools you need to analyse the particular data set don't exist, so you have to write them yourself first. It can be very interesting but you often need quite a different approach to the sort of programming I do now (embedded wireless), for example.
You really ought to change the accepted answer on this question so it doesn't mislead those who come across it.
Saying that querying a database IS data mining because "[h]ow would you discover any pattern in your data without querying first?" is like saying opening your car door is driving because "how else would you be able to drive somewhere without opening the car door first."
You can read your data out of a text file if you want. My first data mining assignment used data sets from the UCI repository and those are almost all text files.
If you want to learn about data mining start by looking up clustering and classification. Learn about decision trees and rule based classification. Then look at k-nearest-neighbor and k-means. After that if you really want to see what data mining is all about look at Chameleon, DBScan, and Support Vector Machines. Don't necessarily learn the minutiae of the last three (they're pretty complex and math heavy) but understanding the abstract idea of what happens will tell you all you need to know in order to use the many tools and libraries that are available for each strategy.
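
To give a feel for how small the core idea of one of these algorithms can be, here is a minimal k-nearest-neighbor sketch in plain Python. The training points, labels, and the `knn_predict` helper are made up for illustration; real libraries add distance weighting, indexing, and much more.

```python
from collections import Counter
from math import dist

def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up 2-D points belonging to two classes.
TRAINING = [
    ((1.0, 1.0), "small"), ((1.2, 0.8), "small"), ((0.9, 1.1), "small"),
    ((5.0, 5.0), "large"), ((5.2, 4.9), "large"), ((4.8, 5.1), "large"),
]

print(knn_predict(TRAINING, (1.1, 1.0)))
print(knn_predict(TRAINING, (4.9, 5.0)))
```

Note that the data here comes from an in-memory list; as mentioned above, it could just as well be read from a text file.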
These are only the algorithms that popped into my head just now. There are so many others that I don't recall or don't even know yet.
Data mining is about searching large quantities of data for hidden patterns. Web 2.0 example: News Corp uses its site myspace.com as a large data mine to determine what movies and products to promote. They write software to identify trends in the data that its users post to the site. News Corp does this to gather information useful for advertising campaigns and market predictions. It's different from other domains of R&D in that, from the data giver's perspective, it's passive. Rather than going out on the street and asking people in person what movies they are likely to see this summer and other such questions, the data mining tools sort out these things by analyzing data given by users voluntarily.
Wikipedia actually does have a pretty good article on it:
- http://en.wikipedia.org/wiki/Data_mining
Data mining, as I see it, is finding patterns or trends in given data. From a developer's perspective it might show up in applications like anti-money laundering, where you search the data for a given pattern. Another use is in projection software, where you project a future result or outcome against a heuristic by recognizing the current trend in the data.
I think it's more about using off-the-shelf tools than developing your own. An academic example of that kind of tool might be WEKA. Of course, you still have to know which algorithms to use, how to preprocess the data (this part is very important), etc.
About R&D I don't know much, but it should be like almost everything: maths, statistics, more maths...
On the development level, data mining is just another database application, but with a huge amount of data.
The mining itself is done by running specific queries on the database. It's in the creation of the queries where the important work is done. They of course depend on the data model, and on the hypotheses, what sort of trends the customer expects to find.
Therefore, the fine tuning of the queries usually can't be done in development, but only once the system is live and you have live data. Then the user can test his hypotheses and adapt the queries to show him the trends he is looking for.
So from a dev point of view, data mining is about
Managing large sets of data in your client (one query may return 100,000 rows of data)
Providing the user (who may know nothing about SQL or relational databases in general) with an effective way to modify his queries and view the results.
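
As a small sketch of those two points, the following Python snippet uses the standard-library sqlite3 module on an in-memory database. The `sales` table, its columns, and the `iter_batches` helper are hypothetical; the threshold parameter stands in for a value the user would adjust while testing a hypothesis.

```python
import sqlite3

def iter_batches(connection, sql, params=(), batch_size=1000):
    """Yield query results in fixed-size batches instead of loading all rows at once."""
    cursor = connection.execute(sql, params)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north" if i % 2 else "south", i) for i in range(100_000)],
)

# User-adjustable parameter, passed as a bound value rather than pasted into the SQL.
min_amount = 50_000
total_rows = 0
for batch in iter_batches(
    conn, "SELECT region, amount FROM sales WHERE amount >= ?", (min_amount,)
):
    total_rows += len(batch)
print(total_rows)
```

Exposing only bound parameters like `min_amount` lets a user who knows no SQL vary the query, while batching keeps the client from holding the full result set in memory.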

What is the difference between a data flow diagram and a flow chart?

I want to know why we use Data Flow Diagrams instead of flow charts.
A flow chart details the processes to follow. A DFD details the flow of data through a system.
In a flow chart, the arrows represent transfer of control (not data) between elements, and the elements are instructions or decisions (or I/O, etc.).
In a DFD, the arrows are actually data transfer between the elements, which are themselves parts of a system.
Wikipedia has a good article on DFDs here.
You should use whatever you like. The diagram is just a tool. Use whatever tool fits you and your problem best. I usually just use boxes and arrows and squiggles and circles and little stick figures and whatever else I think gets the point across to the viewer. In short, it doesn't even matter whether you use a standard diagramming notation. People are usually pretty good at understanding pictures.
A data flow diagram shows the flow of data between the different entities and datastores in a system, while a flow chart shows the steps involved in carrying out a task. In a sense, a data flow diagram provides a very high-level view of the system, while a flow chart is a lower-level view (basically showing the algorithm).
Whether you use a data flow diagram or a flow chart depends on what it is that you are trying to show.
The difference between a data flow diagram (DFD) and a flow chart (FC) is that a data flow diagram typically describes the data flow within a system, while a flow chart usually describes the detailed logic of a business process.
Data Flow and Flow Chart differ in processes, flow, and timing.
Processes
a.) On DFDs, processes can operate in parallel (at-the-same-time).
b.) On flowcharts, processes execute one at a time.
Flow
a.) DFDs show the flow of data through a system
b.) Flowcharts show the flow of control (sequence and transfer of control)
Timing
a.) Processes on a DFD can have dramatically different timing (daily, weekly, on demand)
b.) Processes on flowcharts are part of a single program with consistent timing
A DFD shows how the data moves through a system, a flowchart is closer to the operations that system does.
In the classic make a cup of tea example, a DFD would show where the water, tea, milk, sugar were going, whereas the flowchart shows the process.
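
One rough way to see the tea example in code, as a sketch only: the flowchart corresponds to an ordered procedure with a decision in it, while the DFD corresponds to a set of flows between elements with no ordering implied. The step names, element names, and the `inputs_of` helper are invented for illustration.

```python
# Flowchart view: an ordered sequence of steps with a decision (flow of control).
def make_tea(wants_milk):
    steps = ["boil water", "put tea in cup", "pour water into cup"]
    if wants_milk:  # the decision diamond
        steps.append("add milk")
    steps.append("serve")
    return steps

# DFD view: what moves between which elements; no execution order is implied.
DATA_FLOWS = [
    ("tap", "kettle", "water"),
    ("kettle", "cup", "hot water"),
    ("tea caddy", "cup", "tea"),
    ("fridge", "cup", "milk"),
    ("cup", "drinker", "cup of tea"),
]

def inputs_of(element):
    """Everything that flows into one element of the DFD."""
    return sorted(item for _, target, item in DATA_FLOWS if target == element)

print(make_tea(wants_milk=True))
print(inputs_of("cup"))
```

The first structure answers "what happens next?", the second answers "where does each thing come from and go to?".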
Other answers have gone over the basics of what each thing is. At the higher level, a flowchart is a design level tool, while DFDs are more analysis.
DFDs have some nice features. Since they show the flow of data, some things become more obvious when charted this way: some data is only used by a few routines, some routines use only some bits of data, some routines touch everything. Seeing that up front helps with organizing, restructuring, and planning.
A follow-on worth exploring is the Event-Response Diagram, which is basically a DFD only showing process and data needed to process an "event", meaning something triggered externally (customer makes payment, etc.).
A Data Flow Diagram is a functional relationship which includes input values, output values, and internally stored data.
A Flow Chart is a process relationship which includes input and output values.
A flow chart describes the program (see old Fortran flow charts; surely there are some floating around on Google).
A data flow diagram describes the flow of data, for example, between subroutines, or between different programs.
Although my experience with DFDs is limited, I can tell you that a DFD shows you how the data moves (flows) between the various modules. Furthermore, a DFD can be partitioned into levels: at the initial level (called the Context Level) you see the system (say, a system to rent a movie) as a whole. That level can be broken down into another level that contains activities (say, rent a movie, return a movie) and shows how the data flows into those activities (could be a name, a number of days, whatever). Now you can make a sublevel for each activity detailing the many tasks or scenarios of those activities. And so on and so forth. Remember that the data is always passing between levels.
Now as for the flowchart just remember that a flowchart describes an algorithm!
Have a look at this site:
http://yourdon.com/strucanalysis/wiki/index.php?title=Chapter_9#The_Flow
It really helps you understand what a DFD is.
The above answers have explained it between them, but I will try to expand slightly...
The point about the cup of tea is a good one. A flow chart is concerned with the physical aspects of a task and as such is used to represent something as it currently is. This is useful for developing understanding of a situation, for communication, training, etc. You will likely have come across these in your workplace, certainly if it has adopted the ISO 9000 standards.
A data flow diagram is concerned with the logical aspects of an activity, so again the cup of tea analogy is a good one. If you use a data flow diagram in conjunction with a process flow, your data flow would only be concerned with the flow of data/information regarding a process, to the exclusion of the physical aspects. If you wonder why that would be useful, it's because data flow diagrams allow us to move from the 'as it is' situation and see something as it could/will be. These two modelling approaches are common in structured analysis and design and are typically used by systems/business analysts as part of business process improvement/re-engineering.
Data flow diagram: A modeling notation that represents a functional decomposition of a system.
Flow chart: Step-by-step flow of a program.
Data Flow Diagrams
The formal, structured analysis approach employs the data-flow diagram (DFD) to assist in the functional decomposition process. I learned structured analysis techniques from DeMarco [7], and those techniques are representative of present conventions. To summarize, DFD's are comprised of four components: