I've been tasked with writing an event correlator. A fundamental part of the system will be a huge decision tree that identifies the origin of a fault based on recorded states and log files, and one of the primary concerns is keeping that tree maintainable - written in a format the programmer can easily understand and edit.
Since 7-levels-deep nested if()s are not my idea of "maintainable and easy to understand", I asked for ideas on how to represent it in a form that is a good middle ground between machine-friendly, user-friendly and cost-efficient. The obvious answer was a domain-specific language that would be compilable to C++, the language the actual event correlator will be written in. The obvious question was what that DSL should look like.
The suggestion I liked best was to use a UML activity diagram and have it compiled to C++. The diagram would consist almost strictly of decisions, with activities only at the leaves of the tree, as conclusions reached by the decision process. In essence, the diagram is my graphical DSL, which should then be compiled into that huge bunch of if()s in C++. And while I'll still need to craft all the condition functions by hand, at least the interconnections between the conditions should get handled by the system.
Now, what tool should I use for creating that diagram?
Since "roll your own" isn't my idea of cost-efficient, considering it is to ultimately create one, single diagram for one device (even if it will likely be edited forever, as new modes of failure are discovered), I had a look at the List of Unified Modeling Language tools.
Quite a few of them have "C++" listed under "Languages generated", but I know the reality is never that good - I'm not interested in a bunch of header files pre-filled with class definitions according to the class diagram. I need a file that contains my decision tree: a bunch of conditional statements with the conditions pre-filled with calls to decision functions, which I'm to write by hand, and the outcomes as calls to specific conclusion functions.
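To make the target concrete, here is a rough sketch of the kind of generated file I have in mind (all function and type names are hypothetical, just for illustration) - the branching comes from the diagram, while the condition and conclusion functions are written by hand:

```cpp
// correlator_tree.generated.cpp -- hypothetical output of the diagram compiler.
// Only the branching structure is generated; the decision and conclusion
// functions are declared elsewhere and implemented by hand.
#include "conditions.h"   // bool power_supply_fault(const State&); ...
#include "conclusions.h"  // void conclude_psu_failure(Report&); ...

void correlate(const State& state, Report& report) {
    if (power_supply_fault(state)) {
        if (voltage_dropped_before_reset(state)) {
            conclude_psu_failure(report);
        } else {
            conclude_external_power_loss(report);
        }
    } else if (watchdog_fired(state)) {
        conclude_firmware_hang(report);
    } else {
        conclude_unknown_fault(report);
    }
}
```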
Now my question is which ones can do that, aren't overly difficult to use, and aren't expensive either - free tools preferred but reasonably priced commercial ones are fine too.
Failing that, which ones can save the diagram in a form that I could parse with a self-made "compiler", and how should I approach creating that compiler?
Of course, other suggestions are most welcome too - maybe there is a tool for old-fashioned flow diagrams that can generate such code? Maybe a dedicated DSL for what I need already exists?
Enterprise Architect can generate C++ code from behavioral diagrams, including Activity Diagrams. It's offered in several editions; the lowest edition to support behavioral code generation costs $599. Here's the section of the user guide: Generate From Behavioral Models. Beyond code generation, EA offers simulation, traceability, and many other niceties.
If you can implement your logic in a Statechart instead, you can use the free QM Modeler. It generates C++ code. It's designed to work with the QP active object framework, but you can use QM without relying on QP. (You can also use Enterprise Architect to generate code from Statecharts.)
This URL states that UML models can be represented in "an XMI format", a kind of XML-based standard for representing UML: http://documentation.softwareag.com/webmethods/tamino/ins441/advconc/FromUMLtoXML.htm
If you were to use this standard, your data might be more compatible with other CASE tools:
Most UML tools provide a function to serialize a model into XMI format. XMI is an XML-based industry standard for the exchange of metadata between CASE tools. Because it is XML based, XMI can be converted with the help of XSLT stylesheets into other formats such as XML Schema. An example of such a stylesheet can be found at http://www.aomodeling.org/.
I am guessing this XML could be parsed with an ordinary C++ XML parser such as Xerces or (for Windows) MSXML/XML DOM.
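To illustrate, a minimal sketch of walking an exported XMI file with Xerces-C++ might look like the following; the element and attribute names ("packagedElement", "xmi:type") are assumptions, since the exact XMI vocabulary depends on the exporting tool:

```cpp
// Minimal sketch using Xerces-C++ 3.x: load an XMI export and list the
// model elements it contains, as a starting point for a custom "compiler".
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/dom/DOM.hpp>
#include <iostream>

using namespace xercesc;

int main() {
    XMLPlatformUtils::Initialize();
    {
        XercesDOMParser parser;
        parser.parse("model.xmi");                      // diagram exported as XMI
        DOMDocument* doc = parser.getDocument();

        XMLCh* tag  = XMLString::transcode("packagedElement");
        XMLCh* attr = XMLString::transcode("xmi:type");
        DOMNodeList* nodes = doc->getElementsByTagName(tag);

        for (XMLSize_t i = 0; i < nodes->getLength(); ++i) {
            DOMElement* el = static_cast<DOMElement*>(nodes->item(i));
            char* type = XMLString::transcode(el->getAttribute(attr));
            std::cout << "found element of type: " << type << "\n";
            XMLString::release(&type);
        }
        XMLString::release(&tag);
        XMLString::release(&attr);
    }   // parser (and the DOM it owns) is destroyed before Terminate()
    XMLPlatformUtils::Terminate();
    return 0;
}
```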
Related
I am working on modeling Google Protocol Buffer (GPB) messages and inter-app communication using MagicDraw's sequence diagramming.
Ultimately, I want to use MagicDraw's C++ code generation tool to export the models into C++ code and then turn that into .proto files; in short, the goal is to convert the diagrammed models from MagicDraw into .proto files.
I've spent 5+ hours looking for ways to do this, but it looks like code conversion only works from writing .proto files and then using protoc to turn them into C++, not the other way around.
Is there any way to reverse this process?
Ah, the quest for the hallowed model<==>schema<==>code<==>model triangle.
I don't actually have a good answer, I'm afraid. The closest I've got to this is Enterprise Architect, XSD, and ASN.1. But this is not complete; Enterprise Architect doesn't sync to XSD, it's a straightforward import/export, so changes made in XSD don't reflect back to Enterprise Architect. In EA, you can define classes (classic UML stuff) and export a package of classes as XSD, which the better ASN.1 tools will consume as a schema (there is an official translation between XSD and ASN.1 schemas, and ASN.1 compilers tend to be better at implementing and enforcing things like minOccurs in generated code - a lot of XML/XSD compilers actually do a bad job of such things).
I've kind of given up looking. My suspicion is that tool developers will only build this if there is a market (or an enthusiastic user base willing to contribute donations for OSS). An awful lot of software gets developed without the use of schemas to define data structures (which leads to a large number of the world's buffer overruns, interface bugs, etc.), never mind tools that sync code / schemas / models. So I think there are precious few developers out there who see the actual value of such tooling, and so there's not much enthusiasm for developing the tools in the first place. It's a classic chicken / egg situation.
Which is a pity. Using a strong schema system like ASN.1, or XSD (with a really good compiler), or perhaps at a pinch JSON (though there the emphasis seems to be more on using schemas to check objects than on using schemas to define classes), leads to a very agile development path. Particularly with ASN.1, you can have all of a system's messages / data structures, their constraints, and all of the system's constants defined once in a schema. You can then use the build system to propagate any changes to every corner of the project automatically. With this pattern embraced, you can make late-breaking changes to system interfaces quite easily and safely.
For my next project I would like to try UML modeling. There are several reasons - mainly documentation, plus breaking ground for development so I can avoid re-coding everything over and over again.
I've tried it several times in the past, but I had the feeling that without deep knowledge of the background libraries my work would depend on, it's not a trivial task: at the very beginning I don't know what kind of member variables and functions I will need.
Usually I would code to get familiar with the libraries and APIs my app was interfacing with, and I would end up in a state where the work was almost done, or at least 50% ready, at which point it made no sense to me to start modeling anything.
Am I right that you really need to understand the background well, or are there ways/techniques to overcome this?
Another point: do you build the model bottom-up or top-down, or does it depend on the use case?
Thank you for any recommendations how to proceed.
If I understand correctly, your main challenge is to get an understanding of the libraries and APIs that you are using.
If you intend to create a UML diagram to reverse-engineer the library and understand it, you might lose your time: you'd be able to make a meaningful model only once you've understood how the pieces fit together. And for this discovery and knowledge acquisition, you already use the most effective approach:
Usually I would code to get familiar with the libraries and APIs my app was interfacing with.
Now, if the library or the API is delivered with a UML model, it's another story: an existing design model (not all the details of the implementation, but the core elements of the design, and the interaction scenarios that are difficult to grasp from the code) could help you understand faster how the library works, and so get through the exploratory phase more quickly.
It's also a different story when you are reverse-engineering an undocumented app: there you don't have a tutorial, and it's difficult to write code that uses the existing elements in a meaningful way. There it could make sense to document the system post-mortem. But again, do not lose yourself in a detailed implementation model with all the details: focus on the core elements, whose understanding will really matter to the people who maintain the system after you.
The three main purposes of making UML class models when developing an app are:
1. Describing the entity types of the app's problem domain for analyzing and better understanding the requirements for the app in a conceptual (domain) model.
2. Designing the schema of the app's underlying database (this is typically an RDB schema defined with a bunch of CREATE TABLE statements).
3. Designing the model classes of the data model of your app, which will be coded, e.g., as Java Entity classes or C# classes with EF annotations.
For 1 and 2, you may take a look at my book An introduction to information modeling and databases, while for 3 you may check out a book on model-based development, e.g. for Java Backend Apps or JavaScript Frontend Apps.
If your goal is to model the dependencies of your app, this may indeed be another purpose. However, as argued by @Christope, reverse-engineering a library is itself a big project that may easily consume more time than you have for developing your app.
In my Clojure project I am using Clojure Spec, but if I need to use some lib like compojure-api then I need to use Schema.
What are the advantages of one over the others?
Why would I consider one over the others?
Which one is good for compile-time type checking?
These are merely three different approaches to give the developer some type safety. All three offer their own DSL to describe the schema/type of data, but they are very different in philosophy. They are all actively maintained and have a nice community.
This is an opinionated overview based on my experience.
core.typed
core.typed tries to extend the Clojure language with additional macros to annotate functions and vars with static type information. It then uses static type analysis to ensure that the code matches the type info (that is, it produces and consumes data of the right types).
Some advantages:
Static typing in general is a very strong tool. If you are familiar with statically typed programming languages, you will appreciate this very much.
Many bugs can be found during compilation time. No more NullPointerExceptions!
Some drawbacks:
Changing a type or some code may require extra work to propagate the change through all parts of your code. And sometimes it is just too complicated to write the type info, or to write programs that satisfy the type checker.
Static code checking will slow down your compile times and may slow down your development workflow.
Schema
In Schema you also write type annotations, but type checking happens at runtime. It encourages you to construct schema declarations dynamically and lets you specify where you want to check against a schema and where you do not want that functionality.
Some advantages:
Very friendly DSL for describing data schemas.
Various tools. For example: data generation for generative testing, and tools to explain why data does not match a schema.
Some disadvantages:
Only checks schemas where and when you tell it to do so.
External library, not supported by the core team.
Spec
Spec is the latest player, with a philosophy borrowed from Racket. It is (going to be) part of the Clojure core library as of Clojure 1.9.
The basic idea is to have entity types specified by the (namespaced) keys in a map object. Spec declarations are stored in the application's registry bound to namespaced keywords. Spec is very strong in sequence validation.
Some advantages:
Part of the Clojure core, not an external library. It is now used for parsing macro arguments and also for documentation purposes.
The community is very excited about it, resulting in interesting ideas such as using spec in genetic programming and generative testing.
Some disadvantages:
It will only be available in Clojure 1.9, which is not yet released as a stable version. It is still a new technology, not widely used.
Specs do not look like the data they are describing.
Personally, core.typed feels intimidating and core.spec feels immature, so I use schema in production. My advice is the following:
If you need static type checking then core.typed is the way to go.
If you want to do parsing then core.spec is a nice choice.
If you want simple type descriptions then schema will be a good fit.
I've tried wedging my Clojure diagrams into what's available in UML, using class blocks as the file-level namespaces and dependency links to show relationships, but it's awkward and tends to discourage functional patterns. I've also tried developing ad-hoc solutions, but I can't find a solution that works as well as UML does with, say, Java (simple directed graphs seem to work in a vague manner, but the results aren't detailed enough). Furthermore, I'm not finding anything on the web about this.
Just to be clear, I'm not trying to do anything fancy like code generation; I'm just talking about pen-and-paper diagrams, mostly for my own benefit. I'm assuming I'm not the first person to have considered this for a Lisp language.
What solutions have been proposed? Are there any commonly-used standards? What do you recommend? What tools do you use?
It depends on what you want to describe in your program.
Dependencies
Use class diagrams to model the dependencies between namespaces; in this case, it's clearer if you use packages instead of classes in the diagram.
You can also use class diagrams to model dependencies between actors.
Data flow
You can also use Communication Diagrams to model the flow of data in your program. In this case, depict each namespace as an entity and each function as a method of that entity.
Or, in the case of actors, depict each actor as an entity and each message as a method.
In any case, it's not useful to try to describe your program's algorithms in UML. In my experience, they are better described in comments in the source file.
I think it's less about the language and more about your conceptual model. If you are taking a "stream processing" approach, then a data-flow network diagram might be the right approach, as in some of the Scheme diagrams in SICP. If you are taking a more object-oriented approach (which is well supported in Lisp), then UML activity diagrams might make more sense.
My personal thought is to model the flow of the data and not the structure of the code, because from what I've seen of large (not really that large) Clojure projects, the code layout tends to be really boring, with a huge pile of composable utilities and one class that threads them together with map, reduce, and STM transactions.
Clojure is very flexible in the model you choose, so you may want to go the other way around: make the diagram first, then choose the parts and patterns of the language that cleanly express the model you built.
Well, UML is deeply rooted in OO design (with C++!), so it will be very difficult to map a functional approach onto UML. I don't know Clojure that well, but you may be able to represent the things that resemble Java classes and interfaces (protocols?); for everything else it will be really hard.
FP is more like a series of transformations from input to output; there's no clear UML diagram for that (maybe activity diagrams?). The most common diagrams are for static structure and interaction between objects, but they aren't really useful for the FP paradigm.
Depending on your goal, component and deployment diagrams may be applicable.
I don't think something like UML would be a good fit for Clojure - UML is rather focused on the object oriented paradigm which is usually discouraged in Clojure.
When I'm doing functional programming I tend to think much more in terms of data and functions:
What data structures do I need? In Clojure this usually boils down to defining a map structure for each important entity I am dealing with. A simple list of fields is often enough in simple cases. In more complex cases with many different entities you will probably want to draw a tree showing the structure of your data (where each node in the tree represents a map or record type)
How do these data structures flow through different transformation functions to get the right result? Ideally these are pure functions that take an immutable value as input and produce an immutable value as output. Typically I sketch these as a pipeline / flowchart.
If you've thought through the above well enough, then converting to Clojure code is pretty easy.
Define one or more constructor functions for your data structures, and write a couple of tests to prove they are working.
Write the transformation functions bottom up (i.e. get the most basic operations working and tested first, then compose these together to define the larger functions). Write tests for every function.
If you need utility functions for GUI or IO etc., write them on demand as they are needed.
Glue it all together, testing at the REPL to make sure everything is working.
Note that your source files will typically also be structured in the sequence listed above, with more elementary functions at the top and the higher-level composed functions towards the bottom. You shouldn't need any circular dependencies (that's a bad design smell in Clojure). Tests are critical - IMHO much more important in a dynamic language like Clojure than in a statically typed OOP language.
The overall logic of my code is usually the last few lines of my main source code file.
I have been wrestling with this as well. I find flow charts work great for basic functions and data. It's easy to show the data and data flow that way. Conditionals and recursion are straightforward. UML sequence/collaboration diagrams can capture some of the same info pretty well.
However, once you start using higher-order functions (HOFs), this does not work well at all.
Normal UML diagrams for packages work ok for namespaces, not that that does much.
Is there a tool which can handle model checking large, real-world, mostly-C++, distributed systems, such as KDE?
(KDE is a distributed system in the sense that it uses IPC, although typically all of the processes are on the same machine. Yes, by the way, this is a valid usage of "distributed system" - check Wikipedia.)
The tool would need to be able to deal with intraprocess events and inter-process messages.
(Let's assume that if the tool supports C++, but doesn't support other stuff that KDE uses such as moc, we can hack something together to workaround that.)
I will happily accept less general (e.g. static analysers specialised for finding specific classes of bugs) or more general static analysis alternatives, in lieu of actual model checkers. But I am only interested in tools that can actually handle projects of the size and complexity of KDE.
You're obviously looking for a static analysis tool that can
parse C++ at scale
locate code fragments of interest
extract a model
pass that model to a model checker
report that result to you
A significant problem is that everybody has a different idea about what model they'd like to check.
That alone likely kills your chance of finding exactly what you want, because each model extraction tool has generally made a choice as to what it wants to capture as a model, and the chances that it matches what you want precisely are IMHO close to zero.
You aren't clear on what specifically you want to model, but I presume you want to find the communication primitives and model the process interactions to check for something like deadlock?
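If deadlock is indeed the property of interest, the model the tool would need to extract is essentially the order in which each thread or process acquires locks or exchanges messages. A deliberately simplified, hypothetical C++ example of the kind of pattern such a model would capture:

```cpp
#include <mutex>

std::mutex a, b;

// Thread 1 acquires a, then b ...
void worker1() {
    std::lock_guard<std::mutex> la(a);
    std::lock_guard<std::mutex> lb(b);
    // ... critical section ...
}

// ... while thread 2 acquires b, then a. A model checker fed only these
// two acquisition orders can report the potential deadlock without ever
// executing the program.
void worker2() {
    std::lock_guard<std::mutex> lb(b);
    std::lock_guard<std::mutex> la(a);
    // ... critical section ...
}
```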
The commercial static analysis tool vendors seem like a logical place to look, but I don't think they are there, yet. Coverity would seem like the best bet, but it appears they only have some kind of dynamic analysis for Java threading issues.
This paper claims to do this, but I have not looked at it in any detail: Compositional analysis of C/C++ programs with VeriSoft. Related is [PDF] Computer-Assisted Assume/Guarantee Reasoning with VeriSoft. It appears you have to hand-annotate the source code to indicate the modelling elements of interest. The VeriSoft tool itself appears to be proprietary to Bell Labs and is likely hard to obtain.
Similarly this one: Distributed Verification of Multi-threaded C++ Programs.
This paper also makes interesting claims, but doesn't process C++ in spite of the title: Runtime Model Checking of Multithreaded C/C++ Programs.
While all the parts of this are difficult, an issue they all share is parsing C++ (as exemplified by the previously quoted paper) and finding the code patterns that provide the raw information for the model. You also need to parse the specific dialect of C++ you are using; it's not nice that the C++ compilers all accept different languages. And, as you have observed, processing large C++ code bases is necessary. Model checkers (SPIN and friends) are relatively easy to find.
Our DMS Software Reengineering Toolkit provides general-purpose parsing, with customizable pattern matching and fact extraction, and has a robust C++ Front End that handles many dialects of C++ (EDIT Feb 2019: including C++17 in ANSI, GCC and MS flavors). It could likely be configured to find and extract the facts that correspond to the model you care about. But it doesn't do this off the shelf.
DMS with its C front end has been used to process extremely large C applications (19,000 compilation units!). The C++ front end has been used in anger on a variety of large-scale C++ projects (EDIT Feb 2019: including large-scale refactoring of APIs across 3000+ compilation units). Given DMS's general capability, I think it is likely capable of handling fairly big chunks of code. YMMV.
Static code analyzers, when used against a large code base for the first time, usually produce so many warnings and alerts that you won't be able to analyze all of them in a reasonable amount of time. It is hard to single out real problems from code that merely looks suspicious to the tool.
You can try automatic invariant discovery tools like "Daikon" that capture perceived invariants at run time. You can later check whether the discovered invariants (the relation "a == b+1" between two variables, for example) make sense, and then insert permanent asserts into your code. This way, when an invariant is violated as a result of your change, you will get a warning that you may have broken something. This method helps you avoid restructuring or changing your code just to add tests and mocks.
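For example, a minimal sketch (the function and the invariant are hypothetical) of turning a discovered invariant into a permanent assertion:

```cpp
#include <cassert>

// Hypothetical invariant reported by an invariant-discovery run:
// on entry to process_batch(): count == last_index + 1
void process_batch(int count, int last_index) {
    // Keep the discovered relationship as a permanent check, so a later
    // change that silently breaks it fails loudly during testing.
    assert(count == last_index + 1);

    // ... actual processing ...
}
```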
The usual way of applying formal techniques to large systems is to modularise them and write specifications for the interfaces of each module. Then you can verify each module independently (while verifying a module, you import the specifications - but not the code - of the other modules it calls). This approach makes verification scalable.
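A minimal sketch of what that looks like in C++ terms (the module and its contract are hypothetical): each module publishes an interface whose stated contract is its specification, and a client module is verified against this header alone, never against the implementation behind it.

```cpp
// bounded_queue.h -- interface plus contract (the module's specification).
// When verifying a client of this module, only this header is imported;
// the implementation in bounded_queue.cpp is not needed.
#pragma once
#include <cstddef>

class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity);

    // Precondition:  !full()
    // Postcondition: size() has increased by exactly 1
    void push(int value);

    // Precondition:  !empty()
    // Postcondition: size() has decreased by exactly 1
    int pop();

    bool empty() const;
    bool full() const;
    std::size_t size() const;
};
```

When checking a client, the verifier assumes these pre/postconditions hold for every call into BoundedQueue; the implementation of BoundedQueue is then verified separately against the same contract, which is what keeps the overall verification effort scalable.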