In my Clojure project I am using Clojure Spec, but if I need to use a library like compojure-api then I need to use Schema.
What are the advantages of one over the others?
Why would I consider one over the others?
Which one is good for compile-time type checking?
These are merely three different approaches to giving the developer some type safety. All three offer their own DSL to describe the schema/type of data, but they are very different in philosophy. They are all actively maintained and have nice communities.
This is an opinionated overview based on my experience.
core.typed
core.typed tries to extend the Clojure language with additional macros to annotate functions and vars with static type information. It then uses static type analysis to ensure that the code matches the type info (that is, it produces and consumes data of the right types).
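To give a flavour of the annotations, here is a minimal sketch (it assumes the clojure.core.typed library; the namespace and function are invented for illustration):
(ns demo.prices
  (:require [clojure.core.typed :as t]))
;; Annotate the var with a static function type: Num -> Num.
(t/ann add-tax [t/Num -> t/Num])
(defn add-tax [price]
  (* price 1.2))
;; Running (t/check-ns 'demo.prices) type-checks the whole namespace;
;; a call such as (add-tax "9.99") would then be rejected at check time, not at runtime.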
Some advantages:
Static typing in general is a very strong tool. If you are familiar with statically typed programming languages you will appreciate this very much.
Many bugs can be found at compile time. No more NullPointerExceptions!
Some drawbacks:
Changing a type or the code may require extra work to propagate the changes through all parts of your code. And sometimes it is just too complicated to write the type info for otherwise correct programs.
Static code checking will slow down your compile times and may slow down your development workflow.
Schema
In Schema you also write type annotations, but type checking happens at runtime. It encourages you to construct schema declarations dynamically and lets you specify where you want to check against a schema and where you do not want its functionality.
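For a flavour of the DSL, here is a minimal sketch using plumatic/schema (the User schema and greet function are invented for illustration):
(require '[schema.core :as s])
(s/defschema User
  {:name s/Str
   :age  s/Int})
(s/validate User {:name "Ada" :age 36})    ;; returns the value unchanged
(s/validate User {:name "Ada" :age "36"})  ;; throws at runtime
;; s/defn attaches schemas to a function; they are only enforced where you
;; ask for it, e.g. inside (s/with-fn-validation ...).
(s/defn greet :- s/Str
  [user :- User]
  (str "Hello, " (:name user)))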
Some advantages:
A very friendly DSL for describing data schemas.
Various tools, for example data generation for generative testing and tools to explain why a value does not match a schema.
Some disadvantages:
It only checks schemas where and when you tell it to.
External library, not supported by the core team.
Spec
Spec is the latest player, with a philosophy borrowed from Racket. It is (going to be) part of the Clojure core library from Clojure version 1.9.
The basic idea is to have entity types specified by the (namespaced) keys in a map. Spec declarations are stored in the application's registry, bound to namespaced keywords. Spec is very strong at sequence validation.
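A small sketch of what that looks like (written against the clojure.spec namespace shipped with the 1.9 alphas, later renamed clojure.spec.alpha; the ::user and ::option specs are invented for illustration):
(require '[clojure.spec :as s])
(s/def ::name string?)
(s/def ::age pos-int?)
(s/def ::user (s/keys :req [::name ::age]))
(s/valid? ::user {::name "Ada" ::age 36})  ;; => true
(s/explain ::user {::name "Ada"})          ;; prints which key is missing and why
;; Sequence specs, where spec is particularly strong:
(s/def ::option (s/cat :key keyword? :value any?))
(s/conform ::option [:size 7])             ;; => {:key :size, :value 7}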
Some advantages:
Part of the Clojure core, not an external library. It is now used for parsing macro arguments and also for documentation purposes.
The community is very excited about it resulting in interesting ideas such as using spec in genetic programming and generative testing.
Some disadvantages:
It will only be available from Clojure 1.9, which has not yet been released as a stable version. It is still a new technology and not widely used.
Specs do not look like the data they describe.
Personally, core.typed feels intimidating and spec feels immature, so I use Schema in production. My advice is the following:
If you need static type checking, then core.typed is the way to go.
If you want to do parsing, then spec is a nice choice.
If you want simple type descriptions, then Schema will be a good fit.
Related
I am working on modeling Google Protocol Buffer (GPB) messages and inter-app communication using MagicDraw's sequence diagramming.
Ultimately, I want to use MagicDraw's C++ code generation tool to export the models to C++ code and then turn that into .proto files; in other words, the goal is to convert the diagrammed models from MagicDraw into .proto files.
I've spent 5+ hours looking for ways to do this, but it looks like code conversion only works in one direction: you write .proto files and then use protoc to turn them into C++, not the other way around.
Is there any way to reverse this process?
Ah, the quest for the hallowed model<==>schema<==>code<==>model triangle.
I don't actually have a good answer, I'm afraid. The closest I've come to this is Enterprise Architect, XSD, and ASN.1. But this is not complete: Enterprise Architect doesn't sync to XSD; it's a straightforward import/export, so changes made in XSD don't reflect back into Enterprise Architect. In EA you can define classes (classic UML stuff) and export a package of classes as XSD, which the better ASN.1 tools will consume as a schema (there is an official translation between XSD and ASN.1 schema, and ASN.1 compilers tend to be better at implementing and enforcing things like minOccurs in generated code; a lot of XML/XSD compilers actually do a bad job of such things).
I've kind of given up looking. My suspicion is that tool developers will only build such tooling if there is a market (or an enthusiastic user base willing to contribute donations for OSS). An awful lot of software gets developed without the use of schemas to define data structures (which leads to a large number of the world's buffer overruns, interface bugs, etc.), never mind tools that sync code / schemas / models. So I think that there are precious few developers out there who see the actual value of such tooling, and so there's not that much enthusiasm for developing the tools in the first place. It's a classic chicken-and-egg situation.
Which is a pity. Using a strong schema system like ASN.1, or XSD (with a really good compiler), or perhaps at a pinch JSON (there seems to be more of an emphasis there on using schemas to check objects rather than on using schemas to define classes), leads to a very agile development path. Particularly with ASN.1, you can have all of a system's messages / data structures and their constraints, and all of the system's constants, defined once in a schema. You can then use the build system to propagate any changes to every corner of a project automatically. With this pattern embraced you can make late-breaking changes in system interfaces quite easily and safely.
I'm about to write an event correlator. A fundamental part of the system will be a huge decision tree that recognizes the origin of a fault based on recorded states and log files, and one of the primary concerns is keeping that tree maintainable - written in a format that is easy for the programmer to understand and edit.
Since 7-levels-deep nested if()s are not my idea of "maintainable and easy to understand", I asked for ideas on how to represent it in a form that is a good middle ground between machine-friendly, user-friendly, and cost-efficient. The obvious answer was to use a domain-specific language that would be compilable to C++, the language in which the actual event correlator will be written. The obvious question was what that DSL should look like.
The suggestion I liked best was to use a UML activity diagram and have it compiled to C++. The diagram would likely consist almost strictly of decisions, with activities only at the leaves of the tree, as conclusions reached by the decision process. In essence, the diagram is my graphical DSL, which should then be compiled into that huge bunch of if()s in C++. And while I'll still need to craft all the conditional functions by hand, at least the interconnections between the conditions should be handled by the system.
Now, what tool should I use for creating that diagram?
Since "roll your own" isn't my idea of cost-efficient, considering it is to ultimately create one, single diagram for one device (even if it will likely be edited forever, as new modes of failure are discovered), I had a look at the List of Unified Modeling Language tools.
Quite a few of these, including these that have "C++" listed in the "Languages generated" but I know the reality is never that good - I'm not interested in a bunch of header files pre-filled with class definitions according to the class diagram. I need a file that contains my decision tree; a bunch of conditional statements with conditions pre-filled with decision functions calls which I'm to write by hand, and outcomes as specific conclusion function calls.
Now my question is which ones can do that, aren't overly difficult to use, and aren't expensive either - free tools preferred but reasonably priced commercial ones are fine too.
Alternatively, failing that - which ones can save that diagram in a form that I could parse with a self-made "compiler", and how to approach creating that compiler.
Of course other suggestions are most welcome too - maybe a tool for old-fashioned flow diagram that can generate such code? Maybe a dedicated DSL to create what I need exists already?
Enterprise Architect can generate C++ code from behavioral diagrams, including Activity Diagrams. It's offered in several editions; the lowest edition to support behavioral code generation costs $599. Here's the section of the user guide: Generate From Behavioral Models. Beyond code generation, EA offers simulation, traceability, and many other niceties.
If you can implement your logic in a Statechart instead, you can use the free QM Modeler. It generates C++ code. It's designed to work with the QP active object framework, but you can use QM without relying on QP. (You can also use Enterprise Architect to generate code from Statecharts.)
This URL states that UML is represented as "an XMI format" - a kind of XML-based standard for representing UML: http://documentation.softwareag.com/webmethods/tamino/ins441/advconc/FromUMLtoXML.htm
If you were to use this standard, your data might be more compatible with other CASE tools:
Most UML tools provide a function to serialize a model into XMI format. XMI is an XML-based industry standard for the exchange of metadata between CASE tools. Because it is XML based, XMI can be converted with the help of XSLT stylesheets into other formats such as XML Schema. An example of such a stylesheet can be found at http://www.aomodeling.org/.
I am guessing this XML could be parsed with an ordinary C++ XML parser such as Xerces or (for Windows) MSXML / XML DOM.
Current choice: LuaJIT. Impressive benchmarks, and I am getting used to the syntax. Writing a high-performance ABI will require careful consideration of how I structure my C++.
Other Questions of interest
Gambit-C and Guile as embeddable languages
Lua Performance Tips (running with the collector disabled, and calling the collector explicitly at the end of a processing run, is always an option).
Background
I'm working on a real-time, high-volume (complex) event processing system. I have a DSL that represents the schema of the event structure at the source, the storage format, certain domain-specific constructs, the firing of internal events (to structure and drive general-purpose processing), and the encoding of certain processing steps that always happen.
The DSL looks pretty similar to SQL; in fact I am using Berkeley DB (via its SQLite3 interface) for long-term storage of events. The important part here is that the processing of events is done set-based, as in SQL. I have come to the conclusion, however, that I should not add general-purpose processing logic to the DSL, but rather embed Lua or Lisp to take care of this.
The processing core is built around boost::asio; it is multithreaded; RPC is done via Protocol Buffers; events are encoded using the Protocol Buffers I/O library - i.e., the events are not structured as Protocol Buffer objects, they just use the same encoding/decoding library. I will create a dataset object that contains rows, pretty similar to how a database engine stores in-memory sets. Processing steps in the DSL will be taken care of first, and the result then presented to the general-purpose processing logic.
Regardless of which embeddable scripting environment I use, each thread in my processing core will probably need its own embedded-language environment (at least that is how Lua requires it to be if you are doing multithreaded work).
The Question(s)
The choice at the moment is between Lisp (ECL) and Lua. Keeping in mind that performance and throughput are strong requirements, which means minimising memory allocations is highly desirable:
If you were in my position, which language would you choose?
Are there any alternatives I should consider (don't suggest languages that don't have an embeddable implementation)? JavaScript via V8, perhaps?
Does Lisp fit the domain better? I don't think Lua and Lisp are that different in terms of what they provide. Call me out :D
Are there any other properties (like the ones below) I should be thinking about?
I assert that any form of embedded database IO (see the example DSL below for context) dwarfs the scripting-language call by orders of magnitude, and that picking either will not add much overhead to the overall throughput. Am I on the right track? :D
Desired Properties
I would like to map my dataset onto a Lisp list or Lua table, and I would like to minimise redundant data copies. For example, adding a row from one dataset to another should try to use reference semantics if both tables have the same shape.
I can guarantee that the dataset passed as input will not change while the Lua/Lisp call is in progress. I would also like Lua or Lisp to enforce that the dataset is not altered, if possible.
After the embedded call ends, the datasets should be destroyed; any references created would need to be replaced with copies (I guess).
DSL Example
I attach a DSL sample for your viewing pleasure so you can get an idea of what I am trying to achieve. Note: the DSL does not show general-purpose processing.
// Derived Events : NewSession EndSession
NAMESPACE WebEvents
{
SYMBOLTABLE DomainName(TEXT) AS INT4;
SYMBOLTABLE STPageHitId(GUID) AS INT8;
SYMBOLTABLE UrlPair(TEXT hostname ,TEXT scriptname) AS INT4;
SYMBOLTABLE UserAgent(TEXT UserAgent) AS INT4;
EVENT 3:PageInput
{
//------------------------------------------------------------//
REQUIRED 1:PagehitId GUID
REQUIRED 2:Attribute TEXT;
REQUIRED 3:Value TEXT;
FABRICATED 4:PagehitIdSymbol INT8;
//------------------------------------------------------------//
PagehitIdSymbol AS PROVIDED(INT8 ph_symbol)
OR Symbolise(PagehitId) USING STPagehitId;
}
// Derived Event : Pagehit
EVENT 2:PageHit
{
//------------------------------------------------------------//
REQUIRED 1:PageHitId GUID;
REQUIRED 2:SessionId GUID;
REQUIRED 3:DateHit DATETIME;
REQUIRED 4:Hostname TEXT;
REQUIRED 5:ScriptName TEXT;
REQUIRED 6:HttpRefererDomain TEXT;
REQUIRED 7:HttpRefererPath TEXT;
REQUIRED 8:HttpRefererQuery TEXT;
REQUIRED 9:RequestMethod TEXT; // or int4
REQUIRED 10:Https BOOL;
REQUIRED 11:Ipv4Client IPV4;
OPTIONAL 12:PageInput EVENT(PageInput)[];
FABRICATED 13:PagehitIdSymbol INT8;
//------------------------------------------------------------//
PagehitIdSymbol AS PROVIDED(INT8 ph_symbol)
OR Symbolise(PagehitId) USING STPagehitId;
FIRE INTERNAL EVENT PageInput PROVIDE(PageHitIdSymbol);
}
EVENT 1:SessionGeneration
{
//------------------------------------------------------------//
REQUIRED 1:BinarySessionId GUID;
REQUIRED 2:Domain STRING;
REQUIRED 3:MachineId GUID;
REQUIRED 4:DateCreated DATETIME;
REQUIRED 5:Ipv4Client IPV4;
REQUIRED 6:UserAgent STRING;
REQUIRED 7:Pagehit EVENT(pagehit);
FABRICATED 8:DomainId INT4;
FABRICATED 9:PagehitId INT8;
//-------------------------------------------------------------//
DomainId AS SYMBOLISE(domain) USING DomainName;
PagehitId AS SYMBOLISE(pagehit:PagehitId) USING STPagehitId;
FIRE INTERNAL EVENT pagehit PROVIDE (PagehitId);
}
}
This project is a component of a Ph.D. research project and is/will be free software. If you're interested in working with me on (or contributing to) this project, please leave a comment :D
I strongly agree with #jpjacobs's points. Lua is an excellent choice for embedding, unless there's something very specific about lisp that you need (for instance, if your data maps particularly well to cons-cells).
I've used lisp for many many years, BTW, and I quite like lisp syntax, but these days I'd generally pick Lua. While I like the lisp language, I've yet to find a lisp implementation that captures the wonderful balance of features/smallness/usability for embedded use the way Lua does.
Lua:
Is very small, both source and binary, an order of magnitude or more smaller than many more popular languages (Python etc). Because the Lua source code is so small and simple, it's perfectly reasonable to just include the entire Lua implementation in your source tree, if you want to avoid adding an external dependency.
Is very fast. The Lua interpreter is much faster than most scripting languages (again, an order of magnitude is not uncommon), and LuaJIT2 is a very good JIT compiler for some popular CPU architectures (x86, arm, mips, ppc). Using LuaJIT can often speed things up by another order of magnitude, and in many cases, the result approaches the speed of C. LuaJIT is also a "drop-in" replacement for standard Lua 5.1: no application or user code changes are required to use it.
Has LPEG. LPEG is a "Parsing Expression Grammar" library for Lua, which allows very easy, powerful, and fast parsing, suitable for both large and small tasks; it's a great replacement for yacc/lex/hairy-regexps. [I wrote a parser using LPEG and LuaJIT which is much faster than the yacc/lex parser I was trying to emulate, and was very easy and straightforward to create.] LPEG is an add-on package for Lua, but is well worth getting (it's one source file).
Has a great C-interface, which makes it a pleasure to call Lua from C, or call C from Lua. For interfacing large/complex C++ libraries, one can use SWIG, or any one of a number of interface generators (one can also just use Lua's simple C interface with C++ of course).
Has liberal licensing ("BSD-like"), which means Lua can be embedded in proprietary projects if you wish, and is GPL-compatible for FOSS projects.
Is very, very elegant. It's not lisp, in that it's not based around cons-cells, but it shows clear influences from languages like scheme, with a straightforward and attractive syntax. Like scheme (at least in its earlier incarnations), it tends towards "minimal" but does a good job of balancing that with usability. For somebody with a lisp background (like me!), a lot about Lua will seem familiar and "make sense", despite the differences.
Is very flexible; features such as metatables make it easy to integrate domain-specific types and operations.
Has a simple, attractive, and approachable syntax. This might not be such an advantage over lisp for existing lisp users, but might be relevant if you intend to have end-users write scripts.
Is designed for embedding, and besides its small size and fast speed, has various features such as an incremental GC that make using a scripting language more viable in such contexts.
Has a long history, and responsible and professional developers, who have shown good judgment in how they've evolved the language over the last 2 decades.
Has a vibrant and friendly user-community.
You don't state what platform you are using, but if it is capable of running LuaJIT 2 I'd certainly go for that, since execution speed approaches that of compiled code, and interfacing with C code has become a whole lot easier with the FFI library.
I don't really know other embeddable scripting languages, so I can't compare what they can do or how they work with tables.
Lua mostly works with references: all functions, userdata, and tables are used by reference, and are collected on the next GC run when no references to the data are left.
Strings are internalised, so any given string is in memory only once.
The thing to take into account is that you should avoid creating and subsequently discarding loads of tables, since this can slow down the GC cycle (as explained in the Lua gem you cited).
For parsing your code sample, I'd take a look at the LPEG library.
There are a number of options for implementing high-performance embedded compilers. One is the Mono VM: it naturally comes with dozens of ready-made, high-quality languages implemented on top of it, and it is quite embeddable (see how Second Life is using it). It is also possible to use LLVM - it looks like your DSL is not complicated, so implementing an ad hoc compiler would not be a big deal.
I happened to work on a project that has some parts similar to yours. It's a cross-platform system running on Windows CE, Android, and iOS; I needed to maximize the amount of cross-platform code, and C/C++ combined with an embeddable language is a good choice. Here is my solution as it relates to your questions.
If you were in my position, which language would you choose?
The DSL in my project is similar to yours. For performance, I wrote a compiler with Yacc/Lex to compile the DSL to a binary form for runtime, plus a bunch of APIs to get information out of the binary, but it was annoying: whenever the DSL syntax was modified, I had to change both the compiler and the APIs. So I abandoned the DSL and turned to XML (don't write the XML directly; a well-defined schema is worth having), wrote a general compiler converting the XML to Lua tables, and reimplemented the APIs in Lua. By doing this I got two benefits, readability and flexibility, without perceivable performance degradation.
Are there any alternatives I should consider (don't suggest languages that don't have an embeddable implementation)? JavaScript via V8, perhaps?
Before choosing Lua, I considered Embedded Ch (mostly used in industrial control systems), embedded Lisp, and Lua. In the end Lua stood out, because Lua is well integrated with C, Lua has a prosperous community, and Lua is easy for other team members to learn. Regarding JavaScript/V8: it's like using a steam hammer to crack nuts if used in an embedded real-time system.
Does Lisp fit the domain better? I don't think Lua and Lisp are that different in terms of what they provide. Call me out :D
For my domain, Lisp and Lua have the same semantic abilities: they can both handle an XML-based DSL easily (or you might even write a simple compiler converting XML to Lisp lists or Lua tables), and they can both handle the domain logic easily. But Lua is better integrated with C/C++; that is what Lua aims for.
Are there any other properties (like the ones below) I should be thinking about?
Whether you are working alone or with team members is also a weighting factor in selecting a solution; nowadays not many programmers are familiar with Lisp-like languages.
I assert that any form of embedded database IO (see the example DSL below for context) dwarfs the scripting-language call by orders of magnitude, and that picking either will not add much overhead to the overall throughput. Am I on the right track? :D
Here is a list of programming language performance comparisons, and here is a list of access times for computer components. If your system is IO-bound, the overhead of the scripting language is not the key point. My system is an O&M (Operation & Maintenance) system, where script performance is insignificant.
I want to ask what sort of type-safety language constructs there are in Clojure.
I've read 'Practical Clojure' by Luke VanderHart and Stuart Sierra several times now, but I still have the distinct impression that Clojure (like other Lisps) doesn't take compile-time validation checking very seriously. Type safety is just one (very popular) strategy for doing compile-time checking of correct semantics.
I'm asking this question because I'm aching to be proven wrong: what design patterns are available in Clojure to validate (at compile time, not at run time) that a function that expects a string doesn't get called with, say, a list of integers?
Also, I've read very smart people like Paul Graham openly advocate that Lisp allows you to implement everything from lower-level languages on top of it (most would say that the languages themselves are being reimplemented on top of it), so if that assertion is true, then stuff like type checking should trivially be a piece of cake. So do you feel that there exist type systems (or the ability to implement such type systems) in Clojure or other Lisps that give the programmer the ability to shift validation checking from run time to compile time, or even better, design time?
Compilation units in Clojure are very small - a single function. Lispers tend to change small portions of a running program while they develop. Introducing static type checking into this style of development is problematic - for a deeper discussion of why, I recommend the post Types are Anti-Modular by Gilad Bracha. Thus Clojure prefers pre/post-conditions, which jibe better with Lisp's highly REPL-oriented development.
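For reference, the pre/post-condition style looks like this (checks run at runtime, not compile time; the toy function is invented for illustration):
(defn greet
  "Build a greeting for a user name."
  [user-name]
  {:pre  [(string? user-name)]   ;; AssertionError if the argument is not a string
   :post [(string? %)]}          ;; % is the return value
  (str "Hello, " user-name))
(greet "Ada")  ;; => "Hello, Ada"
(greet 42)     ;; AssertionError at runtime, not a compile error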
That said, it's certainly desirable and possible to build an a la carte type system for Clojure. This trail has been blazed by Qi/Shen, and Typed Racket. This functionality could be easily provided as a library. I'm hoping to build something like that in the future with core.logic - https://github.com/clojure/core.logic.
Since Clojure is a dynamic language, the whole idea is not to check types (or much of anything) at compile time.
Even when you add type hints to your functions, they do not get checked at compile time.
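For example (a toy function; the hint only helps the compiler avoid reflection on the interop call, it is not a type check):
(defn shout [^String s]
  (.toUpperCase s))  ;; the ^String hint removes reflection, nothing more
(shout 42)           ;; compiles fine; fails with a ClassCastException at runtime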
Since Clojure is a Lisp, you can do whatever you want at compile time with macros, and macros are powerful enough that you can write your own type system. Some people have made type systems for Lisps, such as Typed Racket and Qi. These type systems can be just as powerful as any type system in a "normal" language.
OK, we now know that it is possible, but does Clojure have such an optional type system? The answer is currently no, but there is a logic engine (core.logic) that could be used to implement a type system; the author just has not worked in that direction yet.
There is a library that adds an optional type system to Clojure:
http://typedclojure.org/
Rationale
Static typing has well known benefits. For example, statically typed languages catch many common programming errors at the earliest time possible: compile time. Types also serve as an excellent form of (machine checkable) documentation that almost always augment existing hand-written documentation.
Languages without static type checking (dynamically typed) bring other benefits. Without the strict rigidity of mandatory static typing, they can provide more flexible and forgiving idioms that can help in rapid prototyping. Often the benefits of static type checking are desired as the program grows.
This work adds static type checking (and some of its benefits) to Clojure, a dynamically typed language, while still preserving idioms that characterise the language. It allows static and dynamically typed code to be mixed so the programmer can use whichever is more appropriate.
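To give an idea of how that looks in practice, here is a minimal sketch against clojure.core.typed (the namespace and function are invented for illustration):
(ns demo.typed
  (:require [clojure.core.typed :as t]))
(t/ann welcome [t/Str -> t/Str])
(defn welcome [user-name]
  (str "Hello, " user-name))
;; (t/check-ns 'demo.typed) flags a call like (welcome [1 2 3]) as a static
;; type error, while unannotated namespaces that call welcome stay dynamically
;; typed and are simply not checked.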
Is there a tool which can handle model checking large, real-world, mostly-C++, distributed systems, such as KDE?
(KDE is a distributed system in the sense that it uses IPC, although typically all of the processes are on the same machine. Yes, by the way, this is a valid usage of "distributed system" - check Wikipedia.)
The tool would need to be able to deal with intraprocess events and inter-process messages.
(Let's assume that if the tool supports C++, but doesn't support other stuff that KDE uses such as moc, we can hack something together to workaround that.)
I will happily accept less general (e.g. static analysers specialised for finding specific classes of bugs) or more general static analysis alternatives, in lieu of actual model checkers. But I am only interested in tools that can actually handle projects of the size and complexity of KDE.
You're obviously looking for a static analysis tool that can
parse C++ at scale
locate code fragments of interest
extract a model
pass that model to a model checker
report that result to you
A significant problem is that everybody has a different idea about what model they'd like to check. That alone likely kills your chance of finding exactly what you want, because each model extraction tool has generally made a choice as to what it wants to capture as a model, and the chances that it matches what you want precisely are IMHO close to zero.
You aren't clear on what specifically you want to model, but I presume you want to find the communication primitives and model the process interactions to check for something like deadlock?
The commercial static analysis tool vendors seem like a logical place to look, but I don't think they are there, yet. Coverity would seem like the best bet, but it appears they only have some kind of dynamic analysis for Java threading issues.
This paper claims to do this, but I have not looked at it in any detail: Compositional analysis of C/C++ programs with VeriSoft. Related is [PDF] Computer-Assisted Assume/Guarantee Reasoning with VeriSoft. It appears you have to hand-annotate the source code to indicate the modelling elements of interest. The VeriSoft tool itself appears to be proprietary to Bell Labs and is likely hard to obtain.
Similarly this one: Distributed Verification of Multi-threaded C++ Programs.
This paper also makes interesting claims, but doesn't process C++ in spite of the title: Runtime Model Checking of Multithreaded C/C++ Programs.
While all the parts of this are difficult, an issue they all share is parsing C++ (as exemplified by the previously quoted paper) and finding the code patterns that provide the raw information for the model. You also need to parse the specific dialect of C++ you are using; it's not nice that the C++ compilers all accept different languages. And, as you have observed, processing large C++ code bases is necessary. Model checkers (SPIN and friends) are relatively easy to find.
Our DMS Software Reengineering Toolkit provides general-purpose parsing with customizable pattern matching and fact extraction, and has a robust C++ front end that handles many dialects of C++ (EDIT Feb 2019: including C++17 in ANSI, GCC, and MS flavors). It could likely be configured to find and extract the facts that correspond to the model you care about. But it doesn't do this off the shelf.
DMS with its C front end has been used to process extremely large C applications (19,000 compilation units!). The C++ front end has been used in anger on a variety of large-scale C++ projects (EDIT Feb 2019: including large-scale refactoring of APIs across 3000+ compilation units). Given DMS's general capability, I think it is likely capable of handling fairly big chunks of code. YMMV.
Static code analyzers, when used against a large code base for the first time, usually produce so many warnings and alerts that you won't be able to analyze all of them in a reasonable amount of time. It is hard to single out real problems from code that merely looks suspicious to a tool.
You can try automatic invariant discovery tools like Daikon, which capture perceived invariants at run time. You can later check whether the discovered invariants (relationships between variables such as "a == b+1", for example) make sense, and then insert permanent asserts into your code. This way, when an invariant is violated as a result of a change, you will get a warning that perhaps you broke something. This method helps you avoid restructuring or changing your code just to add tests and mocks.
The usual way of applying formal techniques to large systems is to modularise them and write specifications for the interfaces of each module. Then you can verify each module independently (while verifying a module, you import the specifications - but not the code - of the other modules it calls). This approach makes verification scalable.