Validate basic set operations in JML - jml

How do I verify basic set operations such as intersection, union and difference in a JML tool like OpenJML, in which keywords like "\intersect \set_minus \set_union" are not supported?
The Java interface on which I want to do the validation looks like this:
MySetInterface<S> intersect (MySetInterface<S> set)
MySetInterface<S> union (MySetInterface<S> set)
MySetInterface<S> difference (MySetInterface<S> set)

You would have to find a functional description for these operations, as for all methods that don't have a direct correspondence in the JML-supported theories. The mathematical definition of the union of two sets, for example, is "the set of all objects that are members of either the one or the other set". I.e., the method union could be specified roughly as
/*@ public normal_behavior
  @   ensures this.containsAll(\old(this)) &&
  @           this.containsAll(\old(set)); @*/
and similarly for the other methods, assuming that there is a pure containsAll method available. Otherwise, you could use quantified expressions and a pure contains method. If you don't have such pure query methods, you would have to expose implementation details like an underlying field, which would make less sense for specifying an interface.
Does this clarify the problem for you?


How to not conflate a spec'ed map's key set and value set?

The Clojure official spec doc states:
Most systems for specifying structures conflate the specification of
the key set (e.g. of keys in a map, fields in an object) with the
specification of the values designated by those keys. I.e. in such
approaches the schema for a map might say :a-key’s type is x-type and
:b-key’s type is y-type. This is a major source of rigidity and
redundancy.
And in this SO question: clojure.spec human readable shape?
The following example is given:
(s/def ::car (s/keys :req [::tires ::chassis]))
(s/def ::tires (s/coll-of ::tire :count 4))
(s/def ::tire (s/or :goodyear ::goodyear
:michelin ::michelin))
My question is: how is this not rigid and not redundant? By contrast, what would be an example (in Java?) of something rigid and redundant?
As I see it, you still cannot define a car that would be, say, a dragster with 6 wheels, because ::tires must have 4 elements. Nor can you define a floating car whose rear wheels would be propellers.
How is the above example different from, say, static typing from the standpoint of rigidity and redundancy? How is it different from a Java car class that is constructed using a tires instance itself containing four tire instances?
Basically, what I don't get is this: you're spec'ing the map by telling which keys are required. So far so good. But then the keys are themselves spec'ed, so the values designated by those keys are specified too, no? How are things "not conflated" here?
Just for a moment, zoom back out and think about software development as a practice. We'll get to spec at the bottom.
As engineers, one of our most important skills is the ability to define and think in abstractions.
Think about how function composition simplifies the construction of complex software. It simplifies the definition of a complex function by allowing us to think more broadly about what is happening. A nice benefit of this is that it also allows for the smaller functions which compose the larger one to be reused when writing similar, but slightly different, complex functions.
Of course, you don't need functions at all. You can write an entire project in one function. However, this is a bad idea primarily because it conflates the intent of a function with the specification of the function.
For example, a function make-car calls build-engine, build-drivetrain, install-interior, and a lot more. Sure, you can take the code from each of those and paste them inside make-car. But the result is that you have lost abstraction. make-car cannot be improved or changed except in the make-car code itself. The code for how to build an engine cannot be improved or reused to make any other car. Why? Because the knowledge of how to build that engine for that particular specification of car is embedded in the make-car function.
So, make-car should not define how to build the engine (or any other components of the car); it simply specifies what components make up the car, and how they work together. The specifics of those components do not belong to the working knowledge embedded in make-car.
The comparison to spec should now be clear:
In a similar fashion, spec allows you to define entities as abstractions. Could you embed the knowledge of an entity within a specification/schema? Certainly. Can you directly substitute the specification of the individual components into the entity definition itself? Yes. But in doing so, you are conflating the entity with the definition of its components. The loss is the same as with the above: you lose the abstraction of what the entity is, as now you must change the definition of the entity in order to change details about the entity that are really details of its components; and, you have lost the ability to reuse definitions for similar, but distinct, entities.

visitor pattern adding new functionality

I've read this question about visitor patterns: https://softwareengineering.stackexchange.com/questions/132403/should-i-use-friend-classes-in-c-to-allow-access-to-hidden-members. In one of the answers I read:
Visitor give you the ability to add functionality to a class without actually touching the class itself.
But in the visited object we have to add a new interface, so we actually "touch" the class (or at least, in some cases, add setters and getters, which also changes the class).
How exactly do I add functionality with a visitor without changing the visited class?
The visitor pattern indeed assumes that each class interface is general enough, so that, if you would know the actual type of the object, you would be able to perform the operation from outside the class. If this is not the starting point, visitor indeed might not apply.
(Note that this assumption is relatively weak - e.g., if each data member has a getter, then it is trivially achieved for any const operation.)
The focus of this pattern is different. If
1. this is the starting point, and
2. you need to support an increasing number of operations,
then the question is what changes to the classes' code you need to make in order to dispatch new operations applied to pointers (or references) to the base class.
To make this more concrete, take the classic visitor CAD example:
Consider the design of a 2D CAD system. At its core there are several types to represent basic geometric shapes like circles, lines and arcs. The entities are ordered into layers, and at the top of the type hierarchy is the drawing, which is simply a list of layers, plus some additional properties.
A fundamental operation on this type hierarchy is saving the drawing to the system's native file format. At first glance it may seem acceptable to add local save methods to all types in the hierarchy. But then we also want to be able to save drawings to other file formats, and adding more and more methods for saving into lots of different file formats soon clutters the relatively pure geometric data structure we started out with.
The starting point of the visitor pattern is that, say, a circle, has sufficient getters for its specifics, e.g., its radius. If that's not the case, then, indeed, there's a problem (in fact, it's probably a badly designed CAD code base anyway).
Starting from this point, though, when considering new operations, e.g., writing to file type A, there are two approaches:
1. implement a virtual method like write_to_file_type_a for each class and each operation
2. implement a virtual method accept_visitor for each class, only once
The "without actually touching the class itself" in your question means, in point 2 just above, that this is all that's now needed to dispatch future visitors to the correct classes. It doesn't mean that the visitor will start writing getters, for example.
Once a visitor interface has been written for one purpose, you can visit the class in different ways. Visiting it differently does not require touching the class again, assuming you are visiting the same components.
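As a rough sketch of the second approach above (a single accept method per class), assuming hypothetical Shape, Circle and Line classes that expose simple getters. None of these names come from a real CAD code base. Each class implements accept once, and operations such as TextWriter and ShapeCounter are then added as separate visitor classes without editing the shapes again.

```cpp
#include <sstream>
#include <string>

// Hypothetical names throughout; this is an illustration, not a real CAD API.
class Circle;
class Line;

// One visit overload per concrete shape type.
class ShapeVisitor {
public:
    virtual ~ShapeVisitor() = default;
    virtual void visit(const Circle& c) = 0;
    virtual void visit(const Line& l) = 0;
};

class Shape {
public:
    virtual ~Shape() = default;
    // The only virtual method the hierarchy needs for all future operations.
    virtual void accept(ShapeVisitor& v) const = 0;
};

class Circle : public Shape {
public:
    explicit Circle(double radius) : radius_(radius) {}
    double radius() const { return radius_; }  // getter the visitors rely on
    void accept(ShapeVisitor& v) const override { v.visit(*this); }
private:
    double radius_;
};

class Line : public Shape {
public:
    explicit Line(double length) : length_(length) {}
    double length() const { return length_; }
    void accept(ShapeVisitor& v) const override { v.visit(*this); }
private:
    double length_;
};

// A new operation, written entirely outside the shape classes.
class TextWriter : public ShapeVisitor {
public:
    void visit(const Circle& c) override { out_ << "circle r=" << c.radius() << "\n"; }
    void visit(const Line& l) override { out_ << "line len=" << l.length() << "\n"; }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

// Another operation added later, again without touching Circle or Line.
class ShapeCounter : public ShapeVisitor {
public:
    void visit(const Circle&) override { ++circles; }
    void visit(const Line&) override { ++lines; }
    int circles = 0;
    int lines = 0;
};
```

Calling shape->accept(writer) through a pointer to Shape dispatches to the right visit overload. Note that adding a new shape type, as opposed to a new operation, still requires extending ShapeVisitor.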

How to log user defined POD struct in C++

I need to add logging to a legacy C++ project, which contains hundreds of user-defined structs/classes. These structs only contain primitive types such as int, float, char[], and enum.
The content of the objects needs to be logged, preferably in a human-readable way, but that is not a must, as long as the object can be reconstructed.
Instead of writing different serialization methods for each class, is there any alternative method?
What you want is a Program Transformation System (PTS). These are tools that can read source code, build compiler data structures (usually ASTs) that represent the source code, and allow you to modify the ASTs and regenerate source code from the modified AST.
These are useful because they "step outside" the language, and thus have no language-imposed limitations on what you can analyze or transform. So it doesn't matter if your language doesn't have reflection for everything; a good PTS will give you full access to every detail of the language, including such arcana as comments and the radix of numeric literals.
Some PTSes are specific to a targeted language (e.g., "Jackpot" is only usable for Java). A really good PTS is provided with a description of an arbitrary programming language, and can then manipulate that language. That description has to enable the PTS to parse the code, analyze it (build symbol tables at least) and prettyprint the parsed/modified result.
Good PTSes will allow you to write the modifications you want to make using source-to-source transformations. These are rules specifying changes written in roughly the following form:
if you see *this*, replace it by *that* when *condition*
where this and that are patterns using the syntax of the target language being processed, and condition is a predicate (test) that must be true to enable the rule to be applied. The patterns represent well-formed code fragments, and typically allow metavariables to represent placeholders for arbitrary subfragments.
You can use PTSes for a huge variety of program manipulation tasks. For OP's case, what he wants is to enumerate all the structs in the program, pick out the subset of interest, and then generate a serializer for each selected struct as a modification to the original program.
To be practical for this particular task, the PTS must be able to parse and name resolve (build symbol tables) C++. There are very few tools that can do this: Clang, our DMS Software Reengineering Toolkit, and the Rose compiler.
A solution using DMS looks something like this:
domain Cpp~GCC5; -- specify the language and specific dialect to process
pattern log_members( m: member_declarations ): statements = TAG;
-- declares a marker we can place on a subtree of struct member declarations
rule serialize_typedef_struct(s: statement, m: member_declarations, i: identifier):
statements->statements
= "typedef struct { \m } \i;" ->
"typedef struct { \m } \i;
void \make_derived_name\(serialize,\i) ( *\i argument, s: stream )
{ s << "logging" << \toString\(\i\);
\log_members\(\m\)
}"
if selected(i); -- make sure we want to serialize this one
rule generate_member_log_list(m: member_declarations, t: type_specification, n: identifier): statements -> statements
" \log_members\(\t \n; \m\)" -> " s << \n; \log_members\(\m\) ";
rule generate_member_log_base(t: type_specification, n: identifier): statements -> statements
" \log_members\(\t \n; \)" -> " s << \n; ";
ruleset generate_logging {
serialize_typedef_struct,
generate_member_log_list,
generate_member_log_base
}
The domain declaration tells DMS which specific language front-end to use. Yes, GCC5 as a dialect is different than VisualStudio2013, and DMS can handle either.
The pattern log_members is used as a kind of transformational pointer, to remember that there is some work to do. It wraps a sequence of struct member_declarations as an agenda (tag). What the rules do is first mark structs of interest with log_members to establish the need to generate the logging code, and then generate the member logging actions. The log_members pattern acts as a list; it is processed one element at a time until a final element is processed, and then the log_members tag vanishes, having served its purpose.
The rule serialize_typedef_struct is essentially used to scan the code looking for suitable structs to serialize. When it finds a typedef for a struct, it checks that struct is one that OP wants serialized (otherwise one can just leave off the if conditional). The meta-function selected is custom-coded (not shown here) to recognize the names of structs of interest. When a suitable typedef statement is found, it is replaced by the typedef (thus preserving it), and by the shell of a serializing routine containing the agenda item log_members holding the entire list of members of the struct. (If the code declares structs in some other way, e.g., as a class, you will need additional rules to recognize the syntax of those cases). Processing the agenda item by rewriting it repeatedly produces the log actions for the individual members.
The rules are written in DMS rule-syntax; the C++ patterns are written inside metaquotes " ... " to enable DMS to distinguish rule syntax from C++ syntax. Placeholder variables v are declared in the rule header according to their syntactic categories, and show up in the meta-quoted patterns using an escape notation \v. [Note the unescaped i in the selected function call: it isn't inside metaquotes]. Meta-functions and pattern references inside the metaquotes are similarly escaped, hence the initially odd-looking \log_members\( ... \), including the escaped pattern name and escaped meta-parentheses.
The two rules generate_member_log_xxx handle the general and final cases of log generation. The general case handles one member with more members to do; the final case handles the last member. (A slight variant would be to process an empty members list by rewriting to the trivial null statement ;). This is essentially walking down a list until you fall off the end. We "cheat" and write rather simple logging code, counting on overloading of stream writes to handle the different datatypes that OP claims he has. If he has more complex types requiring special treatment (e.g., pointer to...) he may want to write specialized rules that recognize those cases and produce different code.
The ruleset generate_logging packages these rules up into a neat bundle. You can trivially ask DMS to run this ruleset over entire files, applying rules until no rules can be further applied. The serialize_typedef_struct rule finds the structs of interest, generating the serializing function shell and the log_members agenda item, which are repeatedly re-written to produce the serialization of the members.
This is the basic idea. I haven't tested this code, and there are usually some surprising syntax variations you end up having to handle, which means writing a few more rules along the same lines.
But once implemented, you can run this rule over the code to get serialized results. (One might implement selected to reject named structs that already have a serialization routine, or alternatively, add rules that replace any existing serialization code with newly generated code, ensuring that the serialization procedures always match the struct definition). There's the obvious extension to generating a serialized struct reader.
You can arguably implement these same ideas with Clang and/or the Rose Compiler. However, those systems do not offer you source-to-source rewrite rules, so you have to write procedural code to climb up and down trees, inspect individual nodes, etc. It is IMHO a lot more work and a lot less readable.
And when you run into your next "C++ doesn't reflect that", you can tackle the problem with the same tool :-}
Since C++ does not have reflection there is no way for you to dynamically inspect the members of an object at runtime. Thus it follows that you need to write a specific serialization/streaming/logging function for each type.
If all the different types had members of the same name, then you could write a template function to handle them, but I assume that is not the case.
As C++ does not have reflection this is not that easy.
If you want to avoid a verbose solution you can use a variadic template.
E.g.
class MyStruct {
private:
    int a;
    float f;
public:
    void log()
    {
        log_fields(a, f);
    }
};
where log_fields() is the variadic template. It would need to be specialized for all the basic types found on those user defined types and also for a recursive case.
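A minimal sketch of what log_fields could look like, under a few assumptions that are not in the answer above: the output stream is passed in explicitly (the call shown above takes no stream), every field type already supports operator<<, and char[] members are null-terminated. The recursion peels off one field per step and ends at the overload that takes no fields.

```cpp
#include <ostream>
#include <sstream>

// Base case of the recursion: no fields left, finish the log line.
inline void log_fields(std::ostream& os) { os << '\n'; }

// Recursive case: write the first field, then recurse on the rest.
template <typename First, typename... Rest>
void log_fields(std::ostream& os, const First& first, const Rest&... rest) {
    os << first;
    if (sizeof...(rest) > 0) os << ' ';  // separator between fields only
    log_fields(os, rest...);
}

// Same shape as the struct above, with a stream parameter added to log().
class MyStruct {
public:
    MyStruct(int a, float f) : a(a), f(f) {}
    void log(std::ostream& os) const { log_fields(os, a, f); }
private:
    int a;
    float f;
};
```

Each class still needs a one-line log() that lists its members, so this shortens the per-class code rather than removing it. Unscoped enums print as integers here; scoped enums would need their own operator<<.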

C++ adaptor with different interfaces, where interfaces may have different type/number of input parameters

It is well known how to build the adapter when the adaptee's methods look the same except for the name.
For example,
http://sourcemaking.com/design_patterns/adapter/cpp/2
where none of "doThis", "doThat", and "doOther" has inputs. However, what if different methods have different number of input parameters?
Thanks
The example given in the linked document describes a use of the adapter pattern in a situation where the change is purely syntactic. The situation implied by your question involves a semantic change, i.e. the adaptee method does not provide exactly the same service as the one the adapter interface formally "promises" to deliver. This means that the adaptee must be wrapped with more than a simple name change: some work must be done around it to build the missing parameters or transform the existing parameters into those required by the adaptee.
If each new adaptee has different requirements, then each adapter must contain the ad-hoc adapting code. There's not much one can do to factor a common pattern out of this situation. The only easy case is the trivial one, when all the needed parameters are independent of the passed ones and can be computed once and for all before constructing the adapter, hence allowing an adapter that is simply the equivalent of std::bind.
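The trivial case can be sketched as follows, assuming a hypothetical parameterless target interface Task and two made-up adaptees, Printer and Beeper, whose methods take different numbers of arguments. The adapter stores a callable whose extra arguments were fixed at construction time.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical target interface the client codes against: no parameters.
class Task {
public:
    virtual ~Task() = default;
    virtual std::string run() = 0;
};

// Hypothetical adaptees with differing signatures.
struct Printer {
    std::string print(const std::string& text, int copies) {
        return std::to_string(copies) + "x " + text;
    }
};

struct Beeper {
    std::string beep() { return "beep"; }
};

// One adapter for both: the missing parameters are bound before construction.
class CallableTask : public Task {
public:
    explicit CallableTask(std::function<std::string()> fn) : fn_(std::move(fn)) {}
    std::string run() override { return fn_(); }
private:
    std::function<std::string()> fn_;
};
```

For example, CallableTask(std::bind(&Printer::print, &printer, "report", 2)) and CallableTask([&beeper] { return beeper.beep(); }) both satisfy Task. When the adaptee's arguments depend on values passed at call time, this shortcut no longer applies and the adapter has to contain the conversion code itself.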

How to build objects from similar template classes

My goal is as follows.
I am working with proteins in a data analysis setting. The data available for any given protein is variable. I want to be able to build a protein class from more simple parent classes. Each parent class will be specific to a data layer that I have available to me.
Different projects may have different layers of data available. I would like to write simple classes for the protein that contain all of the variables and methods related to a specific data layer. And then, for any given project, be able to compile a project specific protein class which inherits from the relevant data layer specific protein classes.
In addition, each data layer specific protein class requires a similarly data layer specific chain class, residue class and atom class. They are all building blocks. The atoms are used to build the residues which are used to build the chains which are used to build the protein. The protein class needs to have access to all of its atoms, residue and chains. Similarly the chains need access to the residue and atoms.
I have used vectors and maps to store pointers to the relevant objects. There are also the relevant get and set methods. In order to give EVERY version of the protein variables and getter and setter methods I have made 1 template class for the atom, residue, chain and protein. This template class contains the vectors and getter and setter methods which give the protein access to its chains, residues and atoms. This template class is then inherited by every data layer specific protein class.
Is this the best approach?
First up, inheritance is a nice means of abstraction and should help you build custom classes easily, paving the way for reusability and maintenance. However, spare a moment to consider your data structures. Using a vector seems like the most natural way to hold dynamic data; however, resizing vectors comes with some overhead, and when dealing with large data this sometimes becomes an issue.
To overcome this, try to come up with the average number of items that each object would normally have. You can then have an array and a vector, and use the vector only once the array is full. This way you don't run into the overhead too often.
Depending on the actual processing that you are about to do, you might want to rethink your data structures. If, for example, your data is sufficiently small and manageable, you can just use vectors and concentrate more on the actual computation. If, however, large data sets are to be handled, you might want to modify your data structures a little to make the processing easier. Good luck.
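A simpler alternative to the array-plus-vector arrangement, not the scheme described above, is to call std::vector::reserve with the expected element count. A small sketch, with make_filled as a hypothetical helper:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: fill a vector of n values with one up-front allocation.
std::vector<int> make_filled(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);  // capacity is at least n after this call
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));  // no reallocation while size() <= n
    }
    return v;
}
```

If the estimate turns out to be too low, the vector simply grows as usual, so a rough average is enough.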
You might want to look at the Composite Design Pattern to organize your multi-level data and to the Visitor Design Pattern to write algorithms that "visit" your data structure.
The Composite Design Pattern creates a Component interface (abstract base class) that allows for iteration over all the elements in its sub-layer, adding/removing elements, etc. It should also have an accept(some_function) method to allow outside algorithms to be applied to itself. Each specific layer (atom, residue, chain) would then be a concrete class that derives from the Component interface. Don't let a layer derive from its sub-layer: inheritance should only reflect an "is-a" relationship, except in very special circumstances.
The Visitor Design Pattern creates a hierarchy of algorithms, that is independent of the precise structure of your data. This pattern works best if the class hierarchy of your data does not change in the foreseeable future. [NOTE: you can still have whatever molecule you want by filling the structure with your particular data, just don't change the number of layers in your structure].
No matter what you do, it's always recommended to only use inheritance for re-using or extending interface, and to use composition for re-using / extending data. E.g. the STL containers such as vector and map don't have virtual destructors and were not designed to be used as base classes.
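A rough sketch of the Composite arrangement described above, assuming hypothetical class names (Component, Atom, Residue, Chain, Protein) and a simple accept that takes a std::function instead of a full visitor hierarchy. Children are held by composition in a vector member, and no layer derives from its sub-layer.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical Component base class shared by every layer.
class Component {
public:
    explicit Component(std::string name) : name_(std::move(name)) {}
    virtual ~Component() = default;

    const std::string& name() const { return name_; }

    // Children are stored in a member, not by deriving from a container.
    Component& add(std::unique_ptr<Component> child) {
        children_.push_back(std::move(child));
        return *children_.back();
    }

    // Apply an outside algorithm to this node and everything below it.
    void accept(const std::function<void(const Component&)>& fn) const {
        fn(*this);
        for (const auto& child : children_) child->accept(fn);
    }

private:
    std::string name_;
    std::vector<std::unique_ptr<Component>> children_;
};

// Each layer is its own concrete class; none derives from its sub-layer.
class Atom    : public Component { public: using Component::Component; };
class Residue : public Component { public: using Component::Component; };
class Chain   : public Component { public: using Component::Component; };
class Protein : public Component { public: using Component::Component; };

// Example outside algorithm: count the nodes of one layer type.
template <typename Layer>
std::size_t count_layer(const Component& root) {
    std::size_t n = 0;
    root.accept([&n](const Component& c) {
        if (dynamic_cast<const Layer*>(&c)) ++n;
    });
    return n;
}
```

Data-layer-specific variables would go into the concrete classes (or into members they own), while traversal stays in Component. count_layer<Atom>(protein) then gives a protein access to all of its atoms without the protein class knowing how residues store them.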