EMF model-to-model - eclipse-emf

There are two EMF Ecore models, named lang.ecore and export.ecore.
They are broadly the same, but export.ecore is somewhat stripped down in places and some of its elements have slightly different semantics.
Now I have a loaded model of lang in memory and I want to create an export object tree, just in memory.
Concretely, I need to:
- map the elements that are essentially the same,
- recreate the references in the export model,
- customize the objects that differ.
Is there a way I can do that efficiently while avoiding too much repetitive coding?

You could probably start your transformation with a step based on the reflective API that "recreates" your source model as a destination model that is "mostly similar", purely by matching the names of your types and their features.
However, if your metamodels are not that big, I am not sure you would gain much time, and you would pay a price in clarity and debuggability.
Moreover, you should consider whether your two metamodels will remain "generally alike", whether there is a chance they will evolve in different directions, and whether your mapping will require you to diverge from that path.
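If the type and feature names really do line up, that reflective first pass can stay small. Below is a minimal sketch of the idea, assuming both packages are already loaded; it only copies attributes with matching names, so cross-references would still need a second pass over a lang-to-export object map, and the elements whose semantics differ still need hand-written mapping:

```java
// Minimal sketch of a name-based reflective copy from a lang object to an
// export object. Assumes attribute names and data types are compatible
// between the two metamodels; anything else needs custom handling.
import org.eclipse.emf.ecore.EAttribute;
import org.eclipse.emf.ecore.EClass;
import org.eclipse.emf.ecore.EClassifier;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.EPackage;
import org.eclipse.emf.ecore.EStructuralFeature;

public class NameBasedCopier {

    private final EPackage exportPackage; // the loaded export.ecore package

    public NameBasedCopier(EPackage exportPackage) {
        this.exportPackage = exportPackage;
    }

    /** Creates the export counterpart of a lang object by matching class and attribute names. */
    public EObject copyAttributes(EObject langObject) {
        EClass langClass = langObject.eClass();
        EClassifier target = exportPackage.getEClassifier(langClass.getName());
        if (!(target instanceof EClass)) {
            return null; // no equally named class in export.ecore: needs custom mapping
        }
        EClass exportClass = (EClass) target;
        EObject exportObject = exportPackage.getEFactoryInstance().create(exportClass);

        for (EAttribute langAttr : langClass.getEAllAttributes()) {
            EStructuralFeature exportFeature =
                    exportClass.getEStructuralFeature(langAttr.getName());
            // Only copy set attributes whose export counterpart is also an attribute.
            if (exportFeature instanceof EAttribute && langObject.eIsSet(langAttr)) {
                exportObject.eSet(exportFeature, langObject.eGet(langAttr));
            }
        }
        return exportObject;
    }
}
```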

Related

What's the point of having a SceneGraph beyond a SceneTree?

In this presentation SceneGraph and SceneTree are presented.
ST (SceneTree) is essentially an unfolded synchronized copy of SG (SceneGraph).
Since you are going to need the ST anyway, because:
- it offers instances, on which you need to store attributes,
- you need to traverse the ST anyway to update transformations and other cascading variables, such as visibility,
what is the point of having the SG in the first place, even if it offers less redundancy?
To quote Markus Tavenrath, who gave that presentation and was asked the same question (http://pastebin.com/SRpnbBEj, the outcome of an IRC discussion, starting at ~9:30):
[my emphasis]
One argument for a SceneGraph in general is that people already have their own custom SceneGraph implementations and need to keep them for legacy reasons, and this SceneGraph is the reason why the CPU is the bottleneck during the rendering phase. The goal of my talks is to show ways to enhance a legacy SceneGraph so that the GPU can become the bottleneck. One very interesting experiment would be adding the techniques to OpenSceneGraph…
Besides this, a SceneGraph is way more powerful with regard to data manipulation, data access and reuse. For example, the diamond shape can be really powerful if you want to instantiate complex objects with multiple nodes. Another powerful feature in our SceneGraph is the properties feature, which will be used for animation in the future (search for LinkManager). With the LinkManager you can create a network of connected properties and let data flow along the links in a defined order. What's missing for the animations are Curve objects which can be used as input for animations. All those features have a certain cost at the per-object level with regard to memory size, and more memory means more touched cachelines during traversal, which means more CPU clocks spent on traversal.
The SceneTree is designed to keep only the hierarchy and the few attributes which have to be propagated along the hierarchy. This way the nodes are very lightweight and thus traversal is, if required, very efficient. This also means that a lot of features provided in the SceneGraph are no longer available in the SceneTree. Though with the knowledge gained by the SceneTree it should be possible to write a SceneGraph which provides most of the performance features of the SceneTree.
Depending on the type of application you're writing, you might even decide that the SceneTree is too big, and you could write your application directly on an interface like RiX, which doesn't have the hierarchy anymore, thus eliminating yet another layer ;-).
So the first reason is that people are already used to using a scenegraph and many custom implementations exist.
Other than that, the scenetree is really a pruned scenegraph where only the necessary components remain, which means that a scenegraph is more useful to artists and modelers, as it better abstracts the scene. A complex property may, for example, be an RNG seed that affects the look of each person in a crowd.
In a scenegraph that root Person node would just have a single seed property; its children would select the exact mesh and color based on that seed. But in the expanded tree the unused meshes for alternate hairstyles, clothing and such will already have been pruned.
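To make the structural difference concrete, here is a minimal, engine-agnostic sketch (plain Java, not NVIDIA's actual SceneGraph/SceneTree API): in the graph a subtree can be shared by several parents (the "diamond"), and unfolding it into the tree produces one lightweight node per instance, which is also where per-instance attributes such as the propagated transform live.

```java
import java.util.ArrayList;
import java.util.List;

// A scene-graph node: children may be shared between several parents (a DAG).
class GraphNode {
    String name;
    List<GraphNode> children = new ArrayList<>();
    GraphNode(String name) { this.name = name; }
}

// A scene-tree node: every instance gets its own node, carrying only the
// few attributes that must be propagated along the hierarchy.
class TreeNode {
    String name;
    List<TreeNode> children = new ArrayList<>();
    float[] worldTransform; // per-instance, propagated attribute
    TreeNode(String name) { this.name = name; }
}

public class Expand {
    /** Unfolds a (possibly shared) graph into a tree: one TreeNode per instance. */
    static TreeNode expand(GraphNode g) {
        TreeNode t = new TreeNode(g.name);
        for (GraphNode child : g.children) {
            t.children.add(expand(child)); // a shared child yields one copy per parent
        }
        return t;
    }

    public static void main(String[] args) {
        GraphNode wheel = new GraphNode("wheel");   // authored once
        GraphNode left  = new GraphNode("leftAxle");
        GraphNode right = new GraphNode("rightAxle");
        left.children.add(wheel);                   // diamond: both axles share
        right.children.add(wheel);                  // the same wheel subgraph
        GraphNode car = new GraphNode("car");
        car.children.add(left);
        car.children.add(right);

        System.out.println(car.children.get(0).children.get(0)
                == car.children.get(1).children.get(0));   // true: shared in the graph

        TreeNode tree = expand(car);
        System.out.println(tree.children.get(0).children.get(0)
                == tree.children.get(1).children.get(0));   // false: unfolded per instance
    }
}
```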

JavaFX and Clojure: binding observables to immutable objects

I've been trying to figure out the approach to take to allow a JavaFX TableView (or any other JavaFX thing) to represent some Clojure data, and allow the user to manipulate the data through the GUI.
For this discussion, let's assume I have a list/vector of maps, ie something like
[{:col1 "happy1" :col2 "sad1"} {:col1 "happy2" :col2 "sad2"}] and I want it to display in a graphical table as follows:
col1    col2
------------------
happy1 sad1
happy2 sad2
Pretty straightforward. This has been done a gazillion times in the history of the world.
The problem is the TableView insists on taking an ObservableList, etc., which is inherently a mutable thing, as are all the Observables in JavaFX. This is great for keeping the table up to date, and in a mutable model, it's also great for allowing the user to directly manipulate the data via the GUI. I'm not an expert, but in this case it seems JavaFX wants the GUI object to actually contain the real data. This seems funny (not ha ha) to me. To maintain my own model and communicate between GUI and model via some API or interface also implies that I'm maintaining the data in two places: in my own model, and in the GUI. Is this the Correct Way of Doing Things? Maybe this is ok since the GUI only ever displays a small fraction of the total data, and it lets my model data just be normal model data, not an instance of some Java-derived type.
So this leads to the following three general questions when trying to put a GUI on a stateless/immutable model:
How can the model underneath be truly immutable if the GUI necessarily allows you to change things? I'm thinking specifically of some sort of design tool, editor, etc., where the user is explicitly changing things around. For example LightTable is an editor, yet the story is it is based on immutable data. How can this be? I'm not interested in FRP for this discussion.
Assuming at some level there is at least one Atom or other Clojure mutable type (ref/var/agent/etc) (whether it be a single Atom containing the entire in-memory design database, or whether the design database is an immutable list of mutable Atoms), which of the [MVP, MCP, MVVM, etc.] models is best for this type of creation?
JavaFX has littered the class hierarchy with every imaginable variation of Observable interface (http://docs.oracle.com/javafx/2/api/javafx/beans/Observable.html), with such gems as Observable[whatever]Value, including for example ObservableMap and ObservableMapValue, and then dozens upon dozens of implementing classes such as IntegerProperty and SimpleIntegerProperty... geez! wtf?. Assuming that I have to create some Clojure objects (defrecord, etc.) and implement some of the Observable interface methods on my mostly immutable objects, can I just stick with Observable, or must I implement each one down to the leaf node, ie ObservableIntegerValue, etc.?
What is the correct high level approach? Maintain a single top-level atom which is replaced every time the user changes a value? Maintain a thousand low-level atoms? Let my values live as JavaFX Observables, and forget about Clojure data structs? Implement my own set of Observables in Clojure using some reify/proxy/gen-class, but implement them as immutables that get replaced each time a change is made? Is there a need or place for Clojure's add-watch function? I'd very much prefer that my data just be normal data in Clojure, not a "type" or an implementation of an interface of anything. An Integer should be an Integer, etc.
thanks
I would add a watch to the atom, and then update the observable based on the diff.
http://clojuredocs.org/clojure.core/add-watch
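Whatever diffing you do in the watch function, the JavaFX side of it boils down to pushing the new immutable snapshot into the ObservableList on the FX application thread. A minimal Java-side sketch of that update step (class and method names are illustrative, and the "diff" is reduced to a wholesale setAll):

```java
// Sketch of the JavaFX side only: replace the table's backing ObservableList
// with the new immutable snapshot whenever the model atom changes. In Clojure
// this is what the add-watch callback would invoke via interop.
import javafx.application.Platform;
import javafx.collections.FXCollections;
import javafx.collections.ObservableList;

import java.util.List;
import java.util.Map;

public class TableSync {
    private final ObservableList<Map<String, String>> rows =
            FXCollections.observableArrayList();

    /** Hand this to the TableView once; the model data itself stays plain Clojure data. */
    public ObservableList<Map<String, String>> getRows() {
        return rows;
    }

    /** Called from the model watch with the new immutable value. */
    public void onModelChanged(List<Map<String, String>> newSnapshot) {
        // All scene-graph mutation must happen on the FX application thread.
        Platform.runLater(() -> rows.setAll(newSnapshot));
    }
}
```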

An alternative to hierarchical data model

Problem domain
I'm working on a rather big application which uses a hierarchical data model. It takes images, extracts image features, and creates analysis objects on top of them. So the basic model is like Object-(1:N)-Image_features-(1:1)-Image. But the same set of images may be used to create multiple analysis objects (with different options).
An object or image can also have a lot of other connected objects: for example, the analysis object can be refined with additional data, or complex conclusions (solutions) can be based on the analysis object and other data.
Current solution
This is a sketch of the solution. Stacks represent sets of objects, arrows represent pointers (i.e. image features link to their images, but not vice versa). Some parts (images, image features, additional data) may be included in multiple analysis objects, because the user wants to run analyses on different sets of objects, combined differently.
Images, features, additional data and analysis objects are stored in a global storage (a god object). Solutions are stored inside analysis objects by composition (and contain solution features in turn).
All the entities (images, image features, analysis objects, solutions, additional data) are instances of corresponding classes (like IImage, ...). Almost all of the parts are optional (e.g., we may want to discard the images once we have a solution).
Current solution drawbacks
1. Navigating this structure is painful when you need connections like the dotted one in the sketch. If you have to display an image with a couple of solution features on top, you first have to iterate through the analysis objects to find which of them are based on this image, and then iterate through the solutions to display them.
2. If, to solve 1, you choose to store the dotted links explicitly (i.e. the image class holds pointers to the solution features related to it), you will put a lot of effort into maintaining the consistency of these pointers and constantly updating the links when something changes.
My idea
I'd like to build a more extensible (2) and flexible (1) data model. My first idea was to use a relational model, separating objects and their relations. And why not use an RDBMS here: SQLite seems an appropriate engine to me. Complex relations would then be accessible through simple (LEFT) JOINs on the database (pseudocode: "images JOIN images_to_image_features JOIN image_features JOIN image_features_to_objects JOIN objects JOIN solutions JOIN solution_features"), followed by fetching the actual C++ objects for the solution features from the global storage by ID.
The question
So my primary question is:
Is using an RDBMS an appropriate solution for the problems I described, or is it not worth it and are there better ways to organize the information in my app?
If an RDBMS is OK, I'd appreciate any advice on using an RDBMS and the relational approach to store relationships between C++ objects.
You may want to look at Semantic Web technologies, such as RDF, RDFS and OWL, which provide an alternative, extensible way of modeling the world. There are some open-source triple stores available, and some of the mainstream RDBMSs also have triple-store capabilities.
In particular, take a look at Manchester University's Protégé/OWL tutorial: http://owl.cs.manchester.ac.uk/tutorials/protegeowltutorial/
And if you decide this direction is worth looking at further, I can recommend "Semantic Web for the Working Ontologist".
Just based on the diagram, I would suggest that an RDBMS solution would indeed work. It has been years since I was a developer on an RDBMS (called RDM, of course!), but I was able to renew my knowledge and gain many valuable insights into data structure and layout very similar to what you describe by reading the fabulous book "The Art of SQL" by Stephane Faroult. His book will go a long way toward answering your questions.
I've included a link to it on Amazon, to ensure accuracy: http://www.amazon.com/The-Art-SQL-Stephane-Faroult/dp/0596008945
You will not go wrong by reading it, even if in the end it does not solve your problem fully, because the author does such a great job of breaking down a relation in clear terms and presenting elegant solutions. The book is not a manual for SQL, but an in-depth analysis of how to think about data and how it interrelates. Check it out!
Using an RDBMS to track the links between data can be an efficient way to store and think about the analysis you are seeking, and the links are "soft" -- that is, they go away when the hard objects they link are deleted. This ensures data integrity, and M. Faroult can answer what to do to ensure that remains true.
I don't recommend an RDBMS, based on your requirement for an extensible and flexible model.
Whenever you change your data model, you will have to change the DB schema, and that can involve more work than a change in code.
Any problems with DB queries are discovered only at runtime, which can make a big difference to the cost of maintenance.
I strongly recommend using standard C++ OO programming with STL.
- You can make use of encapsulation to ensure any data change is done properly, with updates to related objects and indexes.
- You can use the STL to build highly efficient indexes on the data.
- You can create facades to get you the information easily, rather than having to go to multiple objects/collections. This will be one-time work.
- You can make unit test cases to ensure correctness (much less complicated compared to unit testing with databases).
- You can make use of polymorphism to build different kinds of objects, different types of analysis, etc.
These are all very basic points, but I reckon your effort would be best spent improving the current solution rather than looking for a DB-based solution.
http://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/index.html
"you'll put very much effort maintaining consistency of these pointers
and constantly updating the links when something changes."
With the help of Boost.MultiIndex you can create almost every kind of index on a "table". I think the quoted problem is not so serious, so the original solution is manageable.

How to organize import and export code for versioned files?

If an application has to be able to open (and possibly save) the file format for the last N releases, how should the code be organized sanely so that updating the file format is easy and less error-prone? Assume the file format is in XML, and functions take in objects for export and produce objects for import.
Append a number to the end of each function name, and copy/paste it and increment the number for each new version? That's like maintaining multiple versions of version-controlled functions within source code. Perhaps do some magic at build time?
Firstly, supporting import of old versions is a lot easier than export, because later versions usually differ in that they support more features, so saving to an old format may well mean loss of data. Consequently, my experience has only been with supporting import of multiple versions, spanning over a decade.
XML is of course the smart solution. It is designed with this problem in mind. The key point to me is that clean code structure follows from a clean data model. Provided new versions add features and these are represented by support for additional tags, you do not really have to recode handling of existing tags at all.
Now, you could change the semantics of existing tags, which would require recoding their handling. Solution: don't do this if you can avoid it. When you add an attribute or tag, make sure you define its default value; then old and new data files are handled seamlessly.
So it seems to me that, with care, you should be able to avoid many cases where you really have significantly different code for handling the same fields in different file versions. Where this does occur, I guess there are "special circumstances" (that's life with software). When you design the generic solution you'll have specific use cases in mind, and such special cases may not be handled anyway.
In summary: you'll future-proof most efficiently by defining the upgrade path for the data model.
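As a small illustration of the default-value rule, here is a sketch in Java with invented element and attribute names: an attribute added in a later format version is read with a documented default, so files written by older versions still load unchanged.

```java
// Sketch only: "name" has always existed, "compression" was added later.
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import java.io.File;

public class ProjectReader {

    public static Project read(File file) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(file);
        Element root = doc.getDocumentElement();

        Project project = new Project();
        project.name = root.getAttribute("name");

        // "compression" was added in a later format version; older files simply
        // omit it, so fall back to the documented default instead of failing.
        String compression = root.getAttribute("compression"); // "" if absent
        project.compression = compression.isEmpty() ? "none" : compression;
        return project;
    }
}

class Project {
    String name;
    String compression;
}
```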
A version number is probably required.
But the best thing is to actually make a design for your XML. And make sure that the XML is structured in an intuitive and natural way. Otherwise the current organisation of your code may leak into the structure of the XML, which makes the XML harder to read for future versions of your product.
When saving enumerated values, don't write the numbers but the name of the enumeration constant. If some elements could in principle occur multiple times, but don't in your current application, design them as an array in the XML anyway. Make sure the numbers you write are in units that are logical in the problem domain, not whatever your application happens to use right now.
In XML written this way, it should not be hard to support legacy versions of your XML.
Edit:
If you make drastic changes, it can be helpful to just implement a legacy data object that reads the legacy XML, and then write a conversion method that converts from the old data model to the new one. This gives you a fresh start, especially if the old data model was badly designed.
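A sketch of that last idea, again with invented names: the legacy reader populates a data object that mirrors the old format, and one explicit conversion method maps it onto the current model (including any unit changes of the kind mentioned above).

```java
// Hypothetical names throughout: an old-format data object plus one conversion
// step, so the rest of the application only ever deals with the new model.
public class LegacyImport {

    /** Data object mirroring the old XML format, populated by the legacy reader. */
    static class LegacyProjectV1 {
        String title;
        int sizeInPixels; // v1 stored pixels; the new model uses millimetres
    }

    /** Current data model used by the rest of the application. */
    static class Project {
        String name;
        double sizeInMillimetres;
    }

    /** The one place where old semantics are mapped onto the new model. */
    static Project convert(LegacyProjectV1 old, double pixelsPerMillimetre) {
        Project project = new Project();
        project.name = old.title;
        project.sizeInMillimetres = old.sizeInPixels / pixelsPerMillimetre;
        return project;
    }
}
```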

arbitrary typed data in django model

I have a model, say, Item. I want to store an arbitrary number of attributes on it, like title, description, release_date. And I want them to be not just strings, but to have a Python type: string, boolean, datetime, etc.
What are my options here? The EAV pattern with a separate name-value table won't work because all values share the same DB type. A JSONField can probably help, but it doesn't know about datetime, for example. I was also looking at PickleField; it fits perfectly, but I'm a bit concerned about performance.
You have a couple of options and none of them are great. Some of them have been discussed before on Stack Overflow.
Firstly, as you suggested, you have the entity-attribute-value design pattern.
You can add DB type checking by having a table for VARCHARs and one for INTs and one for BOOLEANs and so on.
EAV makes selects very painful. You have to query a number of tables to actually get an object and if you have to use values from the EAV table in the lookup you will run into performance issues as the size increases.
In general, however, EAV should really only be used for very sparse data where another option simply does not work.
There is a Django package for this on PyPI, but I haven't used it.
I have seen some pretty large-scale commercial products that use this approach when a lot of flexibility is absolutely required.
A slightly better approach is to have a table whose schema changes and a metadata table that describes that table. For dense data where most items have most of the attributes, this has a lot of advantages over EAV. This approach is sometimes called dynamic tables or dynamic rows.
- INSERTs, UPDATEs and DELETEs are much faster, since everything is in 1-2 tables.
- Type checking, and potentially constraints, can be added.
However, this approach leaves you with a very complex database that can be harder to work with:
- I don't know of any way Django could use its ORM with this kind of database, since your models would be changing on the fly.
- You are altering your database with ALTER TABLE on the fly, so you had better be very careful with your transactions.
A good approach, if you don't need to perform lookups based on these dynamic attributes, is to store the dynamic data in a JSONField or, better yet, a schema-validated XMLField. However, lookups will be painful if you have to look something up based on a dynamic attribute that is part of your JSON or XML.
The best approach depends on how sparse your data is and how you'll be looking it up. Also, a very good question to ask is whether you absolutely need this flexibility. I've worked on projects where we decided we needed EAV, but since the project went into production attributes have rarely been added or removed, so we got all the disadvantages and none of the benefits.