I'm currently studying the streaming feature in XSLT, and I'm wondering what its limitations are. It seems to be a fairly straightforward kind of transformation, but can it be used to transform a document into another format, e.g. by rearranging the position of elements?
Pure streaming is forwards-only, node by node, so with pure streaming you can skip nodes and rename/remap them, but not rearrange them. On the other hand, you are not restricted to pure streaming: if you have millions of book elements but know you want to, say, sort the author children of each book's authors element, you can "materialize" the authors element with copy-of() and do the sorting on the materialized node (in a non-streaming mode). So pure streaming allows only a forwards, single downward selection, which is pretty limiting, but you can mix streaming and non-streaming.
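A minimal sketch of that mix (assuming an XSLT 3.0 processor with streaming support, e.g. Saxon-EE; book/authors/author/name are hypothetical element names, and the exact construct may need adjusting to satisfy the guaranteed-streamability rules):

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <!-- Stream the whole document, copying unmatched nodes through. -->
  <xsl:mode streamable="yes" on-no-match="shallow-copy"/>

  <!-- Materialize the streamed authors element with copy-of(); the copy is
       an ordinary in-memory tree, so it can be sorted non-streamingly. -->
  <xsl:template match="book/authors">
    <authors>
      <xsl:perform-sort select="copy-of(author)">
        <xsl:sort select="name"/>
      </xsl:perform-sort>
    </authors>
  </xsl:template>
</xsl:stylesheet>
```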
I have a database with information about many hypothetical people. This data is organized for building a graph connecting related people together. The graph is to represent a genealogical tree, in an appropriate C++ data structure. So far, so good. I have my data structure holding information about a family, with each person being a node, all connected appropriately in a tree.
Now, here is the problem: I am lost as to how to generate the visual data for this family graph. For any given family, I need to generate a typical diagram as you would see in a traditional genealogical tree. I intend to render the data with OpenGL and have everything set up to do so. My only problem is how to generate the correct positions and sizes for each person's rectangle, so that in the end there are no overlaps and every generation of people sits at the same vertical position. Then I have to add the traditional lines connecting the nodes to the visual data, but that shouldn't be a big problem.
Are there any lightweight libraries prepared to do this, or can someone help me work out an algorithm to solve it? Thanks
I recommend the AT&T Graphviz library. It is used by Doxygen to draw its inheritance diagrams and call trees.
You could also search for "c++ tree draw".
Another way to state the problem would be: how do you determine node layout in a hierarchical directed graph such that there is no overlap and nodes at the same level of the hierarchy are placed at the same vertical position? (Here nodes correspond to family members and the generation specifies the hierarchy.)
While Graphviz can provide a layout solution, a newer approach than the algorithms used in Graphviz is the Dig-CoLa (directed graph layout through constrained energy minimization) method described in Dwyer et al. 2005. The advantage of Dig-CoLa is that it also handles certain corner cases (e.g. cycles) gracefully and avoids introducing hierarchy not induced by the original data in such cases.
The underlying idea of this method is to formulate and solve the layout problem as a constrained optimization problem: minimize a stress (or energy) function of the node positions, subject to constraints induced by the hierarchy information.
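Roughly, the optimization has the following shape (notation simplified from the paper; $d_{ij}$ is the graph-theoretic distance between nodes $i$ and $j$, $w_{ij}$ a normalization weight, $y$ the vertical coordinate, and $\delta$ the minimum vertical separation between adjacent hierarchy levels):

```latex
\min_{x}\; \sum_{i<j} w_{ij}\left(\lVert x_i - x_j \rVert - d_{ij}\right)^{2}
\qquad \text{subject to}\qquad
y_u - y_v \ge \delta \quad \text{for every directed edge } (u, v).
```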
See the original Dig-CoLa paper for more information:
Dwyer, Tim, and Yehuda Koren. "Dig-CoLa: Directed Graph Layout through Constrained Energy Minimization." IEEE Symposium on Information Visualization (INFOVIS 2005), IEEE, 2005.
(Full text currently available here.)
The author also provides some examples and code for this method (and later extensions) here.
My program starts out by creating a graph (~1K-50K vertices) that usually consists of a few hundred connected components.
The program only needs to be able to manipulate and visualize individual components (using a force-directed layout algorithm).
It would be great (but not essential) to have the capability of further splitting each connected component into connected subcomponents (by removing edges or vertices).
So my question is: can I use the subgraph or filtered_graph class templates to achieve the required functionality (maintain a collection of component graphs that can be individually manipulated and possibly further subdivided by removing edges/vertices)? Or is there another, better approach?
I apologize if this question is too basic. I've just started to learn BGL and am not comfortable with this library yet. Thanks in advance!
Use connected_components to assign a unique number to each component, storing it in a vertex property. Then you can use that property in the filtered_graph predicates to decide whether a given vertex or edge belongs to the currently active component. The vertex predicate is straightforward, and the edge predicate can simply look at either endpoint to make its choice (both endpoints always lie in the same component). The number of the active subgraph would be stored in the predicate objects themselves.
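A minimal sketch of that approach (names illustrative, not from your code):

```cpp
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/connected_components.hpp>
#include <boost/graph/filtered_graph.hpp>
#include <iostream>
#include <vector>

using Graph  = boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS>;
using Vertex = boost::graph_traits<Graph>::vertex_descriptor;
using Edge   = boost::graph_traits<Graph>::edge_descriptor;

// Vertex predicate: keep vertices of the currently active component.
struct VertexInComponent {
    const std::vector<int>* comp = nullptr;  // component id per vertex
    int active = 0;
    bool operator()(Vertex v) const { return (*comp)[v] == active; }
};

// Edge predicate: both endpoints share a component, so one check suffices.
struct EdgeInComponent {
    const Graph* g = nullptr;
    const std::vector<int>* comp = nullptr;
    int active = 0;
    bool operator()(Edge e) const { return (*comp)[source(e, *g)] == active; }
};

int main() {
    Graph g(6);
    add_edge(0, 1, g); add_edge(1, 2, g);  // component 0
    add_edge(3, 4, g);                     // component 1 (vertex 5 is its own)

    std::vector<int> comp(num_vertices(g));
    int n = connected_components(g, comp.data());
    std::cout << n << " components\n";

    // A view of component 0 only; no copy of the graph is made.
    EdgeInComponent   ep{&g, &comp, 0};
    VertexInComponent vp{&comp, 0};
    boost::filtered_graph<Graph, EdgeInComponent, VertexInComponent> view(g, ep, vp);

    for (auto [vi, vend] = vertices(view); vi != vend; ++vi)
        std::cout << "vertex " << *vi << '\n';
}
```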
Whether a different approach would be better depends on your use case. If the components don't change much, and you have to perform many operations that iterate over all nodes, then having separate graph objects might be better. You could create them as copies of filtered graphs constructed as described above. If the graph gets modified a lot, an approach that does not have to rescan the whole graph to update connected components would be useful. If no edges get removed, incremental_components might do the trick.
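If edges are indeed only ever added, the incremental variant looks roughly like this (again a sketch):

```cpp
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/incremental_components.hpp>
#include <boost/pending/disjoint_sets.hpp>
#include <vector>

using Graph  = boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS>;
using Vertex = boost::graph_traits<Graph>::vertex_descriptor;
using Index  = boost::graph_traits<Graph>::vertices_size_type;

int main() {
    Graph g(4);

    // Disjoint-sets structure that tracks components as edges arrive.
    std::vector<Index>  rank(num_vertices(g));
    std::vector<Vertex> parent(num_vertices(g));
    boost::disjoint_sets<Index*, Vertex*> ds(&rank[0], &parent[0]);
    boost::initialize_incremental_components(g, ds);
    boost::incremental_components(g, ds);

    // Each new edge updates component membership in near-constant time;
    // no full rescan of the graph is needed.
    add_edge(0, 1, g);  ds.union_set(0, 1);
    add_edge(2, 3, g);  ds.union_set(2, 3);

    bool same = boost::same_component(Vertex(0), Vertex(3), ds);  // false here
    (void)same;
}
```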
I have a data set with about 700 000 entries, and each entry is a set of 3D coordinates with attributes such as name, timestamp, ID, and so on.
Right now I'm just reading the coordinates and rendering them as points in OpenGL. However, I want to associate each point with its corresponding attributes, and I want to be able to sort and pick points at runtime based on those attributes. How would I go about achieving this in an efficient manner?
I know I can put the data in a struct and use std::sort for sorting, but is that a good design choice, or is there a more efficient/elegant way of handling the problem?
The way I tend to look at these design choices is to first use one of the standard library containers (by the way, if you "just" need lookup you don't necessarily have to sort, but you do need a container that allows lookup), then check whether this is an "efficient enough" solution for the problem.
You can usually come up with a custom solution that is more efficient and maybe more elegant, but you tend to run into two issues with that:
1) You end up having to implement some type of container, which will cost you time in both implementation and debugging compared to a well-understood, tested container that is already out there. Most of the time you're better off trying to solve the problem at hand rather than making it bigger by adding more code.
2) If someone else has to maintain your code at some point, chances are they are familiar with standard library components from both a design and an implementation perspective, but they won't be familiar with your custom container, which increases the learning curve.
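As a concrete baseline for the struct-plus-std::sort idea from the question (field names hypothetical):

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record type: one entry of the ~700,000-entry data set.
struct Entry {
    float x, y, z;            // 3D coordinates rendered as a point
    std::string name;
    std::int64_t timestamp;
    int id;
};

int main() {
    std::vector<Entry> entries;  // filled from your data source
    // ...

    // Sort by any attribute with a comparator; O(N log N) over 700k
    // entries typically finishes well under a second.
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) {
                  return a.timestamp < b.timestamp;
              });

    // "Picking" by attribute: stable_partition moves matching entries
    // to the front without discarding the rest.
    auto mid = std::stable_partition(entries.begin(), entries.end(),
              [](const Entry& e) { return e.name == "target"; });
    (void)mid;  // [begin, mid) now holds the selected entries
}
```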
If you consider each attribute of your point class as a component of a vector, then your selection process is a region query. Your example of a string attribute being equal to some value means the region is actually a line in your data space. However, no sorting on the other attributes is performed within that selection; you will have to implement it yourself, but that should be relatively straightforward with octrees, which partition data into ordered regions.
As advocated in another answer, try existing standard solutions first. If you can find an off-the-shelf implementation of one of these data structures:
R-tree
KD tree
BSP
Octree, or more likely an n-dimensional version of the quadtree/octree principle (I will use the term octree herein to denote the general data structure)
then go for it. These are the data structures I recommend for spatial data management; an R-tree usage sketch follows below.
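For instance, Boost.Geometry ships an R-tree; a minimal region-query sketch (assuming Boost is available) could look like:

```cpp
#include <boost/geometry.hpp>
#include <boost/geometry/index/rtree.hpp>
#include <iostream>
#include <vector>

namespace bg  = boost::geometry;
namespace bgi = boost::geometry::index;

using Point = bg::model::point<float, 3, bg::cs::cartesian>;
using Box   = bg::model::box<Point>;
// Store the point plus an index into your attribute array.
using Value = std::pair<Point, std::size_t>;

int main() {
    bgi::rtree<Value, bgi::quadratic<16>> tree;
    tree.insert({Point(1, 2, 3), 0});
    tree.insert({Point(4, 5, 6), 1});

    // Region query: all points inside an axis-aligned box.
    std::vector<Value> hits;
    tree.query(bgi::intersects(Box(Point(0, 0, 0), Point(3, 3, 4))),
               std::back_inserter(hits));

    for (const auto& [p, idx] : hits)
        std::cout << "hit entry " << idx << '\n';
}
```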
You could also use an embedded RDBMS capable of working with spatial data (they usually implement an R-tree for spatial indexing), but that may not be worthwhile if your dataset isn't dynamic.
If your dataset falls within the 10,000-entry range, then by today's standards it isn't that large, so simpler structures should suffice. In that range, I would first go for a simple std::vector, using std::sort and std::find to filter the data into a smaller set and sort it afterwards.
In a second attempt, I would probably try an ordered set or map keyed on the most-queried attribute, then run some benchmarks to pick the better-performing solution.
For a more efficient one-dimensional indexing structure (in essence, that's what sets and maps are), you might want to try B-trees: there's a C++ implementation available from Google.
My third attempt would be an OpenCL solution (although if you are doing heavy OpenGL rendering, you might prefer keeping this work on the CPU instead; that depends on your framerate needs).
If your dataset is much larger, as it seems to be, then consider one of the more complex solutions I listed initially.
At any rate, without more details about your dataset and how you plan to use it, it is difficult to recommend a single solution, so the only real advice we can give is: try everything you can and benchmark.
If you're dealing with point clouds, take a look at PCL (the Point Cloud Library); it could save you a lot of time and effort, without your having to dig into the intricacies of spatial indexing yourself. It also includes visualisation.
I need a map-like data structure (in C++) for storing pairs (Key,T) with the following functionality:
You can insert new elements (Key,T) into the current structure
You can search for elements based on Key in the current structure
You can make a "snapshot" of the current version of the structure
You can switch to any version of the structure that you took a snapshot of and continue all operations from there
Completely remove one of the versions
What I don't need
Element removal from the structure
Merging of different versions of the structure into one
Iteration over all (or some of) elements currently stored in the structure
In other words, you have a search structure that you can build up, but at any point you can jump back in history and extend an earlier/different version of the structure in a different way, later jumping between those different versions.
In my project, Key and T are likely to be integers or pointer values, but not strings.
The primary objective is to reduce time complexity; space consumption is secondary (but should be reasonable as well). To clarify, log(N)+log(S) per operation (where N is the number of elements and S the number of snapshots) would be enough for me, although faster is better :)
I have a rough idea of how to implement it. For example: if the structure is a binary search tree, inserting a new element can clone the path from the root to the insertion location while keeping the rest of the tree intact. Switching tree versions would be equivalent to picking a different version of the root node, from which some changes are simply not visible.
However, making this custom tree efficient (e.g. self-balancing) will require some additional effort and careful coding. Of course I can do it myself, but perhaps there are already existing libraries that do exactly that?
Also, there is probably a proper name for this kind of data structure that I simply don't know, making my Google searches (or SO searches) total failures...
Thank you for your help!
I think what you are looking for is an immutable (persistent) map. Functional (or functionally inspired) programming languages (such as Haskell or Scala) have immutable versions of most of the containers you'd find in the STL. Operations such as insertion and removal then return a copy of the map (preserving the original), with the copy containing your requested modification. A lot of work has gone into designing these data structures so that the copies share as much of the original structure as possible, reducing the time and memory cost of each operation.
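A bare-bones sketch of the path-copying idea from the question (unbalanced, for illustration only; std::shared_ptr provides the structural sharing and reclaims nodes once the last version referencing them is dropped):

```cpp
#include <memory>

// Minimal persistent BST sketch. Each insert clones only the path from
// the root to the insertion point and shares every untouched subtree
// with the previous versions.
template <typename Key, typename T>
struct Node {
    using Ptr = std::shared_ptr<const Node>;
    Key key;
    T value;
    Ptr left, right;
};

template <typename Key, typename T>
typename Node<Key, T>::Ptr insert(typename Node<Key, T>::Ptr root,
                                  const Key& k, const T& v) {
    using N = Node<Key, T>;
    if (!root)
        return std::make_shared<const N>(N{k, v, nullptr, nullptr});
    if (k < root->key)   // clone this node, share the right subtree
        return std::make_shared<const N>(
            N{root->key, root->value, insert<Key, T>(root->left, k, v), root->right});
    if (root->key < k)   // clone this node, share the left subtree
        return std::make_shared<const N>(
            N{root->key, root->value, root->left, insert<Key, T>(root->right, k, v)});
    return std::make_shared<const N>(N{k, v, root->left, root->right}); // replace value
}
```

Each insert returns the root of a new version: a "snapshot" is just an extra copy of a root pointer, switching versions means working from a different root, and removing a version means dropping its root.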
You can find a lot more detail in a book such as Okasaki's Purely Functional Data Structures: http://www.amazon.co.uk/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504.
While searching for some persistent search trees libraries I stumbled on this:
http://cg.scs.carleton.ca/~dana/pbst/
While it does not have exactly the functionality needed, it seems pretty close. I will investigate.
(posting here, as someone may find it useful as well)
I want to store a graph of different objects for a game; their classes may or may not be related, and they may or may not contain vectors of simple structures.
I want the parsing operation to be fast, and the data can be pretty big.
Adding new things should not be hard, and it should not break backward compatibility.
Smaller file size is kind of important
Readability counts
By serialization I mean making the objects serialize themselves, which is effective, but means I will need to write different serialization methods for different objects.
By binary parsing/composing I mean creating a new tree of parsers/composers that holds and reads the data for these objects, and passing it around so my objects can push/pull their data.
I could also use JSON, but it can be pretty slow to read, and it is not very space-efficient when it comes to large sets of matrices and numbers.
Point by point:
Fast Parsing: binary (since you don't necessarily have to "parse", you can just deserialize)
Adding New Things: text
Smaller: text (even if gzipped text is larger than binary, it won't be much larger).
Readability: text
So that's three votes for text and one for binary. Personally, I'd go with text for everything except images (and other data that is "naturally" binary). Then store everything in a big zip file (I can think of several games that do this, or something close to it).
Good reads: The Importance of Being Textual and Power Of Plain Text.
Check out Protocol Buffers from Google or Thrift from Apache. Although billed as ways to write wire protocols easily, they are basically object serialization mechanisms that can create bindings in a dozen languages, have an efficient binary representation and easy versioning, perform fast, and are well supported.
We're using Boost.Serialization. I don't know how it performs next to the options samkass mentioned.
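For reference, the wiring looks roughly like this (GameObject is a hypothetical stand-in for your types; each type needs its own serialize(), as the question anticipated):

```cpp
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/vector.hpp>
#include <fstream>
#include <vector>

// Hypothetical game object: Boost.Serialization needs one serialize()
// member (or free function) per type.
struct GameObject {
    int id = 0;
    std::vector<float> transform;  // e.g. a flattened matrix

    template <class Archive>
    void serialize(Archive& ar, const unsigned /*version*/) {
        ar & id;
        ar & transform;  // standard containers handled by the headers above
    }
};

int main() {
    {
        std::ofstream out("objects.txt");
        boost::archive::text_oarchive oa(out);
        const GameObject obj{42, {1, 0, 0, 1}};
        oa << obj;  // text archive; binary_oarchive gives smaller files
    }
    {
        std::ifstream in("objects.txt");
        boost::archive::text_iarchive ia(in);
        GameObject obj;
        ia >> obj;  // restores id and transform
    }
}
```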