I'm working on a somewhat complex mathematical code, written in C++. I'm using (templated) tree structures for adaptive function representation. Due to some mathematical properties I end up in a situation where I need to change from one type of node to another. This needs to happen transparently and with minimal overhead, both in terms of storage and performance, since these structures are used in very heavy computations.
The detailed situation is as follows: I have a templated abstract base class defining general mathematical and structural properties of a general, doubly-linked node. Each node needs information both from its parent and from a top-level Tree class, in addition to keeping track of its children. Two classes inherit from this class: FunctionNode and GenNode. These classes are very different in terms of storage and functionality, and should not be ancestors of each other (at least not publicly). Thus, I would like to construct a tree like this:
      T
      |
      N
     / \
    N   N
   / \
  G   N
 / \
G   G
Where T is a Tree, N is a normal FunctionNode and G is a GenNode. The problem is the N - G transition: N needs to have children of type G, and G a parent of type N. Since N and G are only cousins and not siblings, I can't convert an N* to a G*. It's sufficient for G to know that N is a BaseNode, but N has to somehow store G polymorphically so that the correct virtuals get called automagically when the tree is traversed. Any ideas on how to solve this problem elegantly and efficiently would be much appreciated! :) Of course one could just hack this, but since this is a very fundamental piece of code I would like to have a good solution for it. It's likely that there will be many derivations of this code in the future.
Best regards,
Jonas Juselius
Centre for Theoretical and Computational Chemistry, University of Tromsø
Don't use inheritance when delegation will do. Look at the Strategy design pattern for guidance on this.
The "N - G" transition may be better handled by having a subclass of N (N_g) which is a unary operator (where other N's are binary) and will delegate work to the associated G object. The G subtree is then -- actually -- a disjoint family of classes based on G's, not N's.
       T
       |
       N
      / \
     N   N
    / \
  N_g  N
   |
   G
  / \
 G   G
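To make the delegation concrete, here is a rough sketch of the idea (all names and the placeholder math are hypothetical; the real classes would carry the actual mathematical state):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// The G family forms its own disjoint subtree, never mixed into N's children.
struct G {
    std::vector<std::unique_ptr<G>> children;
    double evaluate() const { return 0.0; }          // placeholder math
};

// N is the normal binary node.
struct N {
    std::vector<std::unique_ptr<N>> children;
    virtual ~N() = default;
    virtual double evaluate() const { return 1.0; }  // placeholder math
};

// N_g is a unary N that delegates its work to an associated G subtree
// (Strategy-style), so N never needs to store a G* in its child list.
struct N_g : N {
    std::unique_ptr<G> gRoot;
    explicit N_g(std::unique_ptr<G> g) : gRoot(std::move(g)) {}
    double evaluate() const override { return gRoot->evaluate(); }
};
```

Traversal code only ever sees N*, and the virtual call on an N_g transparently crosses over into the G subtree.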
"One of the problems is that I do not know beforehand whether the next N will be N or N_g."
"beforehand?" Before what? If you are creating N's and then trying to decide if they should have been N_g's, you've omitted several things.
You've instantiated the N too early in the process.
You've forgotten to write an N_g constructor that works by copying an N.
You've forgotten to write a replace_N_with_Ng method that "clones" an N to create an N_g, and then replaces the original N in the tree with the N_g.
The point of polymorphism is that you don't need to know "beforehand" what anything is. You should wait as long as possible to create either an N or an N_g and bind the resulting N (or subclass of N) object into the tree.
"Furthermore, sometimes I need to prune all G:s, and generate more N:s, before perhaps generating some more G:s."
Fine. You walk the tree, replacing N_g instances with N instances to "prune". You walk the tree replacing N instances with N_g's to generate a new/different G subtree.
Look into using RTTI - Run-time Type Information.
Have you thought of using Boost.Any? It seems like the textbook example for it, in my opinion.
Having thought about the problem some more I came up with the following idea:
Logically, but not functionally, GenNode is a-kind-of FunctionNode. If one splits FunctionNode into two classes, one containing the common denominators and one holding the additional functionality that only FunctionNode should have, FunctionNode can inherit from the latter using private inheritance. Now GenNode can safely inherit from FunctionNode, and all problems can be solved as usual with virtuals. Any comments?
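A minimal sketch of what that split could look like, with hypothetical names and a much-reduced interface:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Common structural machinery shared by all node kinds.
struct BaseNode {
    BaseNode* parent = nullptr;
    std::vector<std::unique_ptr<BaseNode>> children;
    virtual ~BaseNode() = default;
    virtual char kind() const = 0;
};

// Functionality that only FunctionNode itself should be able to use.
struct FunctionNodeExtras {
    void projectCoefs() { /* FunctionNode-specific work */ }
};

// The private base keeps the extras out of reach of descendants.
struct FunctionNode : BaseNode, private FunctionNodeExtras {
    char kind() const override { return 'N'; }
};

// GenNode can now safely be-a FunctionNode: it inherits the common
// interface but cannot touch the privately inherited extras.
struct GenNode : FunctionNode {
    char kind() const override { return 'G'; }
};

// An N can hold a G child polymorphically through BaseNode*.
inline BaseNode* addChild(BaseNode& parent, std::unique_ptr<BaseNode> child) {
    child->parent = &parent;
    parent.children.push_back(std::move(child));
    return parent.children.back().get();
}
```

With this layout the N - G transition is just another BaseNode* link, and the correct virtuals fire during traversal.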
I've been reading a nice answer to Difference between reduce and foldLeft/fold in functional programming (particularly Scala and Scala APIs)? provided by samthebest and I am not sure if I understand all the details:
According to the answer (reduce vs foldLeft):
A big big difference (...) is that reduce should be given a commutative monoid, (...)
This distinction is very important for Big Data / MPP / distributed computing, and the entire reason why reduce even exists.
and
Reduce is defined formally as part of the MapReduce paradigm,
I am not sure how these two statements fit together. Can anyone shed some light on that?
I tested different collections and I haven't seen a performance difference between reduce and foldLeft. It looks like ParSeq is a special case; is that right?
Do we really need order to define fold?
we cannot define fold because chunks do not have an ordering and fold only requires associativity, not commutativity.
Why couldn't it be generalized to unordered collections?
As mentioned in the comments, the term reduce means different things when used in the context of MapReduce and when used in the context of functional programming.
In MapReduce, the system groups the results of the map function by a given key and then calls the reduce operation to aggregate values for each group (so reduce is called once for each group). You can see it as a function (K, [V]) -> R taking the group key K together with all the values belonging to the group [V] and producing some result.
In functional programming, reduce is a function that aggregates elements of some collection when you give it an operation that can combine two elements. In other words, you define a function (V, V) -> V and the reduce function uses it to aggregate a collection [V] into a single value V.
When you want to add numbers [1,2,3,4] using + as the function, the reduce function can do it in a number of ways:
(1) It can run from the start and calculate (((1+2)+3)+4)
(2) It can also calculate a = 1+2 and b = 3+4 in parallel and then add a+b!
The foldLeft operation is, by definition, always proceeding from the left, so it always uses evaluation strategy (1). In fact, it also takes an initial value, so it evaluates something more like ((((0+1)+2)+3)+4). This makes foldLeft useful for operations where the order matters, but it also means that it cannot be implemented for unordered collections (because you do not know what "left" is).
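The same distinction exists in C++'s standard library, if that helps as a second reference point: std::accumulate is specified to combine strictly left-to-right (the foldLeft analogue), while std::reduce (C++17) is allowed to regroup and reorder the operands, so its operation must be associative and commutative for a deterministic result. A small sketch:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// foldLeft analogue: guaranteed left-to-right, ((((0+1)+2)+3)+4).
inline int sumFoldLeft(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// reduce analogue: free to regroup, e.g. (1+2)+(3+4), possibly in parallel;
// safe here only because integer + is associative and commutative.
inline int sumReduce(const std::vector<int>& v) {
    return std::reduce(v.begin(), v.end(), 0);
}
```

For + on integers both give the same answer; for a non-commutative operation (say, string concatenation) only the accumulate/foldLeft form is guaranteed to be well-defined.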
I am looking for suggestions on what kind of data-structure to use for extremely large structures in OCaml that scale well.
By scales well, I mean I don't want stack overflows or exponential heap growth, assuming there is enough memory. So this pretty much eliminates the standard library's List.map function. Speed isn't so much of an issue.
But for starters, let's assume I'm operating in the realm of 2^10 - 2^100 items.
There are only three "manipulations" I perform on the structure:
(1) a map function on subsets of the structure, which either increases or decreases the structure
(2) scanning the structure
(3) removal of specific pairs of items in the structure that satisfy a particular criterion
Originally I was using regular lists, which are still highly desirable, because the structure is constantly changing. Usually, after all manipulations are performed, the structure has at most roughly doubled in size, or has been reduced to the empty list []. Perhaps the doubling dooms me from the beginning, but it is unavoidable.
In any event, somewhere around 2^15 - 2^40 items severe problems start (probably due in part to the naive list functions I was using). The program uses 100% of the CPU but almost no memory, and generally after a day or two it stack-overflows.
I would prefer to start using more memory, if possible, in order to continue operating in larger spaces.
Anyway, if anyone has any suggestions it would be much appreciated.
If you have enough space, in theory, to contain all the items of your data structure, you should look at data structures that have an efficient memory representation, with as little bookkeeping as possible. Dynamic arrays (which you resize exponentially when you need more space) are stored more efficiently than lists (which pay a full word to store the tail of each cell), so you'd get roughly twice as many elements for the same memory use.
If you cannot hold all elements in memory (which is what your numbers look like), you should go for a more abstract representation. It's difficult to say more without more information about what your elements are. But maybe an example of an abstract representation would help you devise what you need.
Imagine that I want to record sets of integers. I want to take unions and intersections of those sets, and also perform some more funky operations such as "get all elements that are multiples of a given integer". I want to be able to do that for really large sets (zillions of distinct integers), and then I want to be able to pick one element, any one, from the set I have built. Instead of trying to store lists of integers, or sets of integers, or arrays of booleans, what I can do is store the logical formulas corresponding to the definitions of those sets: a set of integers P is characterized by a formula F such that F(n) ⇔ n∈P. I can therefore define a type of predicates (conditions):
type predicate =
  | Segment of int * int    (* n ∈ [a;b] *)
  | Inter of predicate * predicate
  | Union of predicate * predicate
  | Multiple of int         (* n mod a = 0 *)
Storing these formulas requires little memory (proportional to the number of operations I want to apply in total). Building an intersection or a union takes constant time. Then I'll have some work to do to find an element satisfying the formula; basically, I'll have to reason about what those formulas mean, get a normal form out of them (they are all of the form "the elements of a finite union of intervals satisfying some modulo criteria"), and from there extract some element.
In the general case, when you get a "command" on your data set, such as "add the result of mapping over this subset", you can always, instead of actually evaluating this command, store it as data – the definition of your structure. The more precisely you can describe those commands (e.g. you say "map", but storing an (elem -> elem) function will not allow you to reason easily about the result; maybe you can formulate that mapping operation as a concrete combination of operations), the more precisely you will be able to work on them at this abstract level, without actually computing the elements.
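To make the formulas-as-data idea concrete, here is a rough transcription of the predicate type into C++ (the thread is about OCaml, but the structure carries over directly; all names are illustrative):

```cpp
#include <cassert>
#include <memory>

// A set of integers is stored as the formula that defines it,
// not as its elements.
struct Predicate {
    enum Kind { Segment, Inter, Union, Multiple } kind;
    int a = 0, b = 0;                        // Segment bounds / Multiple divisor
    std::shared_ptr<Predicate> left, right;  // children for Inter / Union

    // Membership test: evaluate the formula at n instead of
    // enumerating the set.
    bool contains(int n) const {
        switch (kind) {
            case Segment:  return a <= n && n <= b;
            case Inter:    return left->contains(n) && right->contains(n);
            case Union:    return left->contains(n) || right->contains(n);
            case Multiple: return n % a == 0;
        }
        return false;
    }
};

// Constant-time builders: union/intersection just allocate one node.
inline std::shared_ptr<Predicate> segment(int a, int b) {
    auto p = std::make_shared<Predicate>();
    p->kind = Predicate::Segment; p->a = a; p->b = b;
    return p;
}
inline std::shared_ptr<Predicate> multiple(int a) {
    auto p = std::make_shared<Predicate>();
    p->kind = Predicate::Multiple; p->a = a;
    return p;
}
inline std::shared_ptr<Predicate> inter(std::shared_ptr<Predicate> l,
                                        std::shared_ptr<Predicate> r) {
    auto p = std::make_shared<Predicate>();
    p->kind = Predicate::Inter; p->left = std::move(l); p->right = std::move(r);
    return p;
}
```

The memory cost is proportional to the number of operations applied, no matter how many integers the described set contains.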
I've been searching for a C/C++ library that does symbolic differentiation and integration of polynomials, but haven't found one that suits my needs.
I'm afraid that the problem is that I'm not using the correct terminology.
The problem is this :
given a polynomial p, I would like to look at the function
f(p) = integral of (p')^2 from a to b
And generate partial derivatives for f with respect to p's coefficients.
Theoretically, there should be no problem here, as we are dealing with polynomials, but I haven't found anything that can keep the connection between the original coefficients and the modified polynomial.
Does anyone know if there are libraries that can do such things, or am I better off creating my own?
Have you tried to use http://www.fadbad.com/fadbad.html ? It's quite useful.
I would write my own derivative class. There are books available that document how to do this, but assuming you know the math rules, it is rather trivial.
Using such a derivative class, you can then write a template function that generates your polynomial, its derivative, the square, and the integral, while keeping track of the derivatives with respect to the coefficients. The problem is that you may carry around a lot of derivatives which are always zero; avoiding this is rather complicated.
A normal derivative class would contain a value and an array of derivative values.
There may be a constructor that creates an independent variable from a value and an index: it initializes the value from the passed value and all derivatives to zero, except the one matching the index, which is set to 1.
Then you write operators and functions for everything you need -- which is not much assuming you're only dealing with polynomials.
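A minimal sketch of such a derivative class (forward-mode, restricted to the + and * a polynomial needs; names and layout are illustrative, not a reference implementation):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Value plus partial derivatives with respect to N independent variables
// (here: the polynomial's coefficients).
template <std::size_t N>
struct DVal {
    double v = 0.0;             // the value
    std::array<double, N> d{};  // partials dv/dc_i, zero-initialized

    DVal() = default;
    DVal(double value) : v(value) {}  // constant: all derivatives stay 0

    // The constructor-from-value-and-index described above.
    static DVal var(double value, std::size_t i) {
        DVal x(value);
        x.d[i] = 1.0;
        return x;
    }
};

template <std::size_t N>
DVal<N> operator+(const DVal<N>& a, const DVal<N>& b) {
    DVal<N> r(a.v + b.v);       // sum rule
    for (std::size_t i = 0; i < N; ++i) r.d[i] = a.d[i] + b.d[i];
    return r;
}

template <std::size_t N>
DVal<N> operator*(const DVal<N>& a, const DVal<N>& b) {
    DVal<N> r(a.v * b.v);       // product rule
    for (std::size_t i = 0; i < N; ++i) r.d[i] = a.d[i] * b.v + a.v * b.d[i];
    return r;
}
```

Evaluating p(x) = c0 + c1*x with c0, c1 marked as independent variables then propagates the partials through any expression built from p, such as its square.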
I have a list of objects which may or may not be related to each other. Some elements are child items of other elements which in turn may be child of another element. Some may be equal to or totally unrelated to rest of the elements.
For e.g, say the list is {A,B,C,D,E,F} with relations such as A⊂B⊂C, D=E and F≠{A,B,C,D,E,F}. I want to visualize this relationship, perhaps like
-> C
   +-B
      +-A
-> D
 |
-> E
-> F
I just need some guidance to get started, perhaps there is a module to carry out such tasks. The few ways that I could think of, are getting too complicated & intimidating for my nascent scripting skills. Hope someone could help me here.
There are several tree modules on CPAN. Tree, Tree::DAG_Node and Tree::Simple all look like they can do what you want.
I am making a geometry library and I am confused about what the return type should be for a function that calculates the intersection of a segment with another segment. The returned value would sometimes be a point, sometimes a segment (the overlap case), and sometimes an empty set. As I see it, there are three ways to tackle this:
1. return a union (segment, null, point)
2. return a segment with first point == second point when the intersection is a single point, and both points as NaN when the intersection is an empty set
3. return a vector (with 0 elements for empty set, 1 element for pnt and 2 elements for segment)
Please let me know if there are other alternatives, and what the pros and cons of each design are. Moreover, which design is the better one, and why? I am interested in a robust architecture that allows a single pipeline (and thus almost no rewriting of code) and that scales, both in terms of adding functionality and in handling all the edge cases.
Following is my code for reference (whose return type is vector)
vector<pnt> seg::inter(seg z) {
    vector<pnt> ans;
    if (p1 == p2) {                       // this segment is a single point
        if (z.doesinter(p1)) ans.pb(p1);
    } else if (z.p1 == z.p2) {            // z is a single point
        if (doesinter(z.p1)) ans.pb(z.p1);
    } else {
        pnt p1p2 = (p2 - p1);
        pnt q1 = p1p2 * pnt(0, 1);        // perpendicular to this segment
        long double h1, h2;
        if (abs((z.p2 - z.p1).dot(q1)) <= eps) {    // parallel segments
            pnt r1((z.p1 - p1) / (p2 - p1)), r2((z.p2 - p1) / (p2 - p1));
            if (abs(r1.y) <= eps) {       // collinear case
                h1 = r1.x;
                h2 = r2.x;
                if (h1 > h2) swap(h1, h2);
                if (h2 >= 0 && h1 <= 1) { // add eps
                    h1 = max(0.0L, h1);
                    h2 = min(1.0L, h2);
                    ans.pb(p1 + p1p2 * h1);
                    if (doublecompare(h1, h2) == -1) ans.pb(p1 + p1p2 * h2);
                }
            }
        } else {                          // proper crossing
            h1 = ((p1 - z.p1).dot(q1)) / ((z.p2 - z.p1).dot(q1));
            pnt q2 = (z.p2 - z.p1) * pnt(0, 1);
            h2 = ((z.p1 - p1).dot(q2)) / ((p2 - p1).dot(q2));
            if (h1 + eps >= 0 && h1 - eps <= 1 && h2 + eps >= 0 && h2 - eps <= 1)
                ans.pb(z.p1 + (z.p2 - z.p1) * h1);
        }
    }
    return ans;
}
My suggestion is to create a specialized Intersection class that can handle all the cases.
You can then return an instance of that class. Internally, the class could have, for example, the vector representation (with the same endpoints if the intersection is one point, as you suggested) and could have methods for determining which case it actually is (bool isIntersecting(), isSegment(), etc.).
For a more sophisticated design, you can make this Intersection class abstract and provide specialized implementations for NoIntersection, PointIntersection and SegmentIntersection with different inner data representation.
The union "idea" is perfectly fine; it naturally expresses all the cases. However, I would recommend not using the C-language union directly, because it's a low-level construct which will expose you to hard-to-find bugs.
Instead, you should use Boost.Variant.
Basically, a variant is a combination of 2 elements: a tag and a union, the tag being used to tell which member of the union is in use at a given moment. Being a C++ class, it is C++-aware (unlike union), so you won't face restrictions on the types of objects you can put in, and you won't face undefined behavior either.
typedef boost::variant<NoneType, Point, Segment> IntersectionType;
Of course, you can also decide to wrap this in a class, to expose a richer interface.
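If Boost is not an option, the same design can be sketched with C++17's std::variant, with std::monostate playing the role of the empty set (Point and Segment here are hypothetical minimal types, not the library's real ones):

```cpp
#include <cassert>
#include <variant>

// Minimal illustrative geometry types.
struct Point   { double x, y; };
struct Segment { Point a, b; };

// std::monostate is the "no intersection" alternative.
using Intersection = std::variant<std::monostate, Point, Segment>;

// Dispatch on whichever alternative is actually held, e.g. to
// classify the result by its dimension.
inline int dimensionOf(const Intersection& i) {
    struct Visitor {
        int operator()(std::monostate) const { return -1; }  // empty set
        int operator()(const Point&)   const { return 0; }
        int operator()(const Segment&) const { return 1; }
    };
    return std::visit(Visitor{}, i);
}
```

Callers are forced by the visitor to handle all three cases, which is exactly the robustness property asked about in the question.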
In Modern C++ Design, Alexandrescu explains multi-methods using exactly your problem as the example.
You should have a look.
http://loki-lib.sourceforge.net/index.php?n=Idioms.MultipleDispatcher