Fill CapnProto List with non-primitive - c++

According to the Cap'n Proto documentation (NOTE: I am using the C++ version):

For List<Foo> where Foo is a non-primitive type, the type returned by operator[] and iterator::operator*() is Foo::Reader (for List<Foo>::Reader) or Foo::Builder (for List<Foo>::Builder). The builder's set method takes a Foo::Reader as its second parameter.

Using "set" seems to work fine for primitive types (see this other Stack Overflow question, which covers primitives only), but there does not appear to be a "set" function in the generated code for lists of non-primitives. Did my Cap'n Proto generation fail in some way, or is there another method for setting elements in a list of non-primitives?

There is a "set" method, but it is called setWithCaveats():
destListBuilder.setWithCaveats(index, sourceStructReader)
This is to let you know that there are some obscure problems with setting an element of a struct list. The problem stems from the fact that struct lists are not represented as a list of pointers as you might expect, but rather they are a "flattened" series of consecutive structs, all of the same size. This implies that space for all the structs in the list is allocated at the time that you initialize the list. So, when you call setWithCaveats(), the target space is already allocated previously, and you're copying the source struct into that space.
This presents a problem in the face of varying versions: the source struct might have been constructed using a newer version of the protocol, in which additional fields were defined. In this case, it may actually be larger than expected. But, the destination space was already allocated, based on the protocol version you compiled with. So, it's too small! Unfortunately, there's no choice but to discard the newly-defined fields that we don't know about. Hence, data may be lost.
Of course, it may be that in your application, you know that the struct value doesn't come from a newer version, or that you don't care if you lose fields that you don't know about. In this case, setWithCaveats() will do what you want.
If you want to be careful to preserve unknown fields, you may want to look into the method capnp::Orphanage::newOrphanConcat(). This method can concatenate a list of lists of struct readers into a single list in such a way that no data is lost -- the target list is allocated with each struct's size equal to the maximum of all input structs.
auto orphanage = capnp::Orphanage::getForMessageContaining(builder);
// newOrphanConcat() allocates the target list so that each element gets the
// maximum size seen among the inputs; no data is lost in the copy.
auto orphan = orphanage.newOrphanConcat({list1Reader, list2Reader});
// adoptListField() stands for the generated adopt method for your list
// field; the actual name depends on your schema (adopt<FieldName>()).
builder.adoptListField(kj::mv(orphan));

Related

Arrays vs. Doubly Linked Lists for Queue simulation

I am working on an assignment for school simulating a line with students and multiple windows open at the Registrar's Office.
I have the queue for the students down, but someone suggested that I use an array for the windows, implemented with the queue class we made on our own.
I don't understand why an array would work when there are other variables I want to know about each window besides just the student's time decrementing.
I'm just looking for some direction, or a more in-depth explanation of how it is possible to use an array to just store the time each student is at the window, as opposed to another doubly linked list.
The way I see it you've got a variable number of students and a fixed number of windows (buildings don't usually change all that often). If I were to make a representation of this in code I would use a dynamically sized container (a list, vector, queue, etc.) to contain all the students and a fixed-size array for the registers. This would embody the intent of the real situation in code, making it less likely that someone else using your code makes any mistakes related to the size of the Registrar's Office. Often choosing a container type is all about its intended use!
Thus you can design a class to hold all the registers using a fixed-size array (or, even nicer, a template-dictated size, seeing as you're using C++). Then you can write all your other Registrar-related functions using the given size argument and thus never go out of bounds in your Registrar array.
Lastly: an array holds whatever information you want it to hold. You can have it hold only numbers (like int) but you can also have it hold objects of any type! What I mean to say is: create a Registrar class that holds all the information you want to collect for every individual Registrar. Then create an array that holds Registrar objects. Then whenever you access an individual element in the array you can access all the information of the individual Registrar through that single reference.
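To make this concrete, here is a minimal sketch along those lines. All names (Student, RegistrarWindow, NUM_WINDOWS, the service-time rules) are hypothetical: a fixed-size array holds the windows, each window is a full object carrying whatever per-window state you like, and a dynamically sized queue holds the students.

```cpp
#include <array>
#include <cstddef>
#include <queue>

// Hypothetical types for illustration only.
struct Student {
    int serviceTime;  // time steps needed at a window
};

struct RegistrarWindow {
    bool busy = false;
    int remainingTime = 0;   // time left for the current student
    int studentsServed = 0;  // an extra per-window statistic

    void serve(const Student& s) { busy = true; remainingTime = s.serviceTime; }
    void tick() {  // advance this window by one time step
        if (busy && --remainingTime <= 0) { busy = false; ++studentsServed; }
    }
};

constexpr std::size_t NUM_WINDOWS = 3;  // fixed: buildings rarely change

int runSimulation() {
    std::queue<Student> line;  // dynamic container: any number of students
    line.push({2});
    line.push({1});

    std::array<RegistrarWindow, NUM_WINDOWS> windows{};  // fixed-size array
    for (auto& w : windows)                  // seat waiting students
        if (!w.busy && !line.empty()) { w.serve(line.front()); line.pop(); }

    for (int t = 0; t < 2; ++t)              // run two time steps
        for (auto& w : windows) w.tick();

    int served = 0;
    for (const auto& w : windows) served += w.studentsServed;
    return served;
}
```

Accessing `windows[i]` gives you the whole RegistrarWindow object, so every per-window variable travels with that one array element.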

C++ Deleting objects from memory

Let's say I have allocated some memory and filled it with a set of objects of the same type; we'll call these components.
Say one of these components needs to be removed, what is a good way of doing this such that the "hole" created by the component can be tested for and skipped by a loop iterating over the set of objects?
The inverse should also be true, I would like to be able to test for a hole in order to store new components in the space.
I'm thinking memclear & checking for 0...
boost::optional<component> seems to fit your needs exactly. Put those in your storage, whatever that happens to be. For example, with std::vector:
// initialize the vector with 100 non-components
std::vector<boost::optional<component>> components(100);
// adding a component at position 15
components[15] = component(x, y, z);
// deleting the component at position 82
components[82] = boost::none;
// looping through and checking for existence
for (auto& opt : components)
{
    if (opt) // component exists
    {
        operate_on_component(*opt);
    }
    else // component does not exist
    {
        // whatever
    }
}
// move components to the front, non-components to the back
std::partition(components.begin(), components.end(),
    [](boost::optional<component> const& opt) -> bool { return static_cast<bool>(opt); });
The short answer is that it depends on how you store it in memory.
For example, the C++ standard requires that a vector's elements be stored contiguously.
If you can predict the size of the object, you may be able to use sizeof and address arithmetic to predict the location of each element in memory.
Good luck.
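That contiguity guarantee can be checked directly. A small sketch confirming that in a std::vector the address of element i is simply the base pointer plus i:

```cpp
#include <cstddef>
#include <vector>

// The C++ standard requires vector storage to be contiguous, so element i
// must live exactly i slots past the base pointer.
bool addressesArePredictable(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        if (&v[i] != v.data() + i) return false;
    return true;
}
```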
There are at least two solutions:
1) Mark the hole with some flag and then skip it when processing. Benefit: 'deletion' is very fast (only setting a flag). If the object is not that small, even adding a "bool alive" flag is not hard to do.
2) Move the hole to the end of the pool by swapping it with an 'alive' object.
This problem is related to storing and processing particle systems; you can find some suggestions there.
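Option 2 can be sketched in a few lines: overwrite the dead element with the last live one and shrink the pool ("swap and pop"). Note that it does not preserve element order, which is usually acceptable for particle systems:

```cpp
#include <cstddef>
#include <vector>

// Remove the element at `index` without leaving a hole: move the last live
// object into its slot, then drop the now-duplicated tail element.
template <typename T>
void swapAndPop(std::vector<T>& pool, std::size_t index) {
    pool[index] = pool.back();  // the last live object fills the hole
    pool.pop_back();            // shrink: no hole remains
}
```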
If it is not possible to move the "live" components up, or to reorder them so that there is no hole in the middle of the sequence, then the best option is to give the component objects a "deleted" flag/state that can be tested through a member function.
Such a "deleted" state does not cause the object to be removed from memory (that is just not possible in the middle of a larger block), but it does make it possible to mark the spot as not being in use for a component.
When you say you have "allocated some memory" you are likely talking about an array. Arrays are great because they have virtually no overhead and extremely fast access by index. But the bad thing about arrays is that they aren't very friendly for resizing. When you remove an element in the middle, all following elements have to be shifted back by one position.
But fortunately there are other data structures you can use, like a linked list or a binary tree, which allow quick removal of elements. C++ even implements these in the container classes std::list and std::set.
A list is great when you don't know beforehand how many elements you need, because it can shrink and grow dynamically without wasting any memory when you remove or add any elements. Also, adding and removing elements is very fast, no matter if you insert them at the beginning, in the end, or even somewhere in the middle.
A set is great for quick lookup. When you have an object and you want to know if it's already in the set, checking it is very quick. A set also automatically discards duplicates which is really useful in many situations (when you need duplicates, there is the std::multiset). Just like a list it adapts dynamically, but adding new objects isn't as fast as in a list (not as expensive as in an array, though).
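A quick sketch of both points: cheap insertion anywhere in a std::list, and automatic de-duplication in a std::set:

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <set>

std::size_t distinctCount() {
    std::list<int> l;
    l.push_back(2);                       // O(1) at the end
    l.push_front(1);                      // O(1) at the front
    l.insert(std::next(l.begin()), 5);    // O(1) once the position is known
    // l is now 1, 5, 2

    std::set<int> s(l.begin(), l.end());  // duplicates would be discarded
    s.insert(2);                          // already present: ignored
    return s.size();                      // fast membership tests via s.count(x)
}
```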
Two suggestions:
1) You can use a Linked List to store your components, and then not worry about holes.
Or if you need these holes:
2) You can wrap your component in an object holding a pointer to the component, like so:
class ComponentWrap
{
public:
    Component* component;
};
and use wrap.component == nullptr to find out whether the component has been deleted.
Exception way:
3) Put your code in a try/catch block in case you hit a null pointer error (though note that in standard C++, dereferencing a null pointer is undefined behaviour, not a catchable exception).

Are recursive types really the only way to build noncontinuous arbitrary-size data structures?

I just noticed a question asking what recursive data types ("self-referential types") would be good for in C++ and I was tempted to boldly claim
It's the only way to construct data structures (more precisely containers) that can accept arbitrary large data collections without using continuous memory areas.
That is, if you had no random-access arrays, you would require some means of references (logically) to a type within that type (obviously, instead of having a MyClass* next member you could say void* next but that would still point to a MyClass object or a derived type).
However, I am careful with absolute statements -- just because I couldn't think of something doesn't mean it's not possible, so am I overlooking something? Are there data structures that are neither organised using mechanisms similar to linked lists / trees nor using continuous sequences exclusively?
Note: This is tagged both c++ and language-agnostic as I'd be interested specifically in the C++ language but also in theoretical aspects.
It's the only way to construct data structures (more precisely containers) that can accept arbitrary large data collections without using continuous memory areas.
After contemplating for a while, this statement seems to be correct. It is self-evident, in fact.
Suppose I've a collection of elements in a non-contiguous memory. Also suppose that I'm currently at element e. Now the question is, how would I know the next element in the collection? Is there any way?
Given an element e from a collection, there are only two ways to compute the location of the next element:
If I assume that it is at offset sizeof(e) irrespective of what e is, then it means that the next element starts where the current element ends. But then this implies that the collection is in a contiguous memory, which is forbidden in this discussion.
The element e itself tells us the location of the next element. It may store the address itself, or an offset. Either way, it is using the concept of self-reference, which too is forbidden in this discussion.
As I see it, the underlying idea of both of these approaches is exactly same: they both implement self-reference. The only difference is that in the former, the self-reference is implemented implicitly, using sizeof(e) as offset. This implicit self-reference is supported by the language itself, and implemented by the compiler. In the latter, it is explicit, everything is done by the programmer himself, as now the offset (or pointer) is stored in the element itself.
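Both flavours of self-reference can be put side by side in code. The Node type below is the explicit form; the array traversal relies on the implicit sizeof-based offset applied by the compiler:

```cpp
// Explicit self-reference: each element stores the location of its successor.
struct Node {
    int value;
    Node* next;
};

int sumChain(const Node* n) {
    int sum = 0;
    for (; n != nullptr; n = n->next) sum += n->value;  // follow stored links
    return sum;
}

int sumArray(const int* a, int count) {
    int sum = 0;
    // Implicit self-reference: the successor of a[i] is assumed to sit
    // sizeof(int) bytes further on; the compiler applies that offset in a[i].
    for (int i = 0; i < count; ++i) sum += a[i];
    return sum;
}
```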
Hence, I don't see any third approach to implementing self-reference. And if it is not self-reference, what terminology would one use to describe the computation of the location of the element that follows e?
So my conclusion is, your statement is absolutely correct.
The problem is that the dynamic allocator itself is managing contiguous storage. Think about the "tape" used for a Turing Machine, or the Von Neumann architecture. So to seriously consider the problem, you would likely need to develop a new computing model and new computer architecture.
If you think disregarding the contiguous memory of the underlying machine is okay, I am sure a number of solutions are possible. The first that comes to my mind is that each node of the container is marked with an identifier that has no relation to its position in memory. Then, to find the associated node, all of memory is scanned until the identifier is found. This isn't even particularly inefficient if given enough computing elements in a parallel machine.
Here's a sketch of a proof.
Given that a program must be of finite size, all types defined within the program must contain only finitely many members and reference only finitely many other types. The same holds for any program entrypoint and for any objects defined before program initialisation.
In the absence of contiguous arrays (which are the product of a type with a runtime natural number and are therefore unconstrained in size), all types must be arrived at through the composition of types as above; derivation of types (pointer-to-pointer-to-A) is still constrained by the size of the program. There are no facilities other than contiguous arrays to compose a runtime value with a type.
This is a little contentious; if e.g. mappings are considered primitive then one can approximate an array with a map whose keys are the natural numbers. Of course, any implementation of a map must use self-referential data structures (B-trees) or contiguous arrays (hash tables).
Next, if the types are non-recursive then any chain of types (A references B references C...) must terminate, and can be of no greater length than the number of types defined in the program. Thus the total size of data referenceable by the program is limited to the product of the sizes of each type multiplied by the number of names defined in the program (in its entrypoint and static data).
This holds even if functions are recursive (which strictly speaking breaks the prohibition on recursive types, since functions are types); the amount of data immediately visible at any one point in the program is still limited to the product of the sizes of each type multiplied by the number of names visible at that point.
An exception to this is if you store a "container" in a stack of recursive function calls; however such a program would not be able to traverse its data at random without unwinding the stack and having to reread data, which is something of a disqualification.
Finally, if it is possible to create types dynamically, the above proof does not hold; we could, for example, create a Lisp-style list structure where each cell is of a distinct type: cons<4>('h', cons<3>('e', cons<2>('l', cons<1>('l', cons<0>('o', nil))))). This is not possible in most statically typed languages, although it is possible in some dynamic languages, e.g. Python.
The statement is not correct. The simple counter example is std::deque in C++. The basic data structure (for the language-agnostic part) is a contiguous array of pointers to arrays of data. The actual data is stored in ropes (non-contiguous blocks), that are chained through a contiguous array.
This might be bordering on your requirements, depending on what "without using continuous memory areas" means. I am using the interpretation that the stored data is not contiguous, but this data structure depends on having arrays for the intermediate layer.
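A toy version of that two-layer layout makes the structure visible (ToyDeque here is illustrative, not the real std::deque implementation): a contiguous array of pointers indexes into separately allocated fixed-size blocks, so the stored data itself is not contiguous.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t BLOCK = 4;  // elements per block

struct ToyDeque {
    std::vector<int*> map;   // contiguous index layer of block pointers
    std::size_t count = 0;

    void push_back(int v) {
        if (count % BLOCK == 0)
            map.push_back(new int[BLOCK]);  // blocks allocated independently
        map[count / BLOCK][count % BLOCK] = v;
        ++count;
    }
    int operator[](std::size_t i) const {
        return map[i / BLOCK][i % BLOCK];   // two-level lookup, still O(1)
    }
    ~ToyDeque() { for (int* b : map) delete[] b; }
};
```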
I think a better phrasing would be:
It's the only way to construct data structures (more precisely containers) that can accept
arbitrary large data collections without using memory areas of determinable address.
What I mean is that normal arrays use addr(idx) = idx*size + initial_addr to get the memory address of an element. However, if you change that to something like addr(idx) = idx*idx*size + initial_addr, then the elements of the data structure are not stored in continuous memory areas; rather, there are large gaps between where elements are stored. Thus, it is not continuous memory.
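The quadratic addressing function can be demonstrated directly. In this sketch a single backing buffer is used, but the occupied cells sit at offsets idx*idx*sizeof(int), with ever-growing gaps between them:

```cpp
#include <cstddef>
#include <cstring>

// Backing storage large enough for indices 0..9 under addr(idx) = idx*idx*size.
char buffer[10 * 10 * sizeof(int)];

void put(std::size_t idx, int value) {
    // Offsets idx*idx*sizeof(int) never overlap: consecutive squares are at
    // least 2*idx+1 slots apart, which exceeds one element.
    std::memcpy(buffer + idx * idx * sizeof(int), &value, sizeof(int));
}

int get(std::size_t idx) {
    int value;
    std::memcpy(&value, buffer + idx * idx * sizeof(int), sizeof(int));
    return value;
}
```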

multimap representation in memory

I'm debugging my code and at one point I have a multimap which contains pairs of a long and a Note object created like this:
void Track::addNote(Note &note) {
long key = note.measureNumber * 1000000 + note.startTime;
this->noteList.insert(make_pair(key, note));
}
I wanted to look if these values are actually inserted in the multi map so I placed a breakpoint and this is what the multimap looks like (in Xcode):
It seems like I can infinitely open the elements (my actual multimap is the first element, called noteList). Any ideas whether this is normal, and why I can't read the actual pair values (the long and the Note)?
libstdc++ implements its maps and sets using a generic red/black tree. The nodes of the tree use a base class, _Rb_tree_node_base, which contains pointers of that same base type to the parent/left/right nodes.
To access the data, it performs a static cast to the node type that's specific to the template arguments you provided. You won't be able to see the data using XCode unless you can force the cast.
It does something similar with linked lists, with a linked list node base.
Edit: It does this to reduce the amount of duplicate code generated by the template. Rather than have an RbTree<Type1>, RbTree<Type2>, and so on, libstdc++ has a single set of operations that work on the base class, and those operations are the same regardless of the underlying type of the map. It only casts when it needs to examine the data, and the actual rotation/rebalance code is the same for all of the trees.
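The pattern can be sketched with simplified names (NodeBase and Node<T> here are illustrative, not libstdc++'s actual definitions): type-agnostic tree code sees only the base node, and only data access casts down to the concrete node type.

```cpp
#include <cstddef>

// Type-agnostic node header, analogous to _Rb_tree_node_base.
struct NodeBase {
    NodeBase* parent = nullptr;
    NodeBase* left = nullptr;
    NodeBase* right = nullptr;
};

// The payload lives in a derived node, one instantiation per value type.
template <typename T>
struct Node : NodeBase {
    T value;
    explicit Node(const T& v) : value(v) {}
};

// Structural code (rotation, rebalancing, counting...) works on NodeBase
// alone and is shared across all value types.
std::size_t countNodes(const NodeBase* n) {
    return n ? 1 + countNodes(n->left) + countNodes(n->right) : 0;
}

// Only when the data is needed does the code cast back to the real node type.
template <typename T>
const T& valueOf(const NodeBase* n) {
    return static_cast<const Node<T>*>(n)->value;
}
```

This is also why a debugger that only sees NodeBase pointers cannot display the pair values without being told the concrete node type to cast to.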
Seems like a bug in the component that renders the collection. About halfway down the list there is an entry that is 0x00000000, but the rendering continues below that, without any valid pointers though. Perhaps you need to add your own common-sense interpretation of the displayed data and treat a null value as the end of that part of the tree.

size of fields of a c++ struct

I have various C++ structs in my program. I want a function that accepts one of these structs as input and gives me an int array containing the size in bytes of each field of the input struct. Can anyone help me?
That's not possible.1 C++ does not have reflection.
1. To be precise, it's not possible to have this done automatically by the language. You could, of course, keep track of this stuff manually (as in @Nim's suggestion).
Here is an approach:
Use an overloaded function and, in each overload (one per struct), explicitly insert the size of each field from that struct into the passed-in array (a vector<size_t> is a better alternative).
This means effectively you have to hard-code the fields in each structure in each overload.
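A minimal sketch of that overload approach, with two hypothetical structs (PointA and PointB) standing in for your real ones; each overload hard-codes its struct's field sizes:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical structs for illustration.
struct PointA { int x; int y; };
struct PointB { double x; double y; char tag; };

// One overload per struct; the field list is maintained by hand.
void fieldSizes(const PointA&, std::vector<std::size_t>& out) {
    out = {sizeof(int), sizeof(int)};
}

void fieldSizes(const PointB&, std::vector<std::size_t>& out) {
    out = {sizeof(double), sizeof(double), sizeof(char)};
}
```

Overload resolution picks the right list at compile time, but the obvious drawback is that every overload must be updated whenever its struct changes.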
Alternatively, if you are happy to use boost::fusion, with a few macros, you should be able to promote the structure - which you can then iterate over. I've posted an answer with an example somewhere on SO, will dig it up...
Here it is: Boost MPL to generate code for object serialization?, shows how to "promote" the structure and then iterate over the members. In that case, it's for serialization, but it's trivial to adapt it to return the size of each field. You could do this with MPL at compile time and generate an MPL sequence with the size of each field - but that's a little more tricky - it all depends on what you want to achieve really...