RESTFful/Resource Oriented Design - web-services

Suppose I have a three level hierarchy consisting of school, students, and classes.
If I expose student as a resource, my question is whether I should always return the parent "school" and the children "classes" along with that student, or whether there should be parm that the user includes to indicate such. Perhaps something like &deep=True?
Or on the other hand, if a user gets a student, and he wants the school, he has to do a GET on the school resource, and likewise if he wants all the classes that a student is taking, he has to do a GET on the classes resource?
I'm trying to keep the design somewhat open for the unknown future user, rather than coding just for what our current requirements demand.
Thanks,
Neal Walters

If you think about Resource design more in the way you think about UI design then the problem becomes easier. There is no reason why you cannot return a subset of school information within the representation of the Student resource and also return a link to a complete representation of School resource in case the user wishes to see more.
I find it useful to think of a REST interface more like a user interface for machines instead of a data access layer. With this mindset it is not a problem to duplicate information in different resource representations.
I know there are lots of people trying to treat REST like a DAL but they are the same people that get upset when they find out that you can't do transactions via a RESTful interface.
Put another way, design your API as you would design a website (but without any of the pretty stuff) and then build a client that can crawl the site for the information it needs.

I think you should avoid thinking of classes as a sub-resource or attribute of a student. An academic class is more than just a time slot on a student's schedule; it has an instructor, a syllabus, etc., all of which may need to be encoded at some point.
As I see it, the following relations hold:
schools have zero or more students
schools have zero or more classes
students have zero or more classes
classes have zero or more students
(You could also trivially extend these with teachers/instructors, if your requirements included such information.)
In addition, each of the above resource types will have any number of attributes beyond the simple links between them.
Given that, I would think you'd want a URL structure something like the following:
http://example.com/lms/schools => list of schools
http://example.com/lms/schools/{school} => info about one school
http://example.com/lms/schools/{school}/students => list of students
http://example.com/lms/schools/{school}/students/{student} => info on one student
http://example.com/lms/schools/{school}/students/{student}/courses => list of courses (as links, not full resources) student is enrolled in
http://example.com/lms/schools/{school}/courses => list of courses
http://example.com/lms/schools/{school}/courses/{course} => info on one course
http://example.com/lms/schools/{school}/courses/{course}/students => list of students (as links, not full resources) enrolled in course

I suppose that adding query parameters to optimize delivery is reasonable. I might make it even more generic and use include=<relation>. This can be extended for all types. Note that
you can use multiple includes: .../student/<id>?include=school&include=student will assign the list [school, student] to the parameter include. This will also allow a general pattern that may be possibly useful for the other resources as well.

Related

Text Adventure Game - How to Tell One Item Type from Another and How to Structure the Item Classes/Subclasses?

I'm a beginner programmer (who has a bunch of design-related scripting experience for video games but very little programming experience - so just basic stuff like loops, flow control, etc. - although I do have a C++ fundamentals and C++ data structures and algorithm's course under my belt). I'm working on a text-adventure personal project (I actually already wrote it in Python ages ago before I learned how classes work - everything is a dictionary - so it's shameful). I'm "remaking" it in C++ with classes to get out of the rut of having only done homework assignments.
I've written my player and room classes (which were simple since I only need one class for each). I'm onto item classes (an item being anything in a room, such as a torch, a fire, a sign, a container, etc.). I'm unsure how to approach the item base class and derived classes. Here are the problems I'm having.
How do I tell whether an item is of a certain type in a non-shit way (there's a good chance I'm overthinking this)?
For example, I set up my print room info function so that in addition to whatever else it might do, it prints the name of every object in its inventory (i.e. inside of it) and I want it to print something special for a container object (the contents of its inventory for example).
The first part's easy because every item has a name since the name attribute is part of the base item class. The container has an inventory though, which is an attribute unique to the container subclass.
It's my understanding that it's bad form to execute conditional logic based on the object's class type (because one's classes should be polymorphic) and I'm assuming (perhaps incorrectly) that it'd be weird and wrong to put a getHasInventory accessor virtual function in the item base class (my assumption here is based on thinking it'd be crazy to put virtual functions for every derived class in the base class - I have about a dozen derived classes - a couple of which are derived classes of derived classes).
If that's all correct, what's an acceptable way to do this? One obvious thing is to add an itemType attribute to the base and then do conditional logic but this strikes me as wrong since it seems to just be a re-skinning of the checking class type solution. I'm unsure whether the above-mentioned assumptions are correct and what a good solution might be.
How should I structure my base class/classes and my derived classes?
I originally wrote them such that the item class was the base class and most other classes used single inheritance (except for a couple which had multi-level).
This seemed to present some awkwardness and repeating myself though. For example, I want a sign and a letter. A sign is a Readable Item > Untakeable Item > Item. A letter is a Readable Item > Takeable Item > Item. Because they all use single inheritance I need two different Readable Items, one that's takeable and one that's not (I know I could just make takeable and untakeable into attributes of the base in this instance and I did but this works as an example because I still have similar issues with other classes).
That seems icky to me so I took another stab at it and implemented them all using multiple inheritance & virtual inheritance. In my case that seems more flexible because I can compose classes of multiple classes and create a kind of component system for my classes.
Is one of these ways better than the other? Is there some third way that's better?
One possible way to solve your problem is polymorphism. By using polymorphism you can (for example) have a single describe function which when invoked leads the item to describe itself to the player. You can do the same for use, and other common verbs.
Another way is to implement a more advanced input parser, which can recognize objects and pass on the verbs to some (polymorphic) function of the items for themselves to handle. For example each item could have a function returning a list of available verbs, together with a function returning a list of "names" for the items:
struct item
{
// Return a list of verbs this item reacts to
virtual std::vector<std::string> get_verbs() = 0;
// Return a list of name aliases for this item
virtual std::vector<std::string> get_names() = 0;
// Describe this items to the player
virtual void describe(player*) = 0;
// Perform a specific verb, input is the full input line
virtual void perform_verb(std::string verb, std::string input) = 0;
};
class base_torch : public item
{
public:
std::vector<std::string> get_verbs() override
{
return { "light", "extinguish" };
}
// Return true if the torch is lit, false otherwise
bool is_lit();
void perform_verb(std::string verb, std::string) override
{
if (verb == "light")
{
// TODO: Make the torch "lit"
}
else
{
// TODO: Make the torch "extinguished"
}
}
};
class long_brown_torch : public base_torch
{
std::vector<std::string> get_names() override
{
return { "long brown torch", "long torch", "brown torch", "torch" };
}
void describe(player* p) override
{
p->write("This is a long brown torch.");
if (is_lit())
p->write("The torch is burning.");
}
};
Then if the player input e.g. light brown torch the parser looks through all available items (the ones in the players inventory followed by the items in the room), get each items name-list (call the items get_names() function) and compare it to the brown torch. If a match is found the parser calls the items perform_verb function passing the appropriate arguments (item->perform_verb("light", "light brown torch")).
You can even modify the parser (and the items) to handle adjectives separately, or even articles like the, or save the last used item so it can be referenced by using it.
Constructing the different rooms and items is tedious but still trivial once a good design has been made (and you really should spend some time creating requirement, analysis of the requirements, and creating a design). The really hard part is writing a decent parser.
Note that this is only two possible ways to handle items and verbs in such a game. There are many other ways, to many to list them all.
You are asking some excellent questions reg. how to design, structure and implement the program, as well as how to model the problem domain.
OOP, 'methods' and approaches
The questions you ask indicate that you have learned about OOP (object-oriented programming). In a lot of introductory material on OOP, it is common to encourage modelling the problem domain directly through objects and subtyping and implementing functionality by adding methods to them. A classical example is modelling animals, with for instance an Animal type and two sub-types, Duck and Cat, and implementing functionality, for instance walk, quack and mew.
Modelling the problem domain directly with objects and subtyping can make sense, but it can also very much be overkill and bothersome compared to simply having a single or a few types with different fields describing what it is. In your case, I do believe a more complex modelling like you have with objects and subtypes or alternative approaches can make sense, since among other aspects you have functionality that varies depending on the type as well as somewhat complex data (like a container with an inventory). But it is something to keep in mind - there are different trade-offs, and sometimes, having a single type with multiple different fields for modelling the domain can make more sense overall.
Implementing the desired functionality through methods on a base class and subtypes likewise have different trade-offs, and it is not always a good approach for the given case. For one of your questions, you could do something like adding a print method or similar to the base type and each subtype, but this is not always that nice in practice (a simple example is that of a calculator application where simplifying the arithmetic expression the user enters (like (3*x)*4/2) might be bothersome to implement if one uses the approach of adding methods to the base class).
Alternative approach - Tagged unions/sum types
There is a very nice fundamental abstraction known as "tagged union" (it is also known by the names "disjoint union" and "sum type"). The main idea about the tagged union is that you have a union of several different sets of instances, where which set the given instance belongs to matters. They are a superset of the feature in C++ known as enum. Regrettably, C++ does not currently support tagged unions, though there are research into it (for instance https://www.stroustrup.com/OpenPatternMatching.pdf , though this may be somewhat beyond you if you are a beginner programmer). As far as I can see, this fits very well with the example you have given here. An example in Scala would be (many other languages support tagged unions as well, such as Rust, Kotlin, Typescript, the ML-languages, Haskell, etc.):
sealed trait Item {
val name: String
}
case class Book(val name: String) extends Item
case object Fire extends Item {
val name = "Fire"
}
case class Container(val name: String, val inventory: List[Item]) extends Item
This describes your different kinds of items very well as far as I can see. Do note that Scala is a bit special in this regard, since it implements tagged unions through subtyping.
If you then wanted to implement some print functionality, you could then use "pattern matching" to match which item you have and do functionality specific to that item. In languages that support pattern matching, this is convenient and non-fragile, since the pattern matching checks that you have covered each possible case (similar to switch in C++ over enums checking that you have covered each possible case). For instance in Scala:
def getDescription(item: Item): String = {
item match {
case Book(_) | Fire => item.name
case Container(name, inventory) =>
name + " contains: (" +
inventory
.map(getDescription(_))
.mkString(", ") +
")"
}
}
val description = getDescription(
Container("Bag", List(Book("On Spelunking"), Fire))
)
println(description)
You can copy-paste the two snippets in here and try to run them: https://scalafiddle.io/ .
This kind of modelling works very well with what one might call "data types", where you have no or very little functionality in the classes themselves, and where the fields inside the classes basically are part of their interface ("interface" in the sense that you would like to change the implementations that uses the types if you ever add to, remove or change the fields of the types).
Conversely, I find a more conventional subtyping modelling and approach more convenient when the implementation inside of a class is not part of its interface, for instance if I have a base type that describes a collision system interface, and each of its subtypes have different performance characteristics, handy for different situations. Hiding and protecting the implementation since it is not part of the interface makes a lot of sense and fits very well with what one might call "mini-modules".
In C++ (and C), sometimes people do use tagged unions despite the lack of language support, in various ways. One way that I have seen being used in C is to make a C union (though do be careful reg. aspects such as memory and semantics) where an enum tag was used to differentiate between the different cases. This is error-prone, since you might easily end up accessing a field in one enum case that is not valid for that enum case.
You could also model your command input as a tagged union. That said, parsing can be somewhat challenging, and parsing libraries may be a bit involved if you are a beginner programmer; keeping the parsing somewhat simple might be a good idea.
Side-notes
C++ is a special languages - I do not quite like it for cases where I do not care much about resource usage or runtime performance and the like for multiple different reasons, since it can be annoying and not that flexible to develop in. And it can be challenging to develop in, because you must always take great care to avoid undefined behaviour. That said, if resource usage or runtime performance do matter, C++ can, depending on case, be a very good option. There are also a number of very useful and important insights in the C++ language and its community, such as RAII, ownership and lifetimes. My recommendation is that learning C++ is a good idea, but that you should also learn other languages, maybe for instance a statically-typed functional programming language. FP (functional programming) and languages supporting FP, has a number of advantages and drawbacks, but some of their advantages are very, very nice, especially reg. immutability as well as side-effects.
Of these languages, Rust may be the closest to C++ in certain regards, though I don't have experience with Rust and cannot therefore vouch for either the language or its community.
As a side-note, you may be interested in this Wikipedia-page: https://en.wikipedia.org/wiki/Expression_problem .

How to use Single-Responsibility Principle

I am trying to understand design principles in c++.
In database I have users.
users:
id
name
age
gender
I want to get my users in three ways.
First: I want all my users.
Second: I want all my users filtered by age.
Third: I want all my users filtered by age and gender.
For example, if I use same class for getAllUsers and getFilteredByAge, it means that my class has two responsibility, it is responsible for getting users and also for filtering them. Am I right or not? And how Single-Responsibility Principle works in this example, should I split this three in different classes, or is there any better way ?
A good definition for SRP is:
A module should be responsible to one, and only one, actor.
(Clean Architecture)
This means that if the person telling you what these functions do is the same, then you can leave them in the same class/module.
If, let's say, getAllUsers() is requested by accounting and getUserAtLeastThisOld(int minimumAge) is requested by HR, then it might be sensible to have them in separate classes.
Following are answers to your question
Q] If I use same class for getAllUsers and getFilteredByAge, it means that my class has two responsibility?
A] No,because your class's job is to get users, rather these functions should be overloads and should not be in different classes.
Q] it is responsible for getting users and also for filtering them. Am I right or not?
A] I guess No!, filtering is not a different task, it is something that should be applied before retrieving objects.
Q] how Single-Responsibility Principle works in this example, should I split this three in different classes, or is there any better way ?
A]
In this case I suggest you to have only one class, which should have following functions overloads
GetUsers() - get all users
GetUsers(AgeFilter) - get users as per age filter
GetUsers(AgeFilter, genderFilter) - get users as per age filter and
gender filter
Note : Now suppose in future you have want to add more functionality to this class
for e.g compute salary for user, or adding family details for users
then in such case you can go for creating another class instead of putting burden on single class...

Progressive Disclosure in C++ API

Following my reading of the article Programmers Are People Too by Ken Arnold, I have been trying to implement the idea of progressive disclosure in a minimal C++ API, to understand how it could be done at a larger scale.
Progressive disclosure refers to the idea of "splitting" an API into categories that will be disclosed to the user of an API only upon request. For example, an API can be split into two categories: a base category what is (accessible to the user by default) for methods which are often needed and easy to use and a extended category for expert level services.
I have found only one example on the web of such an implementation: the db4o library (in Java), but I do not really understand their strategy. For example, if we take a look at ObjectServer, it is declared as an interface, just like its extended class ExtObjectServer. Then an implementing ObjectServerImpl class, inheriting from both these interfaces is defined and all methods from both interfaces are implemented there.
This supposedly allows code such as:
public void test() throws IOException {
final String user = "hohohi";
final String password = "hohoho";
ObjectServer server = clientServerFixture().server();
server.grantAccess(user, password);
ObjectContainer con = openClient(user, password);
Assert.isNotNull(con);
con.close();
server.ext().revokeAccess(user); // How does this limit the scope to
// expert level methods only since it
// inherits from ObjectServer?
// ...
});
My knowledge of Java is not that good, but it seems my misunderstanding of how this work is at an higher level.
Thanks for your help!
Java and C++ are both statically typed, so what you can do with an object depends not so much on its actual dynamic type, but on the type through which you're accessing it.
In the example you've shown, you'll notice that the variable server is of type ObjectServer. This means that when going through server, you can only access ObjectServer methods. Even if the object happens to be of a type which has other methods (which is the case in your case and its ObjectServerImpl type), you have no way of directly accessing methods other than ObjectServer ones.
To access other methods, you need to get hold of the object through different type. This could be done with a cast, or with an explicit accessor such as your ext(). a.ext() returns a, but as a different type (ExtObjectServer), giving you access to different methods of a.
Your question also asks how is server.ext() limited to expert methods when ExtObjectServer extends ObjectServer. The answer is: it is not, but that is correct. It should not be limited like this. The goal is not to provide only the expert functions. If that was the case, then client code which needs to use both normal and expert functions would need to take two references to the object, just differently typed. There's no advantage to be gained from this.
The goal of progressive disclosure is to hide the expert stuff until it's explicitly requested. Once you ask for it, you've already seen the basic stuff, so why hide it from you?

Architecture: Modifying the model in different ways

Problem statement
I have a model class that looks something like (extremely simplified; some members and many, many methods omitted for clarity):
class MyModelItem
{
public:
enum ItemState
{
State1,
State2
};
QString text() const;
ItemState state() const;
private:
QString _text;
ItemState _state;
}
It is a core element of the application and is used in many different parts of the code:
It is serialized/deserialized into/from various file formats
It can be written into or read from a database
It can be updated by an 'import', that reads a file and applies changes to the currently loaded in-memory model
It can be updated by the user through various GUI functions
The problem is, this class is has grown over the years and now has several thousands lines of code; it has become a prime example of how to violate the Single responsibility principle.
It has methods for setting the 'text', 'state', etc. directly (after deserialization) and the same set of methods for setting them from within the UI, which has side effects like updating the 'lastChangedDate' and 'lastChangedUser' etc. Some methods or groups of methods exist even more than twice, with everyone of them doing basically the same thing but slightly different.
When developing new parts of the application, you are very likely using the wrong of the five different ways of manipulating MyModelItem, which makes it extremely time consuming and frustrating.
Requirements
Given this historically grown and overly complex class, the goal is to separate all different concerns of it into different classes, leaving only the core data members in it.
Ideally, I would prefer a solution where a MyModelItem object has nothing but const members for accessing the data and modifications can only be made using special classes.
Every one of these special classes could then contain an actual concrete implementation of the business logic (a setter of 'text' could do something like "if the text to be set begins with a certain substring and the state equals 'State1', set it to 'State2'").
First part of the solution
For loading and storing the whole model, which consists of many MyModelItem objects and some more, the Visitor pattern looks like a promising solution. I could implement several visitor classes for different file formats or database schemas and have a save and load method in MyModelItem, which accept such a visitor object each.
Open question
When the user enters a specific text, I want to validate that input. The same validation must be made if the input comes from another part of the application, which means I can not move the validation into the UI (in any case, UI-only-validation is often a bad idea). But if the validation happens in the MyModelItem itself, I have two problems again:
The separation of concerns, which was the goal to begin with is negated. All the business logic code is still 'dumped' into the poor model.
When called by other parts of the application, this validation has to look differently. Implementing different validating-setter-methods is how it is done right now, which has a bad code smell.
It is clear now that the validation has to be moved outside both the UI and the model, into some sort of controller (in a MVC sense) class or collection of classes. These should then decorate/visit/etc the actual dumb model class with its data.
Which software design pattern fits best to the described case, to allow for different ways of modifying the instances of my class?
I am asking, because none of the patterns I know solves my problem entirely and I feel like I'm missing something here...
Thanks a lot for your ideas!
Plain strategy pattern seems the best strategy to me.
What I understand from your statement is that:
The model is mutable.
the mutation may happen through different source. (ie. different classes)
the model must validate each mutation effort.
Depending on the source of an effort the validation process differs.
the model is oblivious of the source and the process. its prime concern is the state of object it is modeling.
Proposal:
let the Source be the classes which somehow mutate the model. it may be the deserializers, the UI, the importers etc.
let a validator be an interface/super-class which holds a basic logic of validation. it can have methods like : validateText(String), validateState(ItemState)...
Every Source has-a validator. That validator may be an instance of the base-validator or may inherit and override some of its methods.
Every validator has-a reference to the model.
A source first sets its own validator then takes the mutation attempt.
now,
Source1 Model Validator
| setText("aaa") | |
|----------------------->| validateText("aaa") |
| |----------------------->|
| | |
| | setState(2) |
| true |<-----------------------|
|<-----------------------| |
the behavior of different validators might be different.
Although you don't state it explicitly, refactoring thousands of lines of code is a daunting task, and I imagine that some incremental process is preferred over an all-or-nothing one.
Furthermore, the compiler should help as much as possible to detect errors. If it is a lot of work and frustration now to figure out which methods should be called, it will be even worse if the API has been made uniform.
Therefore, I would propose to use the Facade pattern, mostly for this reason:
wrap a poorly designed collection of APIs with a single well-designed API (as per task needs)
Because that is basically what you have: a collection of APIs in one class, that needs to be separated into different groups. Each group would get its own Facade, with its own calls. So the current MyModelItem, with all its carefully crafted different method invocations over the years:
...
void setText(String s);
void setTextGUI(String s); // different name
void setText(int handler, String s); // overloading
void setTextAsUnmentionedSideEffect(int state);
...
becomes:
class FacadeInternal {
setText(String s);
}
class FacadeGUI {
setTextGUI(String s);
}
class FacadeImport {
setText(int handler, String s);
}
class FacadeSideEffects {
setTextAsUnmentionedSideEffect(int state);
}
If we remove the current members in MyModelItem to MyModelItemData, then we get:
class MyModelItem {
MyModelItemData data;
FacadeGUI& getFacade(GUI client) { return FacadeGUI::getInstance(data); }
FacadeImport& getFacade(Importer client) { return FacadeImport::getInstance(data); }
}
GUI::setText(MyModelItem& item, String s) {
//item.setTextGUI(s);
item.getFacade(this).setTextGUI(s);
}
Of course, implementation variants exist here. It could equally well be:
GUI::setText(MyModelItem& item, String s) {
myFacade.setTextGUI(item, s);
}
That is more dependent on restrictions on memory, object creation, concurrency, etc. The point is that up till now, it is all straight forward (I won't say search-and-replace), and the compiler helps every step of the way to catch errors.
The nice thing about the Facade is that it can form an interface to multiple libraries/classes. After splitting things up, the business rules are all in several Facades, but you can refactor them further:
class FacadeGUI {
MyModelItemData data;
GUIValidator validator;
GUIDependentData guiData;
setTextGUI(String s) {
if (validator.validate(data, s)) {
guiData.update(withSomething)
data.setText(s);
}
}
}
and the GUI code won't have to be changed one bit.
After all that you might choose to normalize the Facades, so that they all have the same method names. It isn't necessary, though, and for clarity's sake it might even be better to keep the names distinct. Regardless, once again the compiler will help validate any refactoring.
(I know I stress the compiler bit a lot, but in my experience, once everything has the same name, and works through one or more layers of indirection, it becomes a pain to find out where and when something is actually going wrong.)
Anyway, this is how I would do it, as it allows for splitting up large chunks of code fairly quickly, in a controlled manner, without having to think too much. It provides a nice stepping stone for further tweaking. I guess that at some point the MyModelItem class should be renamed to MyModelItemMediator.
Good luck with your project.
If I understand your problem correctly, then would I not decide yet which design pattern to chose. I think that I have seen code like this several times before and the main problem in my point of view was always that change was build upon change build upon change.
The class had lost is original purpose and was now serving multiple purposes, which were all not clearly defined and set. The result is a big class (or a big database, spaghetti code etc), which seems to be indispensable yet is a nightmare for maintenance.
The big class is the symptom of a process that is gone out of control. It is where you can see it happen, but my guess is that when this class has been recovered that a lot of other classes will be the first to redesign. If I am correct, then is there also a lot of data corruption, because in a lot of cases is the definition of the data unclear.
My advice would be go back to your customer, talk about the business processes, reorganize the project management of the application and try to find out if the application is still serving the business process well. It might be not - I have been in this type of situation several times in different organizations.
If the business process is understood and the data model is converted in line with the new data model, then can you replace the application with a new design, which is much easier to create. The big class that now exists, does not have to be reorganized anymore, because its reason for existence is gone.
It costs money, but the maintenance now costs also money. A good indication for redesign is if new features are not implemented anymore, because it has become too expensive or error prone to execute.
I will try to give you a different perspective of the situation you have. Please note that explanations are written in my own words for the simplicity's sake. However, terms mentioned are from the enterprise application architecture patterns.
You are designing the business logic of the application. So, MyModelItem must be some kind of a business entity. I would say it's the Active Record you have.
Active Record: business entity that can CRUD itself, and can manage
the business logic related to itself.
The business logic contained in the Active Record has increased and has become hard to manage. That's very typical situation with Active Records. This is where you must switch from the Active Record pattern to the Data Mapper pattern.
Data Mapper: mechanism (typically a class) managing the mapping (typically between the entity and the data it translates from/to). It
starts existing when the mapping concerns of the Active Record are so
mature that they need to be put into the separate class. Mapping becomes a logic on its own.
So, here we came to the obvious solution: create a Data Mapper for the MyModelItem entity. Simplify the entity so that it does not handle the mapping of itself. Migrate the mapping management to the Data Mapper.
If the MyModelItem takes part in the inheritance, consider creating an abstract Data Mapper and concrete Data Mappers for each concrete class you want to map in a different way.
Several notes on how I would implement it:
Make entity aware of a mapper.
Mapper is a finder of the entity, so the application always starts from the mapper.
Entity should expose the functionality that is natural to be found on it.
And entity makes use of (abstract or concrete) mapper for doing the concrete things.
In general, you must model your application without the data in mind. Then, design mapper to manage the transformations from objects to the data and vice verca.
Now about validation
If the validation is the same in all the cases, then implement it in the entity, as that sounds natural to me. In most cases, this approach is sufficient.
If the validation differs and depends on something, abstract that something away and call the validation through the abstraction. One way (if it depends on the inheritance) would be to put the validation in the mapper, or have it in the same family of objects as mapper, created by the common Abstract Factory.

Most efficient way to add data to an instance

I have a class, let's say Person, which is managed by another class/module, let's say PersonPool.
I have another module in my application, let's say module M, that wants to associate information with a person, in the most efficient way. I considered the following alternatives:
Add a data member to Person, which is accessed by the other part of the application. Advantage is that it is probably the fastest way. Disadvantage is that this is quite invasive. Person doesn't need to know anything about this extra data, and if I want to shield this data member from other modules, I need to make it private and make module M a friend, which I don't like.
Add a 'generic' property bag to Person, in which other modules can add additional properties. Advantage is that it's not invasive (besides having the property bag), and it's easy to add 'properties' by other modules as well. Disadvantage is that it is much slower than simply getting the value directly from Person.
Use a map/hashmap in module M, which maps the Person (pointer, id) to the value we want to store. This looks like the best solution in terms of separation of data, but again is much slower.
Give each person a unique number and make sure that no two persons ever get the same number during history (I don't even want to have these persons reuse a number, because then data of an old person may be mixed up with the data of a new person). Then the external module can simply use a vector to map the person's unique number to the specific data. Advantage is that we don't invade the Person class with data it doesn't need to know of (except his unique nubmer), and that we have a quick way of getting the data specifically for module M from the vector. Disadvantage is that the vector may become really big if lots of persons are deleted and created (because we don't want to reuse the unique number).
In the last alternative, the problem could be solved by using a sparse vector, but I don't know if there are very efficient implementations of a sparse vector (faster than a map/hashmap).
Are there other ways of getting this done?
Or is there an efficient sparse vector that might solve the memory problem of the last alternative?
I would time the solution with map/hashmap and go with it if it performs good enough. Otherwise you have no choice but add those properties to the class as this is the most efficient way.
Alternatively, you can create a subclass of Person, basically forward all the interface methods to the original class but add all the properties you want and just change original Person to your own modified one during some of the calls to M.
This way module M will see the subclass and all the properties it needs but all other modules would think of it as just an instance of Person class and will not be able to see your custom properties.
The first and third are reasonably common techniques. The second is how dynamic programming languages such as Python and Javascript implement member data for objects, so do not dismiss it out of hand as impossibly slow. The fourth is in the same ballpark as how relational databases work. It is possible, but difficult, to make relational databases run the like the clappers.
In short, you've described 4 widely used techniques. The only way to rule any of them out is with details specific to your problem (required performance, number of Persons, number of properties, number of modules in your code that will want to do this, etc), and corresponding measurements.
Another possibility is for module M to define a class which inherits from Person, and adds extra data members. The principle here is that M's idea of a person differs from Person's idea of a person, so describe M's idea as a class. Of course this only works if all other modules operating on the same Person objects are doing so via polymorphism, and furthermore if M can be made responsible for creating the objects (perhaps via dependency injection of a factory). That's quite a big "if". An even bigger one, if nothing other than M needs to do anything life-cycle-ish with the objects, then you may be able to use composition or private inheritance in preference to public inheritance. But none of it is any use if module N is going to create a collection of Persons, and then module M wants to attach extra data to them.