Assume I want to implement class A which must load its "configuration" from a file. And let's assume the "configuration" is a simple map<string, string>.
I can implement the A::LoadConfiguration in two different ways:
void A::LoadConfiguration(string filename)
map<string, string> A::LoadConfiguration(string filename) const
Should I prefer either of the two implementations, and why?
If you prefer the second version when the user wants to get info on a file they will base all their algorithms on the map. If you do the second version, meaning the implementation may be a map, but doesn't have to be, they can base their code around an API which does not have to change even if the internal implementation does.
Consider the situation where later you realize it is far more efficient to use an std array, for whatever reason, now every program using this code has to change many of it's algorithms. Using the first version the change to array can be handled internally and reflect no changes on the outside.
Now if you are planning to make multiple instances of the class you will definitely want to make it a static method because you don't want the file to load every time you call the constructor (especially if the file will not change).
Completely ignoring your suggestions, but this is probably how I would do it (not knowing all your constraints, so ignore me if it does not fit):
class A
{
public:
static A fromConfiguration( string fileName );
/* ... */
}
In most cases, the "configuration" of a class should be set at object creation, so forcing the user to provide it on construction is a good thing (instead of having to remember to do do the loading later).
namespace NeatStuff
{
map<string,string> loadSimpleConfiguration( string fileName );
}
If the configuration file format is really simple (and not specific to your class) you can move the actual loading out of the class.
Assuming other classes use the configuration later, I prefer option 1, and an additional GetConfigurationParameter public const method that gets the config value for a particular key. That lets me make other classes which can just ask for some parameter by name without ever caring that it's implemented as a map.
Another reason why I prefer option 1 is that loading a configuration should be distinct from returning it. If I see a name like LoadConfiguration, I assume that it loads the config from somewhere and sets the parameters in the class. I do not assume it returns some description of the configuration, which I'd instead expect from a method like GetConfiguration - but opinions on this will vary for different people of course.
Related
The problem
The Unreal Engine 4 Editor allows you to add objects of your own types to the scene.
Doing so requires minimal work from the user - to make a class visible in the editor you only need to add some macros, like UCLASS()
UCLASS()
class MyInputComponent: public UInputComponent //you can instantiate it in the editor!
{
UPROPERTY(EditAnywhere)
bool IsSomethingEnabled;
};
This is enough to allow the editor to serialize the created-in-editor object's data (remember: the class is user-defined but the user doesn't have to hardcode loading specific fields. Also note that the UPROPERTY variable can be of user-defined type as well). It is then deserialized while loading the actual game. So how is it handled so painlessly?
My attempt - hardcoded loading for every new class
class Component //abstract class
{
public:
virtual void LoadFromStream(std::stringstream& str) = 0;
//virtual void SaveIntoStream(std::stringstream& str) = 0;
};
class UserCreatedComponent: public Component
{
std::string Name;
int SomeInteger;
vec3 SomeVector; //example of user-defined type
public:
virtual void LoadFromStream(std::stringstream& str) override //you have to write a function like this every time you create a new class
{
str >> Name >> SomeInteger >> SomeVector.x >> SomeVector.y >> SomeVector.z;
}
};
std::vector<Component*> ComponentsFromStream(std::stringstream& str)
{
std::vector<Component*> components;
std::string type;
while (str >> type)
{
if (type == "UserCreatedComponent") //do this for every user-defined type...
components.push_back(new UserCreatedComponent);
else
continue;
components.back()->LoadFromStream(str);
}
return components;
}
Example of an UserCreatedComponent object stream representation:
UserCreatedComponent MyComponent 5 0.707 0.707 0.707
The engine user has to do these things every time he creates a new class:
1. Modify ComponentsFromStream by adding another if
2. Add two methods, one which loads from stream and another which saves to stream.
We want to simplify it so the user only has to use a macro like UPROPERTY.
Our goal is to free the user from all this work and create a more extensible solution, like UE4's (described above).
Attempt at simplifying 1: Using type-int mapping
This section is based on the following: https://stackoverflow.com/a/17409442/12703830
The idea is that for every new class we map an integer, so when we create an object we can just pass the integer given in the stream to the factory.
Example of an UserCreatedComponent object stream representation:
1 MyComponent 5 0.707 0.707 0.707
This solves the problem of working out the type of created object but also seems to create two new problems:
How should we map classes to integers? What would happen if we include two libraries containing classes that map themselves to the same number?
What will initializing e.g. components that need vectors for construction look like? We don't always use strings and ints for object construction (and streams give us pretty much only that).
So how is it handled so painlessly?
C++ language does not provide features which would allow to implement such simple de/serialization of class instances as it works in the Unreal Engine. There are various ways how to workaround the language limitations, the Unreal uses a code generator.
The general idea is following:
When you start project compilation, a code generator is executed.
The code generator parses your header files and searches for macros which has special meaning, like UCLASS, USTRUCT, UENUM, UPROPERTY, etc.
Based on collected data, it generates not only code for de/serialization, but also for other purposes, like reflection (ability to iterate certain members), information about inheritance, etc.
After that, your code is finally compiled along with the generated code.
Note: this is also why you have to include "MyClass.generated.h" in all header files which declare UCLASS, USTRUCT and similar.
In other words, someone must write the de/serialization code in some form. The Unreal solution is that the author of such code is an application.
If you want to implement such system yourself, be aware that it's lots of work. I'm no expert in this field, so I'll just provide general information:
The primary idea of code-generators is to automatize repetitive work, nothing more - in other words, there's no other special magic. That means that "how objects are de/serialized" (how they're transformed from memory to file) and "how the code which de/serializes is created" (whether it's written by a person or generated by an application) are two separate topics.
First, it should be established how objects are de/serialized. For example, std::stringstream can be used, or objects can be de/serialized from/to generally known formats like XML, json, bson, yaml, etc., or a custom solution can be defined.
Establish what's the source of data for generated de/serialization code. In case of Unreal Engine, it's user code itself. But it's not the only way - for example Protobuffers use a simple language which is used only to define data structure and the generator creates code which you can include and use.
If the source of data should be C++ code itself, do not write you own C++ parser! (The only exceptions to this rule are: educational purpose or if you want to spend rest of your life with working on the parser.) Luckily, there are projects which you can use - for example there's clang AST.
How should we map classes to integers? What would happen if we include two libraries containing classes that map themselves to the same number?
There's one fundamental problem with mapping classes to integers: it's not possible to uniquely map every possible class name to an integer.
Proof: create classes named Foo_[integer] and map it to the [integer], i.e. Foo_0 -> 0, Foo_1 -> 1, Foo_2 -> 2, etc. After you use biggest integer value, how do you map Bar_0?
You can start assigning the numbers sequentially as they're added to a project, but as you correctly pin-pointed, what if you include new library? You could start counting from some big number, like 1.000.000, but how do you determine what should be first number for each library? It doesn't have a clear solution.
Some of solutions to this problem are:
Define clear subset of classes which can be de/serialized and assign sequential integers to these classes. The subset can be, for example, "only classes in my project, no library support".
Identify classes with two integers - one for class, one for library. This means you have to have some central register which assigns library integers uniquely (e.g. in order they're registered).
Use string which uniquely identifies the class including library name. This is what Unreal uses.
Generate a hash from class and library name. There's risk of hash collision - the better hash you use, the lower risk there is. For example git (the version control application) uses SHA-1 (which is considered unsafe today) to identify it's objects (files, directories, commits) and the program is used worldwide without bigger issues.
Generate UUID, a 128-bit random number (with special rules). There's also risk of collision, but it's generally considered highly improbable. Used by Java and Unity the game engine.
What would happen if we include two libraries containing classes that map themselves to the same number?
That's called a collision. How it's handled depends on design of de/serialization code, there are mainly two approaches to this problem:
Detect that. For example if your class identifier contains library identifier, don't allow loading/registering library with ID which is already identified. In case of ID which doesn't include library ID (e.g. hash/UUID variant), don't allow registering such classes. Throw an exception or exit the application.
Assume there's no collision. If actual collision happens, it's so-called UB, an undefined behaviour. The application will probably crash or act weirdly. It might corrupt stored data.
What will initializing e.g. components that need vectors for construction look like? We don't always use strings and ints for object construction (and streams give us pretty much only that).
This depends on what it's required from de/serializing code.
The simplest solution is actually to use string of values separated by space.
For example, let's define following structure:
struct Person
{
std::string Name;
float Age;
};
A vector of Person instances could look like: 3 Adam 22.2 Bob 34.5 Cecil 19.0 (i.e. first serialize number of items (vector size), then individual items).
However, what if you add, remove or rename a member? The serialized data would become unreadable. If you want more robust solution, it might be better to use more structured data, for example YAML:
persons:
- name: Adam
age: 22.2
- name: Bob
age: 34.5
- name: Cecil
age: 19.0
Final notes
The problem of de/serializing objects (in C++) is actually big, various systems uses various solutions. That's why this answer is so generic and it doesn't provide exact code - there's not single silver bullet. Every solution has it's advantages and disadvantages. Even detailed description of just Unreal Engine's serialization system would become a book.
So this answer assumes that reader is able to search for various mentioned topic, like yaml file format, Protobuffers, UUID, etc.
Every mentioned solution to a sub-problem has lots of it's own problems which weren't explored. For example de/serialization of string with spaces or new lines from/to simple string stream. If it's needed to solve such problems, it's recommended to first search for more specialized questions or write one if there's nothing to be found.
Also, C++ is constantly evolving. For example, better support for reflection is added, which might, one day, provide enough features to implement high-quality de/serializer. However, if it should be done in compile-time, it would heavily depend on templates which slow down compilation process significantly and decrease code readibility. That's why code generators might be still considered a better choice.
My goal here is to create a unique ID (starting a 0) for each child of a specific class. I'm not sure if it is possible in the way i want, but i figured i'd ask here as a last resort.
Some context:
I'm creating my own 2D game engine and i want it to have an ECS as it's back bone (Before anyone says anything, i'm doing this as a learning experience, i know i could just use an already existing game engine). My idea is that each class that implements the 'EntityComponent' class should have a unique ID applied to it. This needs to be per child, not per object. I want to use this ID as the index for an array to find the component of an entity. The actual ID that each Component gets is unimportant and each component does not need to be assigned the ID every run time.
My hope is there is some way to create something similar to a static variable per class (That implements the Entity Component class). It needs to be quick to get this value so doing an unordered_map lookup is slower than i would like. One thing i do not want to do is setting the ID for every component myself. This could cause problems once many components are made and could cause problems if i forget to set it or set two components to the same ID.
One idea i had was to make a variable in EntityComponent called ID (And a getter to get it). When the entity is constructed it looks up an unordered map (which was made at run time, assigning an ID to each class) for what ID it should have. The price of looking up once at construction is fine. The only problem i see with this is there is a lot of redundant data (Though overall it seems it would account to a pretty small amount). With this, every single transform component would have to store that it its ID is x. This means potentially thousands upon thousands of transform components are storing this ID value, when only 1 really needs to.
Basically i am after an extremely quick way to find an ID for a class TYPE. This can be through a lookup, but it needs to be a quick lookup. I would like something faster than unordered_map if possible. If this can be done through compile time tricks (Maybe enums?) or maybe even templates i would love to hear your ideas. I know premature optimisation is the bad, but being able to get a component fast is a pretty big thing.
What i'm asking might very well be impossible. Just thought i'd ask here to make sure first. I should also note i'm trying to avoid implementation of this in the children classes. I'd like to not have to set up the same code for each child class to create an id.
Thank you.
In order to get something corresponding to the actual type of an object, it either needs to be in the object itself or accessed via a virtual function. Otherwise the type will be determined by the type of the variable it is associated with.
A common option when speed and size are both important is to have an integer identifier associated with each type (when the full type list is known at compile time) and use that integer value in a specific way when you want to do something based on the type.
The integer mechanism usually uses an enum for generating the corresponding value for each type and has that field in every object.
The virtual method variety, I've used boost::uuid and a static data member in each class and a virtual method get'er for it.
Declare a virtual function newId() in EntityComponent.
Implement this function to get and increment a static variable in each class, which children you want to have a unique Id.
Assign Id in the constructor:
mId = newId();
don't know this if this is what you meant and i know this is an old post however this is how im currently dealing with a similar issue, maybe it will help someone else.
(Im also doing this as a learning experience for uni :) )
in the controlling class or its own utility class:
enum class EntityType{ TYPE_ONE = 0, TYPE_TWO =1};
in class header:
#include "EntityType.h"
class Whatever{
public:
inline void getType(){return _type;}
OR
inline void getType(){return EntityType::TYPE_ONE;}
private:
EntityType _type = EntityType::TYPE_ONE;
};
Hope this is helpful to anyone :)
I have a class that contains some members that may be modified from external sources:
class CNode
{
protected:
// these members might be changed by users
std::string m_sName;
CState m_CurrentState;
CColor m_DiffuseColor;
};
The sample is simplified, it has more members in my code.
Now what would be the best practice to change
single
multiple (at once)
all members (at once)
of this class?
My code needs to handle all cases, though internally case 1. will be the usual case.
Most of the time it is applied to a single CNode but it might also be applied to an array of nodes.
I know two possible solutions that both don't really satisfy me:
set&get for every variable:
Pro:
Every variable is modifiable independently from other members.
Easy to set the same value for multiple CNodes.
Contra:
A lot of set&gets;
If a new variable is added, a new set&get needs to be added as well.
This would work best if one Variable is changed in many CNodes
Create a CProperties class that contains all modifiable variables:
Pro:
One set&get in the parent class - properties may be added/removed without needing to modify set&get.
This also reduces the amount of methods in my API that processes user input.
Contra:
setting individual variables requires getting the current CProperties, so the other values won't be modified.
This would work best if multiple/all variables are updated at once in a single CNode.
Like so:
class CProperties
{
// these members might be changed by users
std::string m_sName;
CState m_CurrentState;
CColor m_DiffuseColor;
}
class CNode
{
public:
const CProperties* GetProperties();
void SetProperties(CProperties*);
protected:
CProperties m_Properties;
}
This would be the most lazy version (by code creation effort) but also the most obnoxious version for me, since setting single variables requires getting the current properties first, modifying a single variable and then setting the complete properties class back to the node.
Especially in the modify-a-single-variable-in-multiple-CNodes-case this seems to be an awful solution.
Time of execution and (size) overhead is mostly irrelevant in my case. (Which would really be my only good argument against the lazy version)
Instead I'm looking for clean, understandable, usable code.
I could think of a third possible solution with a dynamic approach:
One set method has an object as parameter that may contain one or more values that need to be modified. The API then only has one set method that won't need any modification if the CProperties change. Instead a parsing method would be needed in the CNode class.
This parsing method would still need to be updated for every change in the CProperties, though I'm confident this should also be solvable with the compiler through templates.
My question is:
Are there any other possible solutions for my use case?
Which approach is the most sensible one?
You can add the logic of how to update and what to update into the class and only provide it with a source of information.
Look at how you build the properties object and transfer the logic into an update function that will reside in CNode.
If CNode needs external sources of information in order to perform the update then transfer them to the update function.
Preferably the number of arguments passed to the update function will be smaller than the number of fields in CNode.(Ideally zero or one)
CNode will only update the fields that have actually been modified.
No need for a multiple set functions, uniform way of updating class, no information getting lost between the cracks.
I have a library which can save/load on disk "chunks" which are POD structs with constant size and unique static CHUNK_ID field. So load looks somethink like this.
void Load(int docId, char* ptr, int type, size_t& size)...
If you want to add new chunk you just add struct with new CHUNK_ID and use Save Load functions to it.
What I want is to force all "chunks" to have functions like PrintHumanReadable, CompareThisTypeOfChunk etc(Ideally program should not compile without such functions). Also I want to mark/register/enumerate all chunk-structs.
I have a few ideas but all of them have problems.
Create base class with pure virtual functions PrintHumanReadable, CompareThisTypeOfChunk.
Problem:breaks pod type and requires library rewriting.
Implement factory which creates chunk struct from CHUNK_ID. Problem: compiles when I add new chunk without required functions.
Could you recomend elegant design solution for my problem?
Implement a simple code generator. You can use something like Mako or Cheetah (both Python libraries). Make a text file containing all the class names, then have the generator build the factory method and a series of methods which aren't really used but which refer to the desired methods in all the classes. This will also make it straightforward to enumerate the classes (again, using generated code).
The proper design pattern for this is called "use Boost.Serialization". It's really the best tool for writing objects to a format and then reading them back later. It can write in text, binary, and even XML formats (and others if you write a proper stream for them). It's can be non-intrusive, so you don't need to modify the objects to serialize them. And so forth.
Once you're using the proper tool for this job, you can then use whatever class hierarchy or other method you like to ensure that the proper functions for an object exist.
If you can't/won't use Boost.Serialization, then you're pretty much stuck with a runtime solution. And since the solution is runtime rather than compile time, there's no way to ensure at compile time that any particular chunk ID has the requisite functions.
I am developing a C++ application used to simulate a real world scenario. Based on this simulation our team is going to develop, test and evaluate different algorithms working within such a real world scenrio.
We need the possibility to define several scenarios (they might differ in a few parameters, but a future scenario might also require creating objects of new classes) and the possibility to maintain a set of algorithms (which is, again, a set of parameters but also the definition which classes are to be created). Parameters are passed to the classes in the constructor.
I am wondering which is the best way to manage all the scenario and algorithm configurations. It should be easily possible to have one developer work on one scenario with "his" algorithm and another developer working on another scenario with "his" different algorithm. Still, the parameter sets might be huge and should be "sharable" (if I defined a set of parameters for a certain algorithm in Scenario A, it should be possible to use the algorithm in Scenario B without copy&paste).
It seems like there are two main ways to accomplish my task:
Define a configuration file format that can handle my requirements. This format might be XML based or custom. As there is no C#-like reflection in C++, it seems like I have to update the config-file parser each time a new algorithm class is added to project (in order to convert a string like "MyClass" into a new instance of MyClass). I could create a name for every setup and pass this name as command line argument.
The pros are: no compilation required to change a parameter and re-run, I can easily store the whole config file with the simulation results
contra: seems like a lot of effort, especially hard because I am using a lot of template classes that have to be instantiated with given template arguments. No IDE support for writing the file (at least without creating a whole XSD which I would have to update everytime a parameter/class is added)
Wire everything up in C++ code. I am not completely sure how I would do this to separate all the different creation logic but still be able to reuse parameters across scenarios. I think I'd also try to give every setup a (string) name and use this name to select the setup via command line arg.
pro: type safety, IDE support, no parser needed
con: how can I easily store the setup with the results (maybe some serialization?)?, needs compilation after every parameter change
Now here are my questions:
- What is your opinion? Did I miss
important pros/cons?
- did I miss a third option?
- Is there a simple way to implement the config file approach that gives
me enough flexibility?
- How would you organize all the factory code in the seconde approach? Are there any good C++ examples for something like this out there?
Thanks a lot!
There is a way to do this without templates or reflection.
First, you make sure that all the classes you want to create from the configuration file have a common base class. Let's call this MyBaseClass and assume that MyClass1, MyClass2 and MyClass3 all inherit from it.
Second, you implement a factory function for each of MyClass1, MyClass2 and MyClass3. The signatures of all these factory functions must be identical. An example factory function is as follows.
MyBaseClass * create_MyClass1(Configuration & cfg)
{
// Retrieve config variables and pass as parameters
// to the constructor
int age = cfg->lookupInt("age");
std::string address = cfg->lookupString("address");
return new MyClass1(age, address);
}
Third, you register all the factory functions in a map.
typedef MyBaseClass* (*FactoryFunc)(Configuration *);
std::map<std::string, FactoryFunc> nameToFactoryFunc;
nameToFactoryFunc["MyClass1"] = &create_MyClass1;
nameToFactoryFunc["MyClass2"] = &create_MyClass2;
nameToFactoryFunc["MyClass3"] = &create_MyClass3;
Finally, you parse the configuration file and iterate over it to find all the entries that specify the name of a class. When you find such an entry, you look up its factory function in the nameToFactoryFunc table and invoke the function to create the corresponding object.
If you don't use XML, it's possible that boost::spirit could short-circuit at least some of the problems you are facing. Here's a simple example of how config data could be parsed directly into a class instance.
I found this website with a nice template supporting factory which I think will be used in my code.