I have the following function:
void scan(DataRow& input) {
    if (input.isRaw()) {
        ...
    }
    if (input.isExternal()) {
        ...
    }
    if (input.hasMultipleFields()) {
        ...
        for (auto& field : input.fields()) {
            if (field.size() == 2) {
                ...
            }
        }
    }
}
The DataRow class has many sub-classes and all the is functions above are virtual.
This function is used to scan several large groups of data rows. For each group, all data row instances will have the same property (e.g., all raw, all external).
So instead of having all this if/else logic in the scan function, I am wondering whether there is a way to generate ad-hoc code. For example, if I already know that my next group is all raw (or all not raw), I can get rid of the first if branch.
In Java, I used to do this kind of thing by generating bytecode for a class and dynamically loading the generated class into the JVM. I know the same trick does not work in C++, and I have little experience with how to do this there. Can anyone give me a hint? Thanks!
You cannot easily manipulate executable code at runtime. But your question doesn’t look like you’d have to go down that road anyway.
You have groups of rows with similar properties and special processing logic for each group. Also, there seems to be a small fixed number of different kinds of groups.
You have all necessary information to split up your code at compile time – “programming time” actually. Split the scan() function into one function for each kind of group and call scan_raw(), scan_external(), etc. accordingly.
This reduces the number of if condition checks from once per row to once per group. As an added benefit the separate scan functions can use the appropriate derived class as their parameter type and you can get rid of the whole isSomething() machinery.
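A minimal sketch of that split, assuming two hypothetical row types (RawRow, ExternalRow) and placeholder processing logic that stand in for the real DataRow subclasses:

```cpp
#include <vector>

// Hypothetical stand-ins for two DataRow subclasses.
struct RawRow      { int payload; };
struct ExternalRow { int payload; };

// One scan function per kind of row: no isRaw()/isExternal() checks inside.
inline int scan_raw(const RawRow& row)           { return row.payload * 2; }
inline int scan_external(const ExternalRow& row) { return row.payload + 1; }

// The kind of group is decided once, when the caller picks the overload.
// The loop body then contains only the logic for that kind.
inline long scan_group(const std::vector<RawRow>& group) {
    long total = 0;
    for (const RawRow& row : group) total += scan_raw(row);
    return total;
}

inline long scan_group(const std::vector<ExternalRow>& group) {
    long total = 0;
    for (const ExternalRow& row : group) total += scan_external(row);
    return total;
}
```

Because each overload takes the concrete type, the calls are non-virtual and the compiler is free to inline them.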
Hm, at this point I’m tempted to point you towards std::variant and std::visit (or their Boost equivalents). That could be a larger refactoring, though. Because when using them you’d ideally use them as a complete replacement for your current inheritance based polymorphism approach.
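For illustration, here is roughly what the std::variant route could look like if a whole group is stored as one alternative. The row types and the scanning logic are made up for the example. It needs C++17.

```cpp
#include <variant>
#include <vector>

// Hypothetical row types; a group holds rows of exactly one kind.
struct RawRow      { int payload; };
struct ExternalRow { int payload; };

using Group = std::variant<std::vector<RawRow>, std::vector<ExternalRow>>;

// Visitor with one overload per kind of group.
struct GroupScanner {
    long operator()(const std::vector<RawRow>& rows) const {
        long total = 0;
        for (const auto& r : rows) total += r.payload * 2;
        return total;
    }
    long operator()(const std::vector<ExternalRow>& rows) const {
        long total = 0;
        for (const auto& r : rows) total += r.payload + 1;
        return total;
    }
};

// std::visit dispatches once per group, not once per row.
inline long scan(const Group& group) {
    return std::visit(GroupScanner{}, group);
}
```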
The problem
The Unreal Engine 4 Editor allows you to add objects of your own types to the scene.
Doing so requires minimal work from the user - to make a class visible in the editor you only need to add some macros, like UCLASS()
UCLASS()
class MyInputComponent: public UInputComponent //you can instantiate it in the editor!
{
    UPROPERTY(EditAnywhere)
    bool IsSomethingEnabled;
};
This is enough to allow the editor to serialize the created-in-editor object's data (remember: the class is user-defined but the user doesn't have to hardcode loading specific fields. Also note that the UPROPERTY variable can be of user-defined type as well). It is then deserialized while loading the actual game. So how is it handled so painlessly?
My attempt - hardcoded loading for every new class
class Component //abstract class
{
public:
    virtual void LoadFromStream(std::stringstream& str) = 0;
    //virtual void SaveIntoStream(std::stringstream& str) = 0;
};

class UserCreatedComponent: public Component
{
    std::string Name;
    int SomeInteger;
    vec3 SomeVector; //example of user-defined type
public:
    virtual void LoadFromStream(std::stringstream& str) override //you have to write a function like this every time you create a new class
    {
        str >> Name >> SomeInteger >> SomeVector.x >> SomeVector.y >> SomeVector.z;
    }
};

std::vector<Component*> ComponentsFromStream(std::stringstream& str)
{
    std::vector<Component*> components;
    std::string type;
    while (str >> type)
    {
        if (type == "UserCreatedComponent") //do this for every user-defined type...
            components.push_back(new UserCreatedComponent);
        else
            continue;
        components.back()->LoadFromStream(str);
    }
    return components;
}
Example of a UserCreatedComponent object stream representation:
UserCreatedComponent MyComponent 5 0.707 0.707 0.707
The engine user has to do these things every time he creates a new class:
1. Modify ComponentsFromStream by adding another if
2. Add two methods, one which loads from stream and another which saves to stream.
We want to simplify it so the user only has to use a macro like UPROPERTY.
Our goal is to free the user from all this work and create a more extensible solution, like UE4's (described above).
Attempt at simplifying 1: Using type-int mapping
This section is based on the following: https://stackoverflow.com/a/17409442/12703830
The idea is that for every new class we map an integer, so when we create an object we can just pass the integer given in the stream to the factory.
Example of a UserCreatedComponent object stream representation with an integer type ID:
1 MyComponent 5 0.707 0.707 0.707
This solves the problem of working out the type of created object but also seems to create two new problems:
How should we map classes to integers? What would happen if we include two libraries containing classes that map themselves to the same number?
What will initializing e.g. components that need vectors for construction look like? We don't always use strings and ints for object construction (and streams give us pretty much only that).
So how is it handled so painlessly?
The C++ language does not provide features that would allow de/serialization of class instances as simple as in Unreal Engine. There are various ways to work around the language limitations; Unreal uses a code generator.
The general idea is as follows:
When you start project compilation, a code generator is executed.
The code generator parses your header files and searches for macros which have special meaning, like UCLASS, USTRUCT, UENUM, UPROPERTY, etc.
Based on collected data, it generates not only code for de/serialization, but also for other purposes, like reflection (ability to iterate certain members), information about inheritance, etc.
After that, your code is finally compiled along with the generated code.
Note: this is also why you have to include "MyClass.generated.h" in all header files which declare UCLASS, USTRUCT and similar.
In other words, someone must write the de/serialization code in some form. The Unreal solution is that the author of such code is an application.
If you want to implement such a system yourself, be aware that it's a lot of work. I'm no expert in this field, so I'll just provide general information:
The primary idea of code generators is to automate repetitive work, nothing more. In other words, there's no special magic. That means that "how objects are de/serialized" (how they're transformed from memory to file) and "how the code which de/serializes is created" (whether it's written by a person or generated by an application) are two separate topics.
First, it should be established how objects are de/serialized. For example, std::stringstream can be used, or objects can be de/serialized from/to generally known formats like XML, json, bson, yaml, etc., or a custom solution can be defined.
Establish what the source of data for the generated de/serialization code is. In the case of Unreal Engine, it's the user code itself. But that's not the only way. For example, Protocol Buffers use a simple language which is used only to define data structures, and the generator creates code which you can include and use.
If the source of data should be C++ code itself, do not write your own C++ parser! (The only exceptions to this rule are educational purposes, or if you want to spend the rest of your life working on the parser.) Luckily, there are projects which you can use, for example the Clang AST.
How should we map classes to integers? What would happen if we include two libraries containing classes that map themselves to the same number?
There's one fundamental problem with mapping classes to integers: it's not possible to uniquely map every possible class name to an integer.
Proof: create classes named Foo_[integer] and map it to the [integer], i.e. Foo_0 -> 0, Foo_1 -> 1, Foo_2 -> 2, etc. After you use biggest integer value, how do you map Bar_0?
You can start assigning the numbers sequentially as classes are added to a project, but as you correctly pointed out, what if you include a new library? You could start counting from some big number, like 1,000,000, but how do you determine what the first number for each library should be? It doesn't have a clear solution.
Some solutions to this problem are:
Define clear subset of classes which can be de/serialized and assign sequential integers to these classes. The subset can be, for example, "only classes in my project, no library support".
Identify classes with two integers - one for class, one for library. This means you have to have some central register which assigns library integers uniquely (e.g. in order they're registered).
Use string which uniquely identifies the class including library name. This is what Unreal uses.
Generate a hash from the class and library name. There's a risk of hash collision; the better the hash you use, the lower the risk. For example, git (the version control application) uses SHA-1 (which is considered unsafe today) to identify its objects (files, directories, commits), and the program is used worldwide without bigger issues.
Generate a UUID, a 128-bit random number (with special rules). There's also a risk of collision, but it's generally considered highly improbable. Used by Java and by the Unity game engine.
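As an illustration of the string-identifier option, here is a minimal registry sketch. The names ComponentRegistry, HealthComponent and the "Library.Class" naming scheme are invented for the example. This is not how Unreal implements it.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Component {
    virtual ~Component() = default;
    virtual std::string TypeName() const = 0;
};

// Hypothetical registry: maps "Library.Class" strings to factory functions.
class ComponentRegistry {
public:
    using Factory = std::function<std::unique_ptr<Component>()>;

    void Register(const std::string& qualifiedName, Factory factory) {
        factories_[qualifiedName] = std::move(factory);
    }

    // Returns nullptr when the name was never registered.
    std::unique_ptr<Component> Create(const std::string& qualifiedName) const {
        auto it = factories_.find(qualifiedName);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Example user-defined component.
struct HealthComponent : Component {
    std::string TypeName() const override { return "MyGame.HealthComponent"; }
};
```

The stream then starts each object with its qualified name instead of an integer, and the loader passes that name to Create(). In a generated-code setup, the Register() calls are what the generator would emit.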
What would happen if we include two libraries containing classes that map themselves to the same number?
That's called a collision. How it's handled depends on the design of the de/serialization code. There are mainly two approaches to this problem:
Detect it. For example, if your class identifier contains a library identifier, don't allow loading/registering a library with an ID which is already registered. In the case of an ID which doesn't include a library ID (e.g. the hash/UUID variant), don't allow registering a class whose ID is already taken. Throw an exception or exit the application.
Assume there's no collision. If an actual collision happens, the result is undefined behaviour (UB). The application will probably crash or act weirdly. It might corrupt stored data.
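A sketch of the "detect it" approach, under the assumption that IDs are plain 32-bit integers (assigned by hand or derived from a hash such as FNV-1a). TypeIdRegistry is a made-up name for the example.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// 32-bit FNV-1a hash, one possible way to derive an ID from a qualified name.
inline std::uint32_t Fnv1a(const std::string& text) {
    std::uint32_t hash = 2166136261u;      // FNV offset basis
    for (unsigned char c : text) {
        hash ^= c;
        hash *= 16777619u;                 // FNV prime
    }
    return hash;
}

// Hypothetical registry that refuses to map one ID to two different classes.
class TypeIdRegistry {
public:
    // Registering the same (id, name) pair twice is harmless;
    // registering a different name under a taken ID throws.
    void Register(std::uint32_t id, const std::string& qualifiedName) {
        auto result = names_.emplace(id, qualifiedName);
        if (!result.second && result.first->second != qualifiedName) {
            throw std::runtime_error("type ID collision: " + qualifiedName +
                                     " vs " + result.first->second);
        }
    }

private:
    std::map<std::uint32_t, std::string> names_;
};
```

With hashed IDs you would call Register(Fnv1a(name), name), so a collision surfaces at startup instead of as corrupted data later.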
What will initializing e.g. components that need vectors for construction look like? We don't always use strings and ints for object construction (and streams give us pretty much only that).
This depends on what is required from the de/serializing code.
The simplest solution is actually to use a string of values separated by spaces.
For example, let's define the following structure:
struct Person
{
    std::string Name;
    float Age;
};
A vector of Person instances could look like: 3 Adam 22.2 Bob 34.5 Cecil 19.0 (i.e. first serialize number of items (vector size), then individual items).
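A small sketch of that count-prefixed format using string streams. SavePersons and LoadPersons are names made up for the example, and it assumes names never contain whitespace.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    std::string Name;   // assumed to contain no whitespace
    float Age;
};

// Writes "<count> <name> <age> <name> <age> ...".
inline std::string SavePersons(const std::vector<Person>& persons) {
    std::ostringstream out;
    out << persons.size();
    for (const Person& p : persons) out << ' ' << p.Name << ' ' << p.Age;
    return out.str();
}

// Reads the count first, then that many (name, age) pairs.
inline std::vector<Person> LoadPersons(const std::string& text) {
    std::istringstream in(text);
    std::size_t count = 0;
    in >> count;
    std::vector<Person> persons;
    for (std::size_t i = 0; i < count; ++i) {
        Person p{};
        if (!(in >> p.Name >> p.Age)) break;  // stop on malformed input
        persons.push_back(p);
    }
    return persons;
}
```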
However, what if you add, remove or rename a member? The serialized data would become unreadable. If you want a more robust solution, it might be better to use more structured data, for example YAML:
persons:
  - name: Adam
    age: 22.2
  - name: Bob
    age: 34.5
  - name: Cecil
    age: 19.0
Final notes
The problem of de/serializing objects (in C++) is actually big, and various systems use various solutions. That's why this answer is so generic and doesn't provide exact code: there is no single silver bullet. Every solution has its advantages and disadvantages. Even a detailed description of just Unreal Engine's serialization system would fill a book.
So this answer assumes that the reader is able to search for the various topics mentioned, like the YAML file format, Protocol Buffers, UUID, etc.
Every mentioned solution to a sub-problem has lots of its own problems which weren't explored, for example de/serialization of strings with spaces or new lines from/to a simple string stream. If you need to solve such problems, first search for more specialized questions, or write one if there's nothing to be found.
Also, C++ is constantly evolving. For example, better support for reflection is being added, which might, one day, provide enough features to implement a high-quality de/serializer. However, if it is done at compile time, it would heavily depend on templates, which slow down compilation significantly and decrease code readability. That's why code generators might still be considered a better choice.
I am trying to make an architecture for a MMO game and I can't figure out how I can store as many variables as I need in GameObjects without having a lot of calls to send them on a wire at the same time I update them.
What I have now is:
Game::ChangePosition(Vector3 newPos) {
    gameobject.ChangePosition(newPos);
    SendOnWireNEWPOSITION(gameobject.id, newPos);
}
It makes the code rubbish, hard to maintain, understand, extend. So think of a Champion example:
I would have to make a lot of functions for each variable. And this is just the generalisation for this Champion; I might have 1-2 other member variables for each Champion type/"class".
It would be perfect if I would be able to have OnPropertyChange from .NET or something similar. The architecture I am trying to guess would work nicely is if I had something similar to:
For HP: when I update it, automatically call SendFloatOnWire("HP", hp);
For Position: when I update it, automatically call SendVector3OnWire("Position", Position)
For Name: when I update it, automatically call SendSOnWire("Name", Name);
What are exactly SendFloatOnWire, SendVector3OnWire, SendSOnWire ? Functions that serialize those types in a char buffer.
OR METHOD 2 (Preferred), but might be expensive
Update Hp, Position normally and then every Network Thread tick scan all GameObject instances on the server for the changed variables and send those.
How would that be implemented on a high scale game server and what are my options? Any useful book for such cases?
Would macros turn out to be useful? I think I was exposed to some source code of something similar and I think it used macros.
Thank you in advance.
EDIT: I think I've found a solution, but I don't know how robust it actually is. I am going to have a go at it and see where I stand afterwards. https://developer.valvesoftware.com/wiki/Networking_Entities
On method 1:
Such an approach could be relatively "easy" to implement using maps that are accessed via getters/setters. The general idea would be something like:
class GameCharacter {
    map<string, int> myints;
    // same for doubles, floats, strings
public:
    GameCharacter() {
        myints["HP"] = 100;
        myints["FP"] = 50;
    }
    int getInt(string fld) { return myints[fld]; }
    void setInt(string fld, int val) { myints[fld] = val; sendIntOnWire(fld, val); }
};
If you prefer to keep the properties in your class, you'd go for a map to pointers or member pointers instead of values. At construction you'd then initialize the map with the relevant pointers. If you decide to change the member variable you should however always go via the setter.
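A rough sketch of the member-pointer variant. The `sent` vector is only a stand-in for the real sendIntOnWire() call so the example is self-contained, and the field names are made up.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

class GameCharacter {
public:
    int hp = 100;   // ordinary members, so direct reads stay cheap
    int fp = 50;

    // The map is filled once, at construction, with pointers to members.
    GameCharacter()
        : ints_{{"HP", &GameCharacter::hp}, {"FP", &GameCharacter::fp}} {}

    int getInt(const std::string& field) const {
        return this->*ints_.at(field);       // at() throws on unknown names
    }

    void setInt(const std::string& field, int value) {
        this->*ints_.at(field) = value;
        sent.emplace_back(field, value);     // stand-in for sendIntOnWire(field, value)
    }

    std::vector<std::pair<std::string, int>> sent;  // records what would be sent

private:
    std::map<std::string, int GameCharacter::*> ints_;
};
```

Since member pointers are not tied to one instance, the map could also be a single static table shared by all characters.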
You could even go further and abstract your Champion by making it just a collection of properties and behaviors that would be accessed via the map. This component architecture is described by Mike McShaffry in Game Coding Complete (a must-read book for any game developer). There's a community site for the book with some source code to download. You may have a look at the actor.h and actor.cpp files. Nevertheless, I really recommend reading the full explanations in the book.
The advantage of componentization is that you could embed your network forwarding logic in the base class of all properties: this could simplify your code by an order of magnitude.
On method 2:
I think the basic idea is perfectly suitable, except that a complete analysis (or worse, transmission) of all objects would be overkill.
A nice alternative would be to have a marker that is set when a change is made and reset when the change is transmitted. If you transmit only marked objects (and perhaps only marked properties of those), you minimize the workload of your synchronization thread, and you reduce network overhead by pooling the transmission of several changes affecting the same object.
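A minimal sketch of such a marker, using one dirty bit per property. The GameObject layout and the flush() function are assumptions for the example; a real server would serialize the values instead of returning field names.

```cpp
#include <bitset>
#include <cstddef>
#include <string>
#include <vector>

class GameObject {
public:
    enum Field : std::size_t { Hp, PosX, FieldCount };

    // Setters only store the value and mark the field as changed.
    void setHp(float v)   { hp_ = v;   dirty_.set(Hp); }
    void setPosX(float v) { posX_ = v; dirty_.set(PosX); }

    bool isDirty() const { return dirty_.any(); }

    // Called from the network tick: reports only the changed fields,
    // then clears the marks. Several changes to one field collapse into one entry.
    std::vector<std::string> flush() {
        std::vector<std::string> changed;
        if (dirty_.test(Hp))   changed.push_back("HP");
        if (dirty_.test(PosX)) changed.push_back("PosX");
        dirty_.reset();
        return changed;
    }

private:
    float hp_ = 100.0f;
    float posX_ = 0.0f;
    std::bitset<FieldCount> dirty_;
};
```

The network thread can skip any object whose isDirty() is false, so the per-tick scan costs one check per untouched object.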
The overall conclusion I arrived at: having another call after I update the position is not that bad. It is one line of code longer, but it is better for several reasons:
It is explicit. You know exactly what's happening.
You don't slow down the code by making all kinds of hacks to get it working.
You don't use extra memory.
Methods I've tried:
Having maps for each type, as suggested by @Christophe. The major drawback was that it was error prone: you could have HP and Hp declared in the same map. It also added another layer of problems and frustrations, such as declaring maps for each type and then prefixing every variable with the map name.
Using something SIMILAR to valve's engine: It created a separate class for each networking variable you wanted. Then, it used a template to wrap up the basic types you declared (int, float, bool) and also extended operators for that template. It used way too much memory and extra calls for basic functionality.
Using a data mapper that added pointers for each variable in the constructor, and then sent them with an offset. I left the project prematurely when I realised the code started to be confusing and hard to maintain.
Using a struct that is sent every time something changes, manually. This is easily done by using protobuf. Extending structs is also easy.
Every tick, generate a new struct with the data for the classes and send it. This keeps very important stuff always up to date, but eats a lot of bandwidth.
Use reflection with help from boost. It wasn't a great solution.
In the end, I went with a mix of the fourth and fifth approaches above, and I am now implementing it in my game. One huge advantage of protobuf is that it generates structs from a .proto file and also provides serialisation for them. It is blazingly fast.
For those specially named variables that appear in subclasses, I have another struct. Alternatively, with help from protobuf I could have an array of properties that are as simple as: ENUM_KEY_BYTE VALUE, where ENUM_KEY_BYTE is just a byte that references an enum of properties such as IS_FLYING, IS_UP, IS_POISONED, and VALUE is a string.
The most important thing I've learned from this is to have as much serialization as possible. It is better to use more CPU on both ends than to have more I/O.
If anyone has any questions, comment and I will do my best helping you out.
ioanb7
I am kind of a newbie and I am creating a framework to evolve objects in C++ with an evolutionary algorithm.
An evolutionary algorithm evolves objects and tests them to get the best solution (for example, evolve the weights neural network and test it on sample data, so that in the end you get a network which has a good accuracy, without having trained it).
My problem is that there are lots of parameters for the algorithm (type of selection/crossover/mutation, probabilities for each of them...) and since it is a framework, the user should be able to easily access and modify them.
CURRENT SOLUTION
For now, I created a header file parameters.h of this form:
// DON'T CHANGE THESE PARAMETERS
//mutation type
#define FLIP 1
#define ADD_CONNECTION 2
#define RM_CONNECTION 3
// USER DEFINED
static const int TYPE_OF_MUTATION = FLIP;
The user modifies the static variables TYPE_OF_MUTATION and then my mutation function tests what the value of TYPE_OF_MUTATION is and calls the right mutation function.
This works well, but it has a few drawbacks:
when I change a parameter in this header and then call "make", no change is taken into account, I have to call "make clean" then "make". From what I saw, it is not a problem in the makefile but it is how building works. Even if it did re-build when I change a parameter, it would mean re-compile the whole project as these parameters are used everywhere; it is definitely not efficient.
if you want to run the genetic algorithm several times with different parameters, you have to run it a first time then save the results, change the parameters then run it a second time etc.
OTHER POSSIBILITIES
I thought about taking these parameters as arguments of the top-level function. The problem is that the function would then take 20 arguments or so, it doesn't seem really readable...
What I mean about the top-level function is that for now, the evolutionary algorithm is run simply by doing this:
PopulationManager myPop;
myPop.evolveIt();
If I defined the parameters as arguments, we would have something like:
PopulationManager myPop;
myPop.evolveIt(20,10,5,FLIP,9,8,2,3,TOURNAMENT,0,23,4);
You can see how hellish it may be to always define parameters in the right order !
CONCLUSION
The frameworks I know make you build your algorithm yourself from pre-defined functions, but the user shouldn't have to go through all the code to change parameters one by one.
It may be useful to indicate that this framework will be used internally, for a definite set of projects.
Any input about the best way to define these parameters is welcome !
If the options do not change I usually use a struct for this:
enum class MutationType {
    Flip,
    AddConnection,
    RemoveConnection
};

struct Options {
    // Documentation for mutation_type.
    MutationType mutation_type = MutationType::Flip;
    // Documentation for integer option.
    int integer_option = 10;
};
And then provide a constructor that takes these options.
Options options;
options.mutation_type = MutationType::AddConnection;
PopulationManager population(options);
C++11 makes this really easy, because it allows specifying defaults for the options, so a user only needs to set the options that need to be different from the default.
Also note that I used an enum for the options, this ensures that the user can only use correct values.
This is a classic example of polymorphism. In your proposed implementation you're doing a switch on constant to decide which polymorphic mutation algorithm you will choose to decide how to mutate the parameter. In C++, the corresponding mechanisms are templates (static polymorphism) or virtual functions (dynamic polymorphism) to select the appropriate mutating algorithm to apply to the parameter.
The templates way has the advantage that everything is resolvable at compile time and the resulting mutating algorithm could be inlined entirely, depending on the implementation. What you give up is the ability to dynamically select parameter mutation algorithms at runtime.
The virtual function way has the advantage that you can defer the choice of mutation algorithm until runtime, allowing this to vary based on input from the user or whatnot. The disadvantage is that the mutation algorithm can no longer be inlined and you pay the cost of a virtual function call (an extra level of indirection) when you mutate the parameter.
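To make the two options concrete, here is a small sketch of both. Genome, FlipFirst, AppendGene and the Mutator hierarchy are invented for the illustration, and the mutations themselves are deliberately trivial.

```cpp
#include <memory>
#include <vector>

using Genome = std::vector<bool>;   // hypothetical representation

// Static polymorphism: the mutation is a template parameter,
// so the call is resolved at compile time and can be inlined.
struct FlipFirst {
    void operator()(Genome& g) const { if (!g.empty()) g[0] = !g[0]; }
};
struct AppendGene {
    void operator()(Genome& g) const { g.push_back(true); }
};

template <typename Mutation>
void mutateStatic(Genome& g, Mutation mutate = Mutation{}) { mutate(g); }

// Dynamic polymorphism: the mutation is chosen at runtime,
// at the cost of a virtual call per mutation.
struct Mutator {
    virtual ~Mutator() = default;
    virtual void mutate(Genome& g) const = 0;
};
struct FlipMutator : Mutator {
    void mutate(Genome& g) const override { FlipFirst{}(g); }
};
struct AppendMutator : Mutator {
    void mutate(Genome& g) const override { AppendGene{}(g); }
};

// Runtime selection, e.g. driven by a value read from a config file.
inline std::unique_ptr<Mutator> makeMutator(bool append) {
    if (append) return std::unique_ptr<Mutator>(new AppendMutator);
    return std::unique_ptr<Mutator>(new FlipMutator);
}
```

With the template version you write mutateStatic<FlipFirst>(genome); with the virtual version you hold a Mutator and call mutate() on it.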
If you want to see a real example of how "algorithmic mutation" can work, look at evolve.cpp in my Iterated Dynamics repository on github. This is C code converted to C++ so it is neither using templates nor using virtual functions. Instead it uses function pointers and a switch-on-constant to select the appropriate code. However, the idea is the same.
My recommendation would be to see if you can use static polymorphism (templates) first. From your initial description you were fixing the mutation at compile-time anyway, so you're not giving anything up.
If that was just a prototyping phase and you intended to support switching of mutation algorithms at runtime, then look at virtual functions. As the other answer recommended, please shun C-style coding like #define constants and instead use proper enums.
To solve the "long parameter list smell", the idea of packing all the parameters into a structure is a good one. You can achieve more readability on top of that by using the builder pattern to build up the structure of parameters in a more readable way than just assigning a bunch of values into a struct. In this blog post, I applied the builder pattern to the resource description structures in Direct3D. That allowed me to more directly express these "bags of data" with reasonable defaults and directly reveal my intent to override or replace default values with special values when necessary.
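A short sketch of what such a builder could look like for evolutionary-algorithm parameters. The parameter names population_size and mutation_rate are made up for the example, not taken from the framework in the question.

```cpp
enum class MutationType { Flip, AddConnection, RemoveConnection };

struct Options {
    MutationType mutation_type = MutationType::Flip;
    int population_size = 100;      // hypothetical parameter
    double mutation_rate = 0.01;    // hypothetical parameter
};

// Each setter names the parameter it changes and returns *this for chaining.
class OptionsBuilder {
public:
    OptionsBuilder& mutation(MutationType t) { options_.mutation_type = t; return *this; }
    OptionsBuilder& populationSize(int n)    { options_.population_size = n; return *this; }
    OptionsBuilder& mutationRate(double r)   { options_.mutation_rate = r; return *this; }

    Options build() const { return options_; }

private:
    Options options_;   // starts out with all defaults
};
```

A call then reads like OptionsBuilder().mutation(MutationType::AddConnection).populationSize(20).build(), which avoids the positional-argument problem of evolveIt(20,10,5,...).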
So far I have been using dynamic casting. But this comes with its pros and cons. It seems that it is a good thing NOT to use it too much. The examples on this topic that I have found usually involve classes that differ only a little. But in my case, the "child" classes have very few similarities.
The code in this post is NOT from the project. It's only used for examples.
I am making a trading system for a game and there will be many more systems in the project. There are many different items that do many different things: equipment, modifications, resources. No matter how different they are, they all have a price and they can all be put in an inventory. But this is where the similarities end, including the overridden methods.
Afterwards the different items are used in completely different ways. At first the different types of items were sorted into separate arrays of pointers of different types: one for the equipment, one for the modifications, etc. To put something in an inventory I only use a single method, addToInventory(Item* item). Since the item must be placed in the right array, I use dynamic casting: I convert Item* item to (for example) Equipment* equi, so I can add it to the Equipment array. I want to do it in the same method, because it's more intuitive and otherwise the different methods would have similar code.
void addToInventory(Item* item)
{
    if (item->type == 'e')
    {
        Equipment* newEquip = dynamic_cast<Equipment*>(item);
        equipmentArr.add(newEquip); //these arrays are dynamic - the reason I needed to make the conversion is explained later
    }
    else if (item->type == 'm')
    {
        Modification* newMod = dynamic_cast<Modification*>(item);
        modificationArr.add(newMod);
    }
    //and so on...
}
Later I would want to add a modification to a piece of equipment- Weapon::addMod(Modification* mod) . And in this method I use other methods and variables that are found ONLY in the Weapon class.
void addMod(Modification* mod)
{ //all are found ONLY in class Weapon
    mods[modCount] = mod; //mods is an array of Modification* pointers
    modCount++;
    calcEfficiency();
}
But when I want to do something as simple as printing an inventory, I either have to copy-paste and edit some code for converting the pointers in the arrays, so I can pass them to the same printing method, or copy-paste and edit the same code for printing. There is a third option: to make all the arrays arrays of pointers to Item objects. I tried the last option.
It got rid of the casting in addToInventory(Item* item), yay! But it caused the need to cast EVERY time I call methods such as Weapon::addMod(Modification* mod), and in other places. Otherwise, I would need to put the casting within the method, but I want the method to explicitly take a Modification* argument.
The project is still really early in development, so I don't know how much more I might need to use casting, so I can switch back and forth between different types of pointers when needed.
So, in a similar case, how should I switch between different types of pointers?
You may want to represent the traits (namely Equipment and Modification) of your (broad) Item implementations as pure virtual classes (i.e. interfaces). This way, dynamic casts and dynamic cast checks for these interfaces are OK, and they will reduce the noise that the actual implementations of Equipment and Modification have to handle.
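As a sketch of the interface idea, with made-up trait and item names (Equippable, Modifier, Sword, Scope):

```cpp
#include <memory>
#include <string>
#include <vector>

// What every item shares.
struct Item {
    virtual ~Item() = default;
    virtual int price() const = 0;
};

// Trait interfaces: an item implements whichever apply to it.
struct Equippable {
    virtual ~Equippable() = default;
    virtual std::string slot() const = 0;
};
struct Modifier {
    virtual ~Modifier() = default;
    virtual int bonus() const = 0;
};

struct Sword : Item, Equippable {
    int price() const override { return 50; }
    std::string slot() const override { return "hand"; }
};
struct Scope : Item, Modifier {
    int price() const override { return 20; }
    int bonus() const override { return 3; }
};

// One container of Item pointers; a trait is queried only where it is needed.
inline int totalBonus(const std::vector<std::unique_ptr<Item>>& inventory) {
    int total = 0;
    for (const auto& item : inventory) {
        // Cross-cast from Item to the Modifier interface; null if not supported.
        if (const auto* mod = dynamic_cast<const Modifier*>(item.get())) {
            total += mod->bonus();
        }
    }
    return total;
}
```

Printing or pricing the inventory needs no cast at all, since it only uses the Item interface.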
Another way is to use the CRTP pattern and static_cast<Interface*> to have compile time checks for your interfaces.
Depends on your use case which way is more appropriate. As a rule of thumb:
Mostly static configuration => Do at compile time
More dynamic configuration (run time allocated instances) => Do at runtime
I am developing a C++ application used to simulate a real-world scenario. Based on this simulation our team is going to develop, test and evaluate different algorithms working within such a real-world scenario.
We need the possibility to define several scenarios (they might differ in a few parameters, but a future scenario might also require creating objects of new classes) and the possibility to maintain a set of algorithms (which is, again, a set of parameters but also the definition which classes are to be created). Parameters are passed to the classes in the constructor.
I am wondering which is the best way to manage all the scenario and algorithm configurations. It should be easily possible to have one developer work on one scenario with "his" algorithm and another developer working on another scenario with "his" different algorithm. Still, the parameter sets might be huge and should be "sharable" (if I defined a set of parameters for a certain algorithm in Scenario A, it should be possible to use the algorithm in Scenario B without copy&paste).
It seems like there are two main ways to accomplish my task:
Define a configuration file format that can handle my requirements. This format might be XML based or custom. As there is no C#-like reflection in C++, it seems like I have to update the config-file parser each time a new algorithm class is added to project (in order to convert a string like "MyClass" into a new instance of MyClass). I could create a name for every setup and pass this name as command line argument.
The pros are: no compilation required to change a parameter and re-run, I can easily store the whole config file with the simulation results
contra: seems like a lot of effort, especially hard because I am using a lot of template classes that have to be instantiated with given template arguments. No IDE support for writing the file (at least without creating a whole XSD which I would have to update everytime a parameter/class is added)
Wire everything up in C++ code. I am not completely sure how I would do this to separate all the different creation logic but still be able to reuse parameters across scenarios. I think I'd also try to give every setup a (string) name and use this name to select the setup via command line arg.
pro: type safety, IDE support, no parser needed
con: how can I easily store the setup with the results (maybe some serialization?)?, needs compilation after every parameter change
Now here are my questions:
- What is your opinion? Did I miss
important pros/cons?
- did I miss a third option?
- Is there a simple way to implement the config file approach that gives
me enough flexibility?
- How would you organize all the factory code in the second approach? Are there any good C++ examples for something like this out there?
Thanks a lot!
There is a way to do this without templates or reflection.
First, you make sure that all the classes you want to create from the configuration file have a common base class. Let's call this MyBaseClass and assume that MyClass1, MyClass2 and MyClass3 all inherit from it.
Second, you implement a factory function for each of MyClass1, MyClass2 and MyClass3. The signatures of all these factory functions must be identical. An example factory function is as follows.
MyBaseClass * create_MyClass1(Configuration * cfg)
{
    // Retrieve config variables and pass as parameters
    // to the constructor
    int age = cfg->lookupInt("age");
    std::string address = cfg->lookupString("address");
    return new MyClass1(age, address);
}
Third, you register all the factory functions in a map.
typedef MyBaseClass* (*FactoryFunc)(Configuration *);
std::map<std::string, FactoryFunc> nameToFactoryFunc;
nameToFactoryFunc["MyClass1"] = &create_MyClass1;
nameToFactoryFunc["MyClass2"] = &create_MyClass2;
nameToFactoryFunc["MyClass3"] = &create_MyClass3;
Finally, you parse the configuration file and iterate over it to find all the entries that specify the name of a class. When you find such an entry, you look up its factory function in the nameToFactoryFunc table and invoke the function to create the corresponding object.
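Putting the steps together, a self-contained sketch could look like the following. The Configuration struct here is a hypothetical flat key/value store, not a real library class, and the sketch uses std::unique_ptr and a const reference where the snippets above use raw pointers.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal Configuration: a flat key/value store.
struct Configuration {
    std::map<std::string, std::string> values;
    int lookupInt(const std::string& key) const { return std::stoi(values.at(key)); }
    std::string lookupString(const std::string& key) const { return values.at(key); }
};

struct MyBaseClass {
    virtual ~MyBaseClass() = default;
    virtual std::string describe() const = 0;
};

struct MyClass1 : MyBaseClass {
    MyClass1(int age, std::string address) : age_(age), address_(std::move(address)) {}
    std::string describe() const override { return address_ + ":" + std::to_string(age_); }
    int age_;
    std::string address_;
};

// All factory functions share this signature.
using FactoryFunc = std::unique_ptr<MyBaseClass> (*)(const Configuration&);

inline std::unique_ptr<MyBaseClass> create_MyClass1(const Configuration& cfg) {
    return std::unique_ptr<MyBaseClass>(
        new MyClass1(cfg.lookupInt("age"), cfg.lookupString("address")));
}

// The name-to-factory table; new classes add one entry here.
inline const std::map<std::string, FactoryFunc>& factories() {
    static const std::map<std::string, FactoryFunc> table = {
        {"MyClass1", &create_MyClass1},
    };
    return table;
}

// Returns nullptr for class names that were never registered.
inline std::unique_ptr<MyBaseClass> createByName(const std::string& name,
                                                 const Configuration& cfg) {
    auto it = factories().find(name);
    if (it == factories().end()) return nullptr;
    return it->second(cfg);
}
```

The config-file parser then only needs to hand the class name and the parsed section to createByName().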
If you don't use XML, it's possible that boost::spirit could short-circuit at least some of the problems you are facing. Here's a simple example of how config data could be parsed directly into a class instance.
I found this website with a nice template supporting factory which I think will be used in my code.