C++, creating classes in runtime - c++

I have a query, I have set of flat files ( say file1, file2 etc) containing column names and native data types. ( how values are stored and can be read in c++ is elementary)
eg. flat file file1 may have data like
col1_name=id, col1_type=integer, col2_name=Name, col2_type=string and so on.
So for each flat file I need to create C++ data structure ( i.e 1 flat file = 1 data structure) where the member variable name is same name as column name and its data type will be of C++ native data type like int, float, string etc. according to column type in flat file.
from above eg: my flat file 1 should give me below declaration
class file1{
int id;
string Name;
};
Is there a way I can write code in C++, where binary once created will read the flat file and create data structure based on the file ( class name will be same as flat file name). All the classes created using these flat files will have common functionality of getter and setter member functions.
Do let me know if you have done something similar earlier or have any idea for this.

No, not easily (see the other answers for reasons why not).
I would suggest having a look at Python instead for this kind of problem. Python's type system combined with its ethos of using try/except lends itself more easily to the challenge of parsing data.
If you really must use C++, then you might find a solution using the dynamic properties feature of Qt's QObject class, combined with the QVariant class. Although this would do what you want, I would add a warning that this is getting kind of heavy-weight and may over-complicate your task.

No, not directly. C++ is a compiled language. The code for every class is created by the compiler.
You would need a two-step process. First, write a program that reads those files and translates them into a .cpp file. Second, pass those .cpp files to a compiler.

C++ classes are pure compile-time concepts and have no meaning at runtime, so they cannot be created. However, you could just go with
std::vector<std::string> fields;
and parse as necessary in your accessor functions.

No, but from what I can tell, you have to be able to store the names of multiple columns. What you can do is have a member variable map or unordered_map which you can index with a string - the name of the column - and get some data (like a column object or something) back. That way you can do
obj.Columns["Name"]

I'm not sure there's a design pattern to this, but if your list of possible type names is finite, and known at compile time, can't you declare all those classes in your program before running, and then just instantiate them based on the data in the files?

What you actually want is a field whose exact nature varies at runtime.
There are several approaches, including Boost.Any, but because of the static nature of C++ type system only 2 are really recommended, and both require to have beforehand an idea of all the possible data types that may be required.
The first approach is typical:
Object base type
Int, String, Date whatever derived types
and the use of polymorphism.
The second requires a bit of Boost magic: boost::variant<int, std::string, date>.
Once you have the "variant" part covered, you need to implement visitation to distinguish between the different possible types. Typical visitors for the traditional object-oriented approach or simply boost::static_visitor<> and boost::apply_visitor combinations for the boost approach.
It's fairly straightforward.

Related

How could you recreate UE4 serialization of user-defined types?

The problem
The Unreal Engine 4 Editor allows you to add objects of your own types to the scene.
Doing so requires minimal work from the user - to make a class visible in the editor you only need to add some macros, like UCLASS()
UCLASS()
class MyInputComponent: public UInputComponent //you can instantiate it in the editor!
{
UPROPERTY(EditAnywhere)
bool IsSomethingEnabled;
};
This is enough to allow the editor to serialize the created-in-editor object's data (remember: the class is user-defined but the user doesn't have to hardcode loading specific fields. Also note that the UPROPERTY variable can be of user-defined type as well). It is then deserialized while loading the actual game. So how is it handled so painlessly?
My attempt - hardcoded loading for every new class
class Component //abstract class
{
public:
virtual void LoadFromStream(std::stringstream& str) = 0;
//virtual void SaveIntoStream(std::stringstream& str) = 0;
};
class UserCreatedComponent: public Component
{
std::string Name;
int SomeInteger;
vec3 SomeVector; //example of user-defined type
public:
virtual void LoadFromStream(std::stringstream& str) override //you have to write a function like this every time you create a new class
{
str >> Name >> SomeInteger >> SomeVector.x >> SomeVector.y >> SomeVector.z;
}
};
std::vector<Component*> ComponentsFromStream(std::stringstream& str)
{
std::vector<Component*> components;
std::string type;
while (str >> type)
{
if (type == "UserCreatedComponent") //do this for every user-defined type...
components.push_back(new UserCreatedComponent);
else
continue;
components.back()->LoadFromStream(str);
}
return components;
}
Example of an UserCreatedComponent object stream representation:
UserCreatedComponent MyComponent 5 0.707 0.707 0.707
The engine user has to do these things every time he creates a new class:
1. Modify ComponentsFromStream by adding another if
2. Add two methods, one which loads from stream and another which saves to stream.
We want to simplify it so the user only has to use a macro like UPROPERTY.
Our goal is to free the user from all this work and create a more extensible solution, like UE4's (described above).
Attempt at simplifying 1: Using type-int mapping
This section is based on the following: https://stackoverflow.com/a/17409442/12703830
The idea is that for every new class we map an integer, so when we create an object we can just pass the integer given in the stream to the factory.
Example of an UserCreatedComponent object stream representation:
1 MyComponent 5 0.707 0.707 0.707
This solves the problem of working out the type of created object but also seems to create two new problems:
How should we map classes to integers? What would happen if we include two libraries containing classes that map themselves to the same number?
What will initializing e.g. components that need vectors for construction look like? We don't always use strings and ints for object construction (and streams give us pretty much only that).
So how is it handled so painlessly?
C++ language does not provide features which would allow to implement such simple de/serialization of class instances as it works in the Unreal Engine. There are various ways how to workaround the language limitations, the Unreal uses a code generator.
The general idea is following:
When you start project compilation, a code generator is executed.
The code generator parses your header files and searches for macros which has special meaning, like UCLASS, USTRUCT, UENUM, UPROPERTY, etc.
Based on collected data, it generates not only code for de/serialization, but also for other purposes, like reflection (ability to iterate certain members), information about inheritance, etc.
After that, your code is finally compiled along with the generated code.
Note: this is also why you have to include "MyClass.generated.h" in all header files which declare UCLASS, USTRUCT and similar.
In other words, someone must write the de/serialization code in some form. The Unreal solution is that the author of such code is an application.
If you want to implement such system yourself, be aware that it's lots of work. I'm no expert in this field, so I'll just provide general information:
The primary idea of code-generators is to automatize repetitive work, nothing more - in other words, there's no other special magic. That means that "how objects are de/serialized" (how they're transformed from memory to file) and "how the code which de/serializes is created" (whether it's written by a person or generated by an application) are two separate topics.
First, it should be established how objects are de/serialized. For example, std::stringstream can be used, or objects can be de/serialized from/to generally known formats like XML, json, bson, yaml, etc., or a custom solution can be defined.
Establish what's the source of data for generated de/serialization code. In case of Unreal Engine, it's user code itself. But it's not the only way - for example Protobuffers use a simple language which is used only to define data structure and the generator creates code which you can include and use.
If the source of data should be C++ code itself, do not write you own C++ parser! (The only exceptions to this rule are: educational purpose or if you want to spend rest of your life with working on the parser.) Luckily, there are projects which you can use - for example there's clang AST.
How should we map classes to integers? What would happen if we include two libraries containing classes that map themselves to the same number?
There's one fundamental problem with mapping classes to integers: it's not possible to uniquely map every possible class name to an integer.
Proof: create classes named Foo_[integer] and map it to the [integer], i.e. Foo_0 -> 0, Foo_1 -> 1, Foo_2 -> 2, etc. After you use biggest integer value, how do you map Bar_0?
You can start assigning the numbers sequentially as they're added to a project, but as you correctly pin-pointed, what if you include new library? You could start counting from some big number, like 1.000.000, but how do you determine what should be first number for each library? It doesn't have a clear solution.
Some of solutions to this problem are:
Define clear subset of classes which can be de/serialized and assign sequential integers to these classes. The subset can be, for example, "only classes in my project, no library support".
Identify classes with two integers - one for class, one for library. This means you have to have some central register which assigns library integers uniquely (e.g. in order they're registered).
Use string which uniquely identifies the class including library name. This is what Unreal uses.
Generate a hash from class and library name. There's risk of hash collision - the better hash you use, the lower risk there is. For example git (the version control application) uses SHA-1 (which is considered unsafe today) to identify it's objects (files, directories, commits) and the program is used worldwide without bigger issues.
Generate UUID, a 128-bit random number (with special rules). There's also risk of collision, but it's generally considered highly improbable. Used by Java and Unity the game engine.
What would happen if we include two libraries containing classes that map themselves to the same number?
That's called a collision. How it's handled depends on design of de/serialization code, there are mainly two approaches to this problem:
Detect that. For example if your class identifier contains library identifier, don't allow loading/registering library with ID which is already identified. In case of ID which doesn't include library ID (e.g. hash/UUID variant), don't allow registering such classes. Throw an exception or exit the application.
Assume there's no collision. If actual collision happens, it's so-called UB, an undefined behaviour. The application will probably crash or act weirdly. It might corrupt stored data.
What will initializing e.g. components that need vectors for construction look like? We don't always use strings and ints for object construction (and streams give us pretty much only that).
This depends on what it's required from de/serializing code.
The simplest solution is actually to use string of values separated by space.
For example, let's define following structure:
struct Person
{
std::string Name;
float Age;
};
A vector of Person instances could look like: 3 Adam 22.2 Bob 34.5 Cecil 19.0 (i.e. first serialize number of items (vector size), then individual items).
However, what if you add, remove or rename a member? The serialized data would become unreadable. If you want more robust solution, it might be better to use more structured data, for example YAML:
persons:
- name: Adam
age: 22.2
- name: Bob
age: 34.5
- name: Cecil
age: 19.0
Final notes
The problem of de/serializing objects (in C++) is actually big, various systems uses various solutions. That's why this answer is so generic and it doesn't provide exact code - there's not single silver bullet. Every solution has it's advantages and disadvantages. Even detailed description of just Unreal Engine's serialization system would become a book.
So this answer assumes that reader is able to search for various mentioned topic, like yaml file format, Protobuffers, UUID, etc.
Every mentioned solution to a sub-problem has lots of it's own problems which weren't explored. For example de/serialization of string with spaces or new lines from/to simple string stream. If it's needed to solve such problems, it's recommended to first search for more specialized questions or write one if there's nothing to be found.
Also, C++ is constantly evolving. For example, better support for reflection is added, which might, one day, provide enough features to implement high-quality de/serializer. However, if it should be done in compile-time, it would heavily depend on templates which slow down compilation process significantly and decrease code readibility. That's why code generators might be still considered a better choice.

Difference between RWDBTBuffer<T>, RWDBVector<T> and RWDBDecimalVector

I'm writing a python script to generate C++ classes used for database access and they use RogueWave types for data transfer. I have a few template classes I'm looking at to outline how the generated classes should look. When implementing a method for transferring several tuples in one operation, columns are wrapped in RWDBTbuffer, RWDBVector and RWDBDecimalVector.
My problem is, I can't see a direct correlation between the data type that is being wrapped (int, long, RWDateTime, RWDecimalPortable) and the container it is being placed in. It seems to me that I can just put everything in a RWDBTBuffer. What is the advantage of using RWDBDecimalVector over RWDBTBuffer for numeric types, and should RWDBVector ever be used?
in terms of data that they both store there isn't any different.
the main different is that you can shift RWDBVector into RWDBReader and then you can read the data into it.

How to design access to files that can contain different types?

We have an internally used type of file here that is basically a table where columns can have different types. There is also a header with IDs identifying which column is which type. Users typically do not generate table layouts all the time, there is rather a limited set of table layouts (say 10 or so, but more to come in the future).
My question is: What is the best approach to model this file in C++?
I can think of the following possibilities:
Create a templated file class where the template parameter is a struct containing the types of the columns.
In a different, static template class, put the header IDs as a static member and provide a function for endianness-safe reading (via explicit template specialisation).
Disadvantage: Need to create struct and static partner class for each file. Will fail at link time if static template class has no specialisation for this type. Is perhaps a misuse of templates.
Create an abstract data class, derive explicit overrides for each type that can be in a column and dynamically cast the pointers-to-base-class that I get from the file back to the right type (which I can find out via the header).
Disadvantage: Dynamic cast on every read.
Create a templated file class as above. Require from the template that it has methods which read and write a file header and provide endianness-safe reading.
Disadvantage: Need to create class with these methods for each file type instead of using plain structs as in 1. Require explicit template specialisations for built-in types.
Any other suggestions?
Since files are read-up at run-time, compile time solutions aren't real "solution".
This is a typical case for a "hierarchy of polymorphic classes" sharing a common root, destined to a collection of unique_ptr or shared_ptr to dynamic_cast from.
basically, you guess the type from the data format, and push_back a x_ptr<BaseClass>(new ActualClass).
BaseClass can hold a way to allow you to recognize which ActualClass you're dealing with, but that's noting more what dynamic_cast does. Just let BaseClass empty, with just a virtual destructor.
On the file, every record must start with a sort of "type_id" the actually tells you what is the actual type the following data represent.
Options:
Use self-descriptive data types ("type-size-value" triples), and perform cast per type.
Use XML structures (variation of the above, basically)
Use relative database (SQLite embedded in your code, for example, or anything else)

What is the best design pattern to register data "chunks"?

I have a library which can save/load on disk "chunks" which are POD structs with constant size and unique static CHUNK_ID field. So load looks somethink like this.
void Load(int docId, char* ptr, int type, size_t& size)...
If you want to add new chunk you just add struct with new CHUNK_ID and use Save Load functions to it.
What I want is to force all "chunks" to have functions like PrintHumanReadable, CompareThisTypeOfChunk etc(Ideally program should not compile without such functions). Also I want to mark/register/enumerate all chunk-structs.
I have a few ideas but all of them have problems.
Create base class with pure virtual functions PrintHumanReadable, CompareThisTypeOfChunk.
Problem:breaks pod type and requires library rewriting.
Implement factory which creates chunk struct from CHUNK_ID. Problem: compiles when I add new chunk without required functions.
Could you recomend elegant design solution for my problem?
Implement a simple code generator. You can use something like Mako or Cheetah (both Python libraries). Make a text file containing all the class names, then have the generator build the factory method and a series of methods which aren't really used but which refer to the desired methods in all the classes. This will also make it straightforward to enumerate the classes (again, using generated code).
The proper design pattern for this is called "use Boost.Serialization". It's really the best tool for writing objects to a format and then reading them back later. It can write in text, binary, and even XML formats (and others if you write a proper stream for them). It's can be non-intrusive, so you don't need to modify the objects to serialize them. And so forth.
Once you're using the proper tool for this job, you can then use whatever class hierarchy or other method you like to ensure that the proper functions for an object exist.
If you can't/won't use Boost.Serialization, then you're pretty much stuck with a runtime solution. And since the solution is runtime rather than compile time, there's no way to ensure at compile time that any particular chunk ID has the requisite functions.

Parsing huge data with c++

In my job, i need to parse different kind of data files from different data sources.Sometimes i parse them by writing directly c++ code (with the help of qt and boost:D), sometimes manually with a helper program.
I must note that data types are so different from each other it is so hard to create common a interface for all of them. But i want to do this job in a more generic way.I am planning to write a library to convert them and it should be easy to add new parser utility in future.I am also planning to use other helper programs inside my program, not manually.
My question is what kind of an architecture or pattern do you suggest, Basic condition is library must be extendable via new classes or dll's and also configurable.
By the way data can be in text, ascii or something like CSV(comma seperated values) and most of them are specific for a certain data.
Not to blow my own trumpet, but my small Open Source utility CSVfix has an extensible architecture based on deriving new C++ classes with a very simple interface. I did consider using a plugin-architecture with DLLs but it seemed like overkill for such a simple utility . If interested, you can get the binaries & sources here.
I'd suggest a 3-part model, where the common data-format is a String which should be able to contain every value:
Reader: In this layer the values are read from the source (ie. CSV-file) using some sort of file-format-descriptor. The values are then stored in some sort of intermediate data structure.
Connector/Converter: This layer is responsible for mapping the reader-data to the writer-fields.
Writer: This layer is responsible for writing a specific data structure to the target (ie. another file-format or a database).
This way you can write different Readers for different input files.
I think the hardest part would be creating the definition of the intermediate storage format/structure so that it is future-proof and flexible.
One method I used for defining data structure in my datafile read/write classes is to use std::map<std::string, std::vector<std::string>, string_compare> where the key is the variable name and the vector of strings is the data. While this is expensive in memory, it does not lock me down to only numeric data. And, this method allows for different lengths of data within the same file.
I had the base class implement this generic storage, while the derived classes implemented the reader/writer capability. I then used a factory to get to the desired handler, using another class that determined the file format.