Generating data structures by parsing plain text files - c++

I wrote a file parser for a game I'm writing to make it easy for myself to change various aspects of the game (things like the character/stage/collision data). For example, I might have a character class like this:
class Character
{
public:
    int x, y; // Character's location
    Character* teammate;
};
I set up my parser to read the data structure from a file, using a syntax similar to C++:
Character Sidekick
{
X = 12
Y = 0
}
Character AwesomeDude
{
X = 10
Y = 50
Teammate = Sidekick
}
This will create two data structures and put them in a map<std::string, Character*>, where the key string is whatever name I gave it (in this case Sidekick and AwesomeDude). When my parser sees a pointer to a class, like the teammate pointer, it's smart enough to look up in the map to fetch the pointer to that data structure. The problem is that I can't declare Sidekick's teammate to be AwesomeDude because it hasn't been placed into the Character map yet.
I'm trying to find the best way to solve this so that I can have my data structures reference objects that haven't yet been added to the map. The two easiest solutions that I can think of are (a) add the ability to forward declare data structures or (b) have the parser read through the file twice, once to populate the map with pointers to empty data structures and a second time to go through and fill them in.
The problem with (a) is that I also can decide which constructor to call on a class, and if I forward declare something I'd have to have the constructor be apart from the rest of the data, which could be confusing. The problem with (b) is that I might want to declare Sidekick and AwesomeDude in their own files. I'd have to make my parser be able to take a list of files to read rather than just one at a time (this isn't so bad I guess, although sometimes I might want to get a list of files to read from a file). (b) also has the drawback of not being able to use data structures declared later in the constructor itself, but I don't think that's a huge deal.
Which way sounds like a better approach? Is there a third option I haven't thought of? It seems like there ought to be some clever solution to this with pointer references or binding or something... :-/ I suppose this is somewhat subjective based on what features I want to give myself, but any input is welcome.

When you encounter the reference the first time, simply store it as a reference. Then, you can put the character, or the reference, or whatever on a list of "references that need to be resolved later".
When the file is done, run through those that have references and resolve them.
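A minimal sketch of that idea, assuming the parser calls into a small loader object. The names `CharacterLoader`, `define`, `referTeammate` and `resolveAll` are made up for illustration and are not from the question's code:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct Character {
    int x = 0, y = 0;
    Character* teammate = nullptr;
};

// Hypothetical loader: collects characters and remembers every reference
// that cannot be resolved at the moment it is read.
class CharacterLoader {
public:
    // Called when the parser has read a complete "Character Name { ... }" block.
    Character& define(const std::string& name, int x, int y) {
        auto& slot = characters_[name];
        slot = std::make_unique<Character>();
        slot->x = x;
        slot->y = y;
        return *slot;
    }

    // Called when the parser sees "Teammate = Name": just record the request.
    void referTeammate(Character& owner, const std::string& targetName) {
        pending_.push_back({&owner.teammate, targetName});
    }

    // Called once after all input has been read.
    void resolveAll() {
        for (const auto& p : pending_) {
            auto it = characters_.find(p.targetName);
            if (it == characters_.end())
                throw std::runtime_error("unresolved reference: " + p.targetName);
            *p.slot = it->second.get();
        }
        pending_.clear();
    }

private:
    struct Pending {
        Character** slot;        // where the resolved pointer must be written
        std::string targetName;  // name to look up once everything is loaded
    };
    std::map<std::string, std::unique_ptr<Character>> characters_;
    std::vector<Pending> pending_;
};
```

Because nothing is looked up until `resolveAll()`, the order of declarations stops mattering, and the same loader can be fed from several files before resolving.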

Well, you asked for a third option. You don't have to use XML, but if you follow the following structure, it would be very simple to use a SAX parser to build your data structure.
At any rate, instead of referencing a teammate, each character references a team (Blue team in this case). This will decouple the circular reference issue. Just make sure you list the teams before the characters.
<team>Blue</team>
<character>
<name>Sidekick</name>
<X>12</X>
<Y>0</Y>
<teamref>Blue</teamref>
</character>
<character>
<name>AwesomeDude</name>
<X>10</X>
<Y>50</Y>
<teamref>Blue</teamref>
</character>

Personally, I'd go with (b). Split your code into Parser and Validator classes, both operating on the same data structure. The Parser reads and parses a file, filling the data structure and storing any object references as their textual names, leaving the real pointer null in your structure for now.
When you are finished loading the files, use the Validator class to validate and resolve any references, filling in the "real" pointers. You will want to consider how to structure your data to make these lookups nice and fast.

Will said exactly what I was about to write. Just keep a list or something with the unresolved references.
And don't forget to throw an error if there are still unresolved references once you finish reading the file =P

Instead of storing Character objects in your map, store a proxy for Character. The proxy will then contain a pointer to the actual Character object once that object is loaded. The type of Character::teammate changes to this proxy type. When you read a reference that is not already in your map, you create an empty proxy and use that. When you load a character for which the map already holds an empty proxy, populate the proxy with your newly loaded character. You may also want to add a counter to keep track of how many empty proxies you have in the map, so you know when all referenced characters have been loaded.
Another layer of indirection... it always makes programming easier and slower.
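Roughly, the proxy approach could look like the sketch below. `CharacterProxy`, `ProxyMap`, `refer`, `load` and `emptyCount` are names invented for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct CharacterProxy;

struct Character {
    int x = 0, y = 0;
    CharacterProxy* teammate = nullptr;  // a proxy instead of a raw Character*
};

struct CharacterProxy {
    std::unique_ptr<Character> target;   // stays empty until the character is loaded
    bool loaded() const { return target != nullptr; }
};

class ProxyMap {
public:
    // Returns the proxy for `name`, creating an empty one on first mention.
    CharacterProxy& refer(const std::string& name) {
        auto it = proxies_.find(name);
        if (it == proxies_.end()) {
            it = proxies_.emplace(name, std::make_unique<CharacterProxy>()).first;
            ++empty_;
        }
        return *it->second;
    }

    // Fills the proxy for `name` with a freshly loaded character.
    Character& load(const std::string& name, int x, int y) {
        CharacterProxy& proxy = refer(name);
        if (!proxy.loaded()) --empty_;
        proxy.target = std::make_unique<Character>();
        proxy.target->x = x;
        proxy.target->y = y;
        return *proxy.target;
    }

    // Number of proxies that were referenced but never loaded.
    std::size_t emptyCount() const { return empty_; }

private:
    std::map<std::string, std::unique_ptr<CharacterProxy>> proxies_;
    std::size_t empty_ = 0;
};
```

After the last file has been read, a non-zero `emptyCount()` means some name was referenced but never declared.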

One option would be to reverse the obligation: the map is responsible for filling in the reference.
template <typename T>
class SymbolMap
{
public:
    // ...
    /// Fill in target with the thing called name.
    /// If name is not defined yet, add target to the list of things waiting for it.
    void Set(T& target, std::string name);
    /// Define name as target,
    /// then go back and fill in everything that was waiting for name.
    void Define(T target, std::string name);
    /// Make sure everything is resolved.
    ~SymbolMap();
};
That won't interact well with value/move semantics, but I suspect that not much will.
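One way the bodies could be filled in, as a sketch only. It assumes T is something cheap and default-constructible such as `Character*`, and it swaps the destructor check for an explicit `Resolved()` query (my addition, since reporting errors from a destructor is awkward):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Character {
    int x;
    int y;
    Character* teammate;
};

template <typename T>
class SymbolMap {
public:
    // Fill `target` with the thing called `name`: immediately if it is
    // already defined, otherwise as soon as Define() is called for it.
    // `target` must stay alive and must not move until then.
    void Set(T& target, const std::string& name) {
        auto it = defined_.find(name);
        if (it != defined_.end())
            target = it->second;
        else
            waiting_[name].push_back(&target);
    }

    // Define `name` as `value` and patch every target that was waiting for it.
    void Define(T value, const std::string& name) {
        defined_[name] = value;
        auto it = waiting_.find(name);
        if (it == waiting_.end()) return;
        for (T* slot : it->second) *slot = value;
        waiting_.erase(it);
    }

    // True when no Set() call is still waiting for a definition.
    bool Resolved() const { return waiting_.empty(); }

private:
    std::map<std::string, T> defined_;
    std::map<std::string, std::vector<T*>> waiting_;
};
```

The stored `T*` slots are exactly why this clashes with value/move semantics: if a Character is copied or moved after `Set()`, the map patches the old location.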

Related

How to automatically initialize component parameters?

While doing a game engine that uses .lua files in order to read parameter values, I got stuck when I had to read these values and assign them to the parameters of each component in C++. I tried to investigate the way Unity does it, but I didn't find it (and I'm starting to doubt that Unity has to do it at all).
I want the parameters to be initialized automatically, without the user having to do the process of
myComponentParameter = readFromLuaFile("myParameterName")
for each one of the parameters.
My initial idea is to use the std::variant type, and storing an array of variants in order to read them automatically. My problems with this are:
First of all, I don't know how to know the type that std::variant is storing at the moment (tried with std::variant::type, but it didn't work for the template), in order to cast from the untyped .lua value to the C++ value. For reference, my component initialization looks like this:
bool init(luabridge::LuaRef parameterTable)
{
myIntParameter = readVariable<int>(parameterTable, "myIntParameter");
myStringParameter = readVariable<std::string>(parameterTable, "myStringParameter");
return true;
}
(readVariable function is already written in this question, in case you're curious)
The second problem is that the user would have to write std::get<int>(myIntParameter); whenever they want to access the value stored by the variant, and that sounds worse than making the user read the parameter value.
The third problem is that I can't create an array of std::variant<any type>, which is what I would like to do in order to automatically initialize the parameters.
Is there any good solution for this kind of situation where I want the init function to not be necessary, and the user doesn't need to manually set up the parameter values?
Thanks in advance.
Let's expand my comment. In a nutshell, you need to get from
"I have some things entered by the user in some file"
to:
"the client code can read the value without std::get"
…which roughly translates to:
"input validation was done, and values are ready for direct use."
…which implies you do not store your variables in variants.
In the end it is a design question. One module somewhere must have the knowledge of which variable names exist, and the type of each, and the valid values.
The input of that module will be unverified values.
The output of the module will probably be some regular c++ struct.
And the body of that module will likely have a bunch of those:
config.foo = readVariable<int>("foo");
config.bar = readVariable<std::string>("bar");
// you also want to validate values there - all ints may not be valid values for foo,
// maybe bar must follow some specific rules, etc
assuming somewhere else it was defined as:
struct Configuration {
    int foo;
    std::string bar;
};
Where that module lives depends on your application. If all expected types are known, there is no reason to ever use a variant, just parse right away.
You would read to variants if some things do not make sense until later. For instance if you want to read configuration values that will be used by plugins, so you cannot make sense of them yet.
(actually even then simply re-parsing the file later, or just saving values as text for later parsing would work)
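To make the "one module knows the names, types and valid values" idea concrete, here is a small sketch. The Lua table is replaced by a plain `std::map<std::string, std::string>` stand-in, and `RawValues`, `requireValue`, `parseConfiguration` and the validation rules themselves are invented for the example:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// The validated result that client code uses directly, no variants involved.
struct Configuration {
    int foo = 0;
    std::string bar;
};

// Stand-in for the untyped input (in the question this would be the Lua table).
using RawValues = std::map<std::string, std::string>;

// Hypothetical helper: fetch a raw value or fail with a clear message.
const std::string& requireValue(const RawValues& raw, const std::string& key) {
    auto it = raw.find(key);
    if (it == raw.end()) throw std::runtime_error("missing parameter: " + key);
    return it->second;
}

// The single place that knows which names exist, their types and their rules.
Configuration parseConfiguration(const RawValues& raw) {
    Configuration config;

    // std::stoi throws std::invalid_argument if the text is not a number.
    config.foo = std::stoi(requireValue(raw, "foo"));
    if (config.foo < 0)  // example rule only
        throw std::runtime_error("foo must be non-negative");

    config.bar = requireValue(raw, "bar");
    if (config.bar.empty())  // example rule only
        throw std::runtime_error("bar must not be empty");

    return config;
}
```

Everything downstream then works with `config.foo` and `config.bar` as ordinary typed members.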

Array within array when loading from a file

My problem is quite peculiar in that I have a solution, but it is in my opinion extremely poor and I am looking for a better one. Here's the problem:
There's a file that needs to be loaded and have characteristics to create a bunch of objects. The object will have 10-12 properties and there will be a few types of objects each one with different characteristics. To make it even more fun, there's one type of object that is actually made up of two separate other objects, but that can be ignored for the purpose of this question. The file will store the characteristics for a few objects as well as the base class. For example the base class will contain the "car" manufacturer, factory made and maybe the engine used, but each of the objects will contain other data unique to each model; model name, tires, or whatever you can think of.
At the moment I have the cars loaded into a map, from the file:
std::map<std::string,std::string> loadCars(std::string filePath);
This way I can just pass this map to each of the objects constructors and it'll look for the key it needs (For example one model might need to check for the types of sunroof used whilst another doesn't have a sunroof, so can ignore it) and initialize the data. It'll load the car model name into the object name etc, like this:
Car* ford = new Car(loadCars("cars.txt"));
Now here's the problems, from minor to major:
1) If three car models are loaded in and none of them have sunroofs, the sunroof field in the map will be uninitialized when it's passed to the constructors. I don't know if this will cause errors, so I will require a function that pre-initializes the map with empty values, which I don't think is too bad.
2) And here's the major problem and why this solution does not work:
Some of the values in the car file are arrays. Meaning that an std::string in the map will not store them. Now the solution I have at the moment to me appears very ugly:
std::map<std::string, std::vector<std::string>> loadCars(std::string);
So I pass this map to the constructor and for each of the vectors it has to check if it has more than one value. For example model name would not, so it would be hardcoded to put that one value in a string. Or if this particular car only has one type of tire but normally they have multiple types, it'll just put in the vector with one element.
The problem is that when doing the actual file loading absolutely everything will have to be converted from simple strings like the string for the model name into a vector. This seems like a waste and very shoddy code.
Are there any better ways of doing this? Ideally you'd have a map-like container where the second value can be either a vector or a string, but I don't think those exist.
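For what it's worth, since C++17 a map whose values are "either a string or a vector of strings" can be spelled with `std::variant`. The sketch below is only one possible shape; `CarValue`, `CarData` and `valuesOf` are names made up here:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Each entry is either a single string or a list of strings.
using CarValue = std::variant<std::string, std::vector<std::string>>;
using CarData  = std::map<std::string, CarValue>;

// Returns the values stored under `key` as a list, whichever form they were
// stored in. A missing key (e.g. no sunroof) simply gives an empty list.
std::vector<std::string> valuesOf(const CarData& data, const std::string& key) {
    auto it = data.find(key);
    if (it == data.end()) return {};
    if (const auto* single = std::get_if<std::string>(&it->second))
        return {*single};
    return std::get<std::vector<std::string>>(it->second);
}
```

Looking keys up with `find` also sidesteps the first problem: an absent key is reported as absent instead of being read as an uninitialized value.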

Boost PTree used just for reading file or for storing the values too?

I have a configuration file, which is JSON. I have created a class (ConfigFile) that reads that file and stores the values (using the Boost parser and ptree). I am wondering whether it is good practice to keep the ptree as a member of the ConfigFile class, or whether I should use it just for reading the JSON and store the values in a map member.
I'd say what matters is the ConfigFile's interface. If you can keep it consistent with either version, it shouldn't be a problem to just choose one and switch to the other if you feel the need without breaking anything.
Keep the property tree out of the header, though. If it has to be a member, the pimpl idiom is one way to achieve that.
@sehe's comment makes a lot of sense here as well and is something to remember.

Serialization with a custom pattern and random access with Boost

I'm asking here because I have already tried to search, but I have no idea whether these things even exist or what they are called.
First, by "custom pattern" I mean this: suppose that I need to serialize objects or data of type foo, bar and boo. Usually the library handles this for the user in a very simple way: what comes first goes in first in the serialization process, so if I serialize all the foo first, they are written "at the top" of the file and all the bar and boo come after the foo.
Now I would like to keep order in my file and organize things based on a custom pattern. Is this possible with Boost? Which part of the library provides this feature?
Second thing, that is strictly related to the first one, I also would like to access my serialized binary files in a way that I'm not forced to parse and read all the previous values to extract only the one that I'm interested in, kinda like the RAM that works based on memory address and offers a random access without forcing you to parse all the others addresses.
Thanks.
On the first issue: the Boost serialization library is agnostic as to what happens after it turns an object into its serialized form. It does this by using input and output streams. Files are just that: std::ofstream/std::ifstream. For other types of streams, however, the order/pattern that you speak of doesn't make sense. Imagine you're sending serialized objects over the network: the library can't know that it will have to rearrange the order of objects and, in fact, it can't do that once they've been sent. For this reason, it does not support what you're looking for.
What you can do is create a wrapper that either caches serialized versions of the objects and arranges them in memory before you tell it to write them out to a file, or that knows that, since you're working with files, it can later seekp to the appropriate place in the file and write there (this approach would require you to store the locations of the objects you wrote to the file).
As for the second thing, random-access file reading: you will have to know exactly where the object is in the file. If you know that the structure of your file won't change, you can seekg on the file stream before handing it to Boost for deserialization. If the file structure will change, however, you still need to know the location of objects in the file. If you don't want to parse the whole file to find it, you'll have to store it somewhere during serialization. For example, you can maintain a sort of registry of objects at the top of the file. You will still have to parse that, but it should be just a simple [object identifier]-[location in file] sort of thing.
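Here is a rough illustration of the "registry at the top of the file" idea. It uses plain strings as stand-ins for the serialized objects and a string stream instead of a file, so no Boost is involved; the format and the names `writeArchive` and `readRecord` are invented for the sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Layout: "<count>\n", then one "<name> <offset> <length>\n" line per record,
// then all payloads back to back. Offsets are relative to the first payload.
// Record names must not contain whitespace.
std::string writeArchive(const std::vector<std::pair<std::string, std::string>>& records) {
    std::ostringstream index, body;
    std::size_t offset = 0;
    index << records.size() << '\n';
    for (const auto& rec : records) {
        index << rec.first << ' ' << offset << ' ' << rec.second.size() << '\n';
        body << rec.second;
        offset += rec.second.size();
    }
    return index.str() + body.str();
}

// Parses only the small index, then seeks straight to the wanted payload.
std::string readRecord(std::istream& in, const std::string& wanted) {
    std::size_t count = 0;
    in >> count;
    std::map<std::string, std::pair<std::size_t, std::size_t>> index;
    for (std::size_t i = 0; i < count; ++i) {
        std::string name;
        std::size_t offset = 0, length = 0;
        in >> name >> offset >> length;
        index[name] = {offset, length};
    }
    in.ignore(1);  // skip the newline that ends the index
    const std::streamoff bodyStart = in.tellg();

    const auto it = index.find(wanted);
    if (it == index.end()) throw std::out_of_range("no such record: " + wanted);

    in.seekg(bodyStart + static_cast<std::streamoff>(it->second.first), std::ios::beg);
    std::string payload(it->second.second, '\0');
    in.read(&payload[0], static_cast<std::streamsize>(payload.size()));
    return payload;
}
```

With a real file you would open the stream in binary mode, and the bytes at the found position would be handed to the deserializer instead of being returned as a string.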

C++, creating classes in runtime

I have a question. I have a set of flat files (say file1, file2, etc.) containing column names and native data types. (How the values are stored and can be read in C++ is elementary.)
eg. flat file file1 may have data like
col1_name=id, col1_type=integer, col2_name=Name, col2_type=string and so on.
So for each flat file I need to create a C++ data structure (i.e. 1 flat file = 1 data structure) where each member variable has the same name as a column and its type is a native C++ data type such as int, float or string, according to the column type in the flat file.
From the example above, flat file 1 should give me the declaration below:
class file1{
int id;
string Name;
};
Is there a way to write C++ code so that the binary, once built, reads a flat file and creates a data structure based on it (with the class name the same as the flat file name)? All the classes created from these flat files will share common functionality in the form of getter and setter member functions.
Do let me know if you have done something similar earlier or have any idea for this.
No, not easily (see the other answers for reasons why not).
I would suggest having a look at Python instead for this kind of problem. Python's type system combined with its ethos of using try/except lends itself more easily to the challenge of parsing data.
If you really must use C++, then you might find a solution using the dynamic properties feature of Qt's QObject class, combined with the QVariant class. Although this would do what you want, I would add a warning that this is getting kind of heavy-weight and may over-complicate your task.
No, not directly. C++ is a compiled language. The code for every class is created by the compiler.
You would need a two-step process. First, write a program that reads those files and translates them into a .cpp file. Second, pass those .cpp files to a compiler.
C++ classes are pure compile-time concepts and have no meaning at runtime, so they cannot be created. However, you could just go with
std::vector<std::string> fields;
and parse as necessary in your accessor functions.
No, but from what I can tell, you have to be able to store the names of multiple columns. What you can do is have a member variable map or unordered_map which you can index with a string - the name of the column - and get some data (like a column object or something) back. That way you can do
obj.Columns["Name"]
I'm not sure there's a design pattern to this, but if your list of possible type names is finite, and known at compile time, can't you declare all those classes in your program before running, and then just instantiate them based on the data in the files?
What you actually want is a field whose exact nature varies at runtime.
There are several approaches, including Boost.Any, but because of the static nature of the C++ type system only two are really recommended, and both require knowing beforehand all the data types that may be needed.
The first approach is typical:
Object base type
Int, String, Date whatever derived types
and the use of polymorphism.
The second requires a bit of Boost magic: boost::variant<int, std::string, date>.
Once you have the "variant" part covered, you need to implement visitation to distinguish between the different possible types: typical visitors for the traditional object-oriented approach, or simply a combination of boost::static_visitor<> and boost::apply_visitor for the Boost approach.
It's fairly straightforward.
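As a sketch of the variant-plus-visitor route, here is a version that uses `std::variant` and `std::visit` from C++17 in place of `boost::variant` and `boost::apply_visitor` (the idea is the same). `Field`, `Record`, `makeField`, `ToText` and `toText` are illustrative names, and the set of column types is an assumption based on the question:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>

// One field whose exact type is only known at runtime.
using Field  = std::variant<int, double, std::string>;
// One "class" per flat file: column name -> field.
using Record = std::map<std::string, Field>;

// Creates a default-valued field from a column type name in the flat file.
Field makeField(const std::string& typeName) {
    if (typeName == "integer") return Field(0);
    if (typeName == "float")   return Field(0.0);
    if (typeName == "string")  return Field(std::string());
    throw std::invalid_argument("unknown column type: " + typeName);
}

// Visitor: one overload per possible type.
struct ToText {
    std::string operator()(int v) const { return std::to_string(v); }
    std::string operator()(double v) const { return std::to_string(v); }
    std::string operator()(const std::string& v) const { return v; }
};

std::string toText(const Field& field) {
    return std::visit(ToText{}, field);
}
```

Generic getters and setters then become lookups in the `Record` by column name, with `std::get<T>` or a visitor recovering the concrete type.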