Dynamic classes/objects with ML.NET's PredictionModel<TInput, TOutput> Train() - ml.net

I am using Microsoft's ML.net library.
I would like to train data based on a model whose contract is generated at run-time (meaning the fields are not known at compile-time). Can this be achieved using the current ML.net Train() method signature?
So far I have been trying to call this Train method by passing in instances of the TInput and TOutput objects (rather than the T class).

According to the docs, LearningPipeline only has one method, Train<TIn, TOut> for training, and it implies that TIn and TOut are actual classes: TIn an input to prediction, and TOut an output.
The underlying ML.NET code doesn't actually depend on knowing the schema ahead of time: the Train<TIn, TOut> method is a convenience method that we decided to expose to the users. The side effect of that decision was that we essentially prohibited use cases like yours.
Of course, you can still use Reflection to generate the class signatures at runtime, when you know the schema of the data, but it is an awkward workaround.
The new ML.NET API that we are working on (see issues in this project) will lift this requirement: you will be able to train on data whose schema is not known at compile time.

Related

arcpy: get feature class as object

How do I create an object in python from a feature class in a geodatabase? I would think the following code would create a featureclass object?
featureclassobject = "C:/path/to/my/featureclass"
But this creates a string object, right? So I am not able to pass this object into an arcpy function later on.
You are correct that it creates a string object. However, whether it will work with a particular ArcPy function depends on the function -- in most cases, the tool simply needs to know the path to the feature class as a string (which the featureclassobject is).
The help pages are slightly unhelpful in this regard. Buffer, for example, says that the input parameter in_features needs to be of data type "Feature Layer" -- however, what it really expects is a string that describes where the feature layer can be found.
One significant exception to this is geometry objects:
In many geoprocessing workflows, you may need to run a specific operation using coordinate and geometry information but don't necessarily want to go through the process of creating a new (temporary) feature class, populating the feature class with cursors, using the feature class, then deleting the temporary feature class. Geometry objects can be used instead for both input and output to make geoprocessing easier.
But if you've already got a feature class (or shapefile) on disk, that's much simpler than creating an in-memory geometry object to work with.

Why should I use DECLARE_DYNAMIC instead of DECLARE_DYNCREATE?

DECLARE_DYNCREATE provides exactly the same features as DECLARE_DYNAMIC, along with its dynamic object creation ability. Then why should anyone use DECLARE_DYNAMIC instead of DECLARE_DYNCREATE?
The macros are documented to provide different functionality.
DECLARE_DYNAMIC:
Adds the ability to access run-time information about an object's class when deriving a class from CObject.
This provides the functionality for introspection, similar to RTTI (Run-Time Type Information) provided by C++. An application can query a CObject-derived class instance for its run-time type through the associated CRuntimeClass Structure. It is useful in situations where you need to check that an object is of a particular type, or has a specific base class type. The examples at CObject::IsKindOf should give you a good idea.
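For illustration, a minimal sketch of such a query. CShape and Inspect are hypothetical names; the macros, RUNTIME_CLASS, and IsKindOf are the standard MFC pieces:
class CShape : public CObject
{
    DECLARE_DYNAMIC(CShape)
};

IMPLEMENT_DYNAMIC(CShape, CObject)   // goes in the .cpp file

void Inspect(CObject* pObj)
{
    if (pObj->IsKindOf(RUNTIME_CLASS(CShape)))
    {
        // pObj is a CShape (or derived from it); a cast is now safe
    }
}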
DECLARE_DYNCREATE:
Enables objects of CObject-derived classes to be created dynamically at run time.
This macro enables dynamic creation of class instances at run-time. The functionality is provided through the class factory method CRuntimeClass::CreateObject. It can be used when you need to create class instances at run-time based on the class type's string representation. An example would be a customizable GUI, that is built from an initialization file.
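A sketch of such string-driven creation. CMyEntity and CreateByName are hypothetical; CRuntimeClass::FromName and CreateObject are the MFC facilities involved:
class CMyEntity : public CObject
{
    DECLARE_DYNCREATE(CMyEntity)   // requires a default constructor
};

IMPLEMENT_DYNCREATE(CMyEntity, CObject)

CObject* CreateByName(LPCSTR name)
{
    // e.g. name = "CMyEntity", read from an initialization file
    CRuntimeClass* pClass = CRuntimeClass::FromName(name);
    return pClass ? pClass->CreateObject() : NULL;
}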
Both features are implemented through the same CRuntimeClass Structure, which may lead to the conclusion that they can be used interchangeably. In fact, code that uses an inappropriate macro will compile just fine, and expose the desired run-time behavior. The difference is purely semantic: The macros convey different intentions, and should be used according to the desired features, to communicate developer intent.
There's also a third related macro, DECLARE_SERIAL:
Generates the C++ header code necessary for a CObject-derived class that can be serialized.
It enables serialization of respective CObject-derived class instances, for example to a file, memory stream, or network socket. Since the deserialization process requires dynamic creation of objects from the serialized stream, it includes the functionality of DECLARE_DYNCREATE.
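A sketch of the pattern. CRecord and m_value are hypothetical; DECLARE_SERIAL/IMPLEMENT_SERIAL and CArchive are the MFC serialization machinery:
class CRecord : public CObject
{
    DECLARE_SERIAL(CRecord)
public:
    int m_value;

    virtual void Serialize(CArchive& ar)
    {
        CObject::Serialize(ar);
        if (ar.IsStoring())
            ar << m_value;   // writing to the archive
        else
            ar >> m_value;   // reading back into a dynamically created object
    }
};

IMPLEMENT_SERIAL(CRecord, CObject, 1)   // 1 = schema version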
Put together, the following list should help you pick the right macro for your specific scenarios:
Use DECLARE_DYNAMIC if your code needs to retrieve an object's run-time type.
Use DECLARE_DYNCREATE if, in addition, you need to dynamically create class instances based on the type's string representation.
Use DECLARE_SERIAL if, in addition, you need to provide serialization support.
You're asking "why buy a Phillips screwdriver when I own a flathead?" The answer is that you should use the tool that suits your needs: if you need to drive only flathead screws, don't buy a Phillips driver. Otherwise, buy one.
If you need the features provided by DECLARE_DYNCREATE (e.g. because you're creating a view that's auto-created by the framework when a document is opened), then use DECLARE_DYNCREATE. If you don't, and DECLARE_DYNAMIC covers your needs, use that instead.

Design Pattern for an EEPROM burner

I've built myself a basic EEPROM burner using a Teensy++ 2.0 for my PC bridge, and it's working great, but as I look to expand its compatibility, my code is getting rather hacky. I'm looking for some advice for a proper design for making this code expandable. I've taken a class in software design patterns, but it was a while ago, and I'm currently drawing a blank. Basically, here's the use case:
I have several methods, such as ReadByte(), WriteByte(), ProgramByte() (for FlashROMs that require a multi-byte write sequence in order to program), EraseChip(), etc. so basically I have an EEPROM pure virtual base class that gets implemented by concrete classes for each chip type that I want to support. The tricky part is determining which chip type object to generate. I'm currently using a pseudo-terminal front-end on the Teensy++ serial input, a basic command-line type interface with parameters, to send options like the chip type to the Teensy++. The question is, is there a design pattern (in C/C++), something like a Factory Pattern, that would take a string input of the chip type (because that's what I'm getting from the user), and return an EEPROM object of the correct derived type, without having to manually create some big switch statement or something ugly like that where I'd have to add the new chip to a list any time I create a new chip derived class? So something like:
const EEPROM & GetEEPROM(const std::string & id)
and if I pass it the string "am29f032b" it returns a reference to an AM29F032B object, or if I pass it the string "sst39sf040" it returns a reference to an SST39SF040 object, which I could then call the previously mentioned functions on, and it would work for the specified chip.
This code will be run on an AVR microcontroller, so I can't have anything with a huge OOP overhead, but the particular microcontroller I'm using does have a relatively large amount of program flash and work RAM, so it's not like I'm trying to operate in 2kb, but I do have to keep in mind limited resources.
What you're looking for is a Pluggable Factory. There's a good description here. It's attributed to John Vlissides (one of the Gang of Four) and takes the Abstract Factory pattern a few steps further. This pattern also happens to be the architectural foundation of COM.
The usual way of implementing one in C++ is to maintain a static registry of abstract factories. With judicious use of a few templates and static initialisers, you can wrap the whole lot up in a few lines of boilerplate which you include in each concrete product (e.g. chip type).
The use of static initialisers allows a complete decoupling of concrete products from both the registry and the code wanting to create products, and opens up the possibility of implementing each product as a plug-in.
You could have a singleton factory manager that keeps a map of string->factory object. Then each factory class would have a global instance that registers itself with the manager on startup.
I try to avoid globals in general, and singletons in particular, but any other approach would require some form of explicit list which you're trying to avoid. You will have to be careful with timing issues (you can't really assume anything about the order in which the various factories will be created).
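A minimal sketch of that registry, under stated assumptions: Registry, Registrar, and the reduced EEPROM interface are illustrative names, not an established library, and it assumes your AVR toolchain provides enough of the standard library for std::map and std::unique_ptr:
#include <map>
#include <memory>
#include <string>

struct EEPROM
{
    virtual ~EEPROM() = default;
    virtual unsigned char ReadByte(unsigned long addr) = 0;
    // WriteByte(), ProgramByte(), EraseChip(), ... elided
};

// Static registry mapping chip-name strings to creation functions.
class Registry
{
public:
    using Creator = std::unique_ptr<EEPROM> (*)();

    static void Add(const std::string& id, Creator c) { Map()[id] = c; }

    static std::unique_ptr<EEPROM> Make(const std::string& id)
    {
        auto it = Map().find(id);
        return it != Map().end() ? it->second() : nullptr;
    }

private:
    // Construct-on-first-use sidesteps static initialisation order issues.
    static std::map<std::string, Creator>& Map()
    {
        static std::map<std::string, Creator> m;
        return m;
    }
};

// The few lines of boiler-plate per concrete product, wrapped in a template:
// one static Registrar per chip type registers it at program startup.
template <typename Chip>
struct Registrar
{
    static std::unique_ptr<EEPROM> Create()
    {
        return std::unique_ptr<EEPROM>(new Chip);
    }
    explicit Registrar(const char* id) { Registry::Add(id, &Create); }
};

// am29f032b.cpp -- adding a new chip touches only its own file:
// class AM29F032B : public EEPROM { /* ... */ };
// static Registrar<AM29F032B> reg("am29f032b");
Returning std::unique_ptr rather than the reference in your GetEEPROM sketch makes ownership explicit; on a RAM-constrained target you could instead hand out pointers to statically allocated chip objects.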

Game entity type allocation

I am migrating a tile based 2D game to C++, because I am really not a fan of Java (some features are nice, but I just can't get used to it). I am using TMX tiled maps. This question is concerning how to translate the object definitions into actual game entities. In Java, I used reflection to allocate an object of the specified type (given that it derived from the basic game entity).
This worked fine, but this feature is not available in C++ (I understand why, and I'm not complaining. I find reflection messy, and I was hesitant to use it in Java, haha). I was just wondering what the best way would be to translate this data. My idea was a base class from which all entities could derive (this seems pretty standard), then have the loader allocate the derived types based on the 'type' value from the TMX map. I have thought of two ways to do this.
A giant switch-case block. Lengthy and disgusting. I'm doubtful that anyone would suggest this (but it is the obvious).
Use a std::map, mapping type-name strings to functions that allocate the classes corresponding to those type names.
Lastly, I had thought of making entities of one base class, and using scripting for different entity types. The scripts themselves would register their entity type with the system, although the game would need to load said entity type scripts upon loading (this could be done via one main entity type declaration script, which would bring the number of edits per entity down to 2: entity creation, and entity registration).
While option two looks pretty good, I don't like having to change three pieces of code for each type (defining the entity class, defining an allocate function, and adding the function to the std::map). Option 3 sounds great except for two things in my mind: I'm afraid of the speed of purely script-driven entities. Also, I know that adding scripting to my engine is going to be a big project in itself (adding all the helper functions for interfacing with the library will be interesting).
Does anyone know of a better solution? Maybe not better, but just cleaner. With less code edits per entity type.
You can reduce the number of code changes in solution 2 if you use self-registration with a factory. The drawback is that the entities know about this factory (self-registration) and the factory has to be a global (e.g. a singleton) instance. If this is no problem for you, this pattern can be very nice. Each new type requires only compilation and linking of one new file.
You can implement self registration like this:
// foo.cpp
namespace
{
    // Runs at static-initialisation time, before main(): registers
    // FooCreator under "FooKey". The dummy bool exists only to force the call.
    bool dummy = FactoryInstance().Register("FooKey", FooCreator);
}
Abstract Factory, Template Style, by Jim Hyslop and Herb Sutter
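For completeness, a minimal sketch of the factory side that the snippet assumes. FactoryInstance, Register, and FooCreator are the illustrative names from this answer, not a particular library:
#include <map>
#include <string>

class Entity { public: virtual ~Entity() {} };

typedef Entity* (*CreatorFunc)();

class Factory
{
public:
    // The bool return value exists so registration can run in a
    // static initialiser, as in foo.cpp above.
    bool Register(const std::string& key, CreatorFunc creator)
    {
        m_creators[key] = creator;
        return true;
    }

    Entity* Create(const std::string& key) const
    {
        std::map<std::string, CreatorFunc>::const_iterator it =
            m_creators.find(key);
        return it != m_creators.end() ? it->second() : 0;
    }

private:
    std::map<std::string, CreatorFunc> m_creators;
};

// Construct-on-first-use: safe to call from any static initialiser,
// which addresses the ordering caveat in the other answer.
Factory& FactoryInstance()
{
    static Factory instance;
    return instance;
}

// The FooCreator registered in foo.cpp would then be simply:
// Entity* FooCreator() { return new Foo; }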

Flexible application configuration in C++

I am developing a C++ application used to simulate a real-world scenario. Based on this simulation, our team is going to develop, test and evaluate different algorithms working within such a real-world scenario.
We need the possibility to define several scenarios (they might differ in a few parameters, but a future scenario might also require creating objects of new classes) and the possibility to maintain a set of algorithms (which is, again, a set of parameters, but also the definition of which classes are to be created). Parameters are passed to the classes in the constructor.
I am wondering which is the best way to manage all the scenario and algorithm configurations. It should be easily possible to have one developer work on one scenario with "his" algorithm and another developer working on another scenario with "his" different algorithm. Still, the parameter sets might be huge and should be "sharable" (if I defined a set of parameters for a certain algorithm in Scenario A, it should be possible to use the algorithm in Scenario B without copy&paste).
It seems like there are two main ways to accomplish my task:
Define a configuration file format that can handle my requirements. This format might be XML-based or custom. As there is no C#-like reflection in C++, it seems like I have to update the config-file parser each time a new algorithm class is added to the project (in order to convert a string like "MyClass" into a new instance of MyClass). I could create a name for every setup and pass this name as a command line argument.
Pros: no compilation required to change a parameter and re-run; I can easily store the whole config file with the simulation results.
Cons: seems like a lot of effort, especially because I am using a lot of template classes that have to be instantiated with given template arguments; no IDE support for writing the file (at least without creating a whole XSD, which I would have to update every time a parameter/class is added).
Wire everything up in C++ code. I am not completely sure how I would do this to separate all the different creation logic but still be able to reuse parameters across scenarios. I think I'd also try to give every setup a (string) name and use this name to select the setup via command line arg.
Pros: type safety, IDE support, no parser needed.
Cons: storing the setup together with the results is harder (maybe some serialization?), and every parameter change requires recompilation.
Now here are my questions:
- What is your opinion? Did I miss important pros/cons?
- Did I miss a third option?
- Is there a simple way to implement the config file approach that gives me enough flexibility?
- How would you organize all the factory code in the second approach? Are there any good C++ examples for something like this out there?
Thanks a lot!
There is a way to do this without templates or reflection.
First, you make sure that all the classes you want to create from the configuration file have a common base class. Let's call this MyBaseClass and assume that MyClass1, MyClass2 and MyClass3 all inherit from it.
Second, you implement a factory function for each of MyClass1, MyClass2 and MyClass3. The signatures of all these factory functions must be identical. An example factory function is as follows.
MyBaseClass * create_MyClass1(const Configuration & cfg)
{
    // Retrieve config variables and pass them as parameters
    // to the constructor.
    int age = cfg.lookupInt("age");
    std::string address = cfg.lookupString("address");
    return new MyClass1(age, address);
}
Third, you register all the factory functions in a map.
typedef MyBaseClass* (*FactoryFunc)(const Configuration &);
std::map<std::string, FactoryFunc> nameToFactoryFunc;
nameToFactoryFunc["MyClass1"] = &create_MyClass1;
nameToFactoryFunc["MyClass2"] = &create_MyClass2;
nameToFactoryFunc["MyClass3"] = &create_MyClass3;
Finally, you parse the configuration file and iterate over it to find all the entries that specify the name of a class. When you find such an entry, you look up its factory function in the nameToFactoryFunc table and invoke the function to create the corresponding object.
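That last step might look like the following sketch, reusing the hypothetical Configuration interface and the nameToFactoryFunc map from above:
#include <map>
#include <stdexcept>
#include <string>

MyBaseClass* createFromEntry(const Configuration& cfg)
{
    // Each entry names the class to instantiate, e.g. class = "MyClass1".
    std::string className = cfg.lookupString("class");

    std::map<std::string, FactoryFunc>::const_iterator it =
        nameToFactoryFunc.find(className);
    if (it == nameToFactoryFunc.end())
        throw std::runtime_error("unknown class: " + className);

    return it->second(cfg); // invoke the matching factory function
}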
If you don't use XML, it's possible that boost::spirit could short-circuit at least some of the problems you are facing. Here's a simple example of how config data could be parsed directly into a class instance.
I found this website with a nice template-supporting factory which I think I will use in my code.