How to detect API breaks in C++

How can I detect incompatible API changes in C++? (not ABI but API changes)
Compatible changes are things that can't break compilation of code using the API, like:
parameter(s) with default arguments added to a method
methods added to a class
members added to a class
classes added
order of members or methods changed
comments/documentation changes
Incompatible changes are things that can potentially break compilation of code using the API, like:
removed arguments, (public/protected) methods, members, classes
type changes of arguments or members
name changes of public/protected members or methods
classes moved from one header to another

OP is right to think that C++ parsing is probably necessary. Likely deep reasoning, too.
I think the way to pose the question is,
for a particular set of uses of an API in an existing application, does changing the API change or break the application?
If you don't limit yourself to a specific set of uses, almost any change to an API will change its semantics; otherwise, why would you make the change at all (modulo pure refactoring)? And if the application exercises the full set of API features, then its semantics must change somehow, too.
With a specific set of uses, one can arguably determine which properties of the API might affect those uses, and determine whether in fact they do. Ultimately you have to parse the original code accurately to determine the specific set of uses and the context in which they occur. You also have to determine the semantic properties on which the existing application depends, including the properties provided by the legacy API. Finally, you need to determine the properties defined by the new API, and verify that they still support the needs of the application.
In general, you need a theorem prover over the program properties to check this. And while theorem-proving technology has advanced significantly over the last 50 years, AFAIK that technology isn't strong enough to take arbitrary program properties and prove them, let alone overcome the problem of reasoning about arbitrarily complex programs.
Consider:
// my application
int x = 0;
int y = foo(x);    // API ensures that fail() is never called
if (y > 3) fail(); // shouldn't happen
exit(0);
// my legacy API
int foo(int x) { return x+1; }
Now imagine the API is changed to:
// my new API
int foo(int x) { return x+2; }
The application still functions correctly.
How about:
// my new API
int foo(int x) { return TuringMachine(x); }
How are we going to prove that TuringMachine(x) produces a value < 3? If we can't do this for such tiny programs, how are we going to do it for the ones we write in practice?
Now, you might be able to limit the set of changes you will consider to purely "syntactic" operations, such as "move method" or "add parameter with initial value".
You'll still need to parse the original program and the modified APIs, and check that the syntactic properties imply semantic properties that don't damage the original program. You'll likely need control- and data-flow analysis, alias analysis to deal with pointers, and so on, and even then the tool will at best be able to tell you, for a limited number of cases, that no change has occurred.
I'm sure there are research papers on this topic. A quick check at scholar.google.com didn't find anything obvious.

See "Source Compatibility" tab of the report of the ABICC tool. Most of the mentioned API changes and API breaks are detected by this tool.
There are two approaches to using the tool: the original one, via analysis of header files, and a newer one, via analysis of the library's debug info ([1]). Use the second one; it's more reliable and simpler.
You can find some report examples here: http://abi-laboratory.pro/tracker/
Tutorial: https://sourceware.org/glibc/wiki/Testing/ABI_checker#Usage
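As a hedged example of the debug-info workflow from that tutorial (the library name, file names, and version numbers below are placeholders; the libraries must be built with debug info, i.e. with -g):

abi-dumper libfoo.so.1 -o ABI-1.dump -lver 1
abi-dumper libfoo.so.2 -o ABI-2.dump -lver 2
abi-compliance-checker -l libfoo -old ABI-1.dump -new ABI-2.dump

The generated report then contains the "Source Compatibility" tab mentioned above.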

Related

Progressive Disclosure in C++ API

Following my reading of the article Programmers Are People Too by Ken Arnold, I have been trying to implement the idea of progressive disclosure in a minimal C++ API, to understand how it could be done at a larger scale.
Progressive disclosure refers to the idea of "splitting" an API into categories that will be disclosed to the user of the API only upon request. For example, an API can be split into two categories: a base category (accessible to the user by default) for methods which are often needed and easy to use, and an extended category for expert-level services.
I have found only one example on the web of such an implementation: the db4o library (in Java), but I do not really understand their strategy. For example, if we take a look at ObjectServer, it is declared as an interface, just like its extended class ExtObjectServer. Then an implementing ObjectServerImpl class, inheriting from both these interfaces is defined and all methods from both interfaces are implemented there.
This supposedly allows code such as:
public void test() throws IOException {
    final String user = "hohohi";
    final String password = "hohoho";

    ObjectServer server = clientServerFixture().server();
    server.grantAccess(user, password);

    ObjectContainer con = openClient(user, password);
    Assert.isNotNull(con);
    con.close();

    server.ext().revokeAccess(user); // How does this limit the scope to
                                     // expert level methods only, since it
                                     // inherits from ObjectServer?
    // ...
}
My knowledge of Java is not that good, but it seems my misunderstanding of how this works is at a higher level.
Thanks for your help!
Java and C++ are both statically typed, so what you can do with an object depends not so much on its actual dynamic type, but on the type through which you're accessing it.
In the example you've shown, you'll notice that the variable server is of type ObjectServer. This means that when going through server, you can only access ObjectServer methods. Even if the object happens to be of a type which has other methods (which is the case here, with its ObjectServerImpl type), you have no way of directly accessing those other methods.
To access other methods, you need to get hold of the object through a different type. This could be done with a cast, or with an explicit accessor such as your ext(): a.ext() returns a, but as a different type (ExtObjectServer), giving you access to different methods of a.
Your question also asks how server.ext() is limited to expert methods when ExtObjectServer extends ObjectServer. The answer is: it is not, and that is correct. It should not be limited like this. The goal is not to provide only the expert functions; if that were the case, then client code which needs both normal and expert functions would have to hold two references to the object, just differently typed. There's no advantage to be gained from that.
The goal of progressive disclosure is to hide the expert stuff until it's explicitly requested. Once you ask for it, you've already seen the basic stuff, so why hide it from you?
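For reference, here is a minimal C++ sketch of the same ext() pattern; all the names (Server, ExtServer, ServerImpl) are hypothetical, not part of db4o:

#include <iostream>
#include <string>

class ExtServer; // expert-level interface, defined below

// Basic interface: the only type most client code needs to see.
class Server {
public:
    virtual ~Server() = default;
    virtual void grantAccess(const std::string& user) = 0;
    virtual ExtServer& ext() = 0; // gateway to the expert-level interface
};

// Expert interface extends the basic one, mirroring ExtObjectServer.
class ExtServer : public Server {
public:
    virtual void revokeAccess(const std::string& user) = 0;
};

// Single implementation class, mirroring ObjectServerImpl.
class ServerImpl : public ExtServer {
public:
    void grantAccess(const std::string& user) override { std::cout << "grant " << user << '\n'; }
    void revokeAccess(const std::string& user) override { std::cout << "revoke " << user << '\n'; }
    ExtServer& ext() override { return *this; }
};

int main() {
    ServerImpl impl;
    Server& server = impl;                // client code holds the basic type
    server.grantAccess("hohohi");
    // server.revokeAccess("hohohi");     // would not compile: not part of Server
    server.ext().revokeAccess("hohohi");  // expert method, explicitly requested
}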

C++ member variable change listeners (100+ classes)

I am trying to design an architecture for an MMO game, and I can't figure out how to store as many variables as I need in GameObjects without making a lot of calls to send them over the wire every time I update them.
What I have now is:
void Game::ChangePosition(Vector3 newPos) {
    gameobject.ChangePosition(newPos);
    SendOnWireNEWPOSITION(gameobject.id, newPos);
}
This makes the code rubbish: hard to maintain, understand, and extend. Think of a Champion example: I would have to write a function like this for each variable. And this is just the generalisation for this Champion; I might have 1-2 other member variables for each Champion type/"class".
It would be perfect if I were able to have OnPropertyChange from .NET or something similar. The architecture that I guess would work nicely would be something like this:
For HP: when I update it, automatically call SendFloatOnWire("HP", hp);
For Position: when I update it, automatically call SendVector3OnWire("Position", Position)
For Name: when I update it, automatically call SendSOnWire("Name", Name);
What are exactly SendFloatOnWire, SendVector3OnWire, SendSOnWire ? Functions that serialize those types in a char buffer.
OR METHOD 2 (Preferred), but might be expensive
Update Hp and Position normally, and then on every network-thread tick scan all GameObject instances on the server for the changed variables and send those.
How would that be implemented on a high scale game server and what are my options? Any useful book for such cases?
Would macros turn out to be useful? I think I was exposed to some source code of something similar, and I think it used macros.
Thank you in advance.
EDIT: I think I've found a solution, but I don't know how robust it actually is. I am going to have a go at it and see where I stand afterwards. https://developer.valvesoftware.com/wiki/Networking_Entities
On method 1:
Such an approach could be relatively "easy" to implement using maps that are accessed via getters/setters. The general idea would be something like:
#include <map>
#include <string>
using namespace std;

class GameCharacter {
    map<string, int> myints;
    // same for doubles, floats, strings
public:
    GameCharacter() {
        myints["HP"] = 100;
        myints["FP"] = 50;
    }
    int getInt(const string& fld) { return myints[fld]; }
    void setInt(const string& fld, int val) { myints[fld] = val; sendIntOnWire(fld, val); }
};
If you prefer to keep real properties in your class, you'd go for a map of pointers, or of pointers-to-members, instead of values. At construction you'd then initialize the map with the relevant pointers. If you decide to change a member variable, you should however always go via the setter, as sketched below.
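A minimal sketch of that pointer-to-member variant, reusing the field names from above (sendIntOnWire remains the hypothetical serialization hook):

#include <map>
#include <string>

class GameCharacter {
    int hp = 100;
    int fp = 50;
    std::map<std::string, int GameCharacter::*> fields; // name -> member pointer
public:
    GameCharacter() {
        fields["HP"] = &GameCharacter::hp;
        fields["FP"] = &GameCharacter::fp;
    }
    int getInt(const std::string& fld) { return this->*fields.at(fld); }
    void setInt(const std::string& fld, int val) {
        this->*fields.at(fld) = val;
        // sendIntOnWire(fld, val); // same forwarding hook as above (hypothetical)
    }
};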
You could even go further and abstract your Champion by making it just a collection of properties and behaviours that are accessed via the map. This component architecture is described by Mike McShaffry in Game Coding Complete (a must-read book for any game developer). There's a community site for the book with some source code to download; you may have a look at the actor.h and actor.cpp files. Nevertheless, I really recommend reading the full explanations in the book.
The advantage of componentization is that you could embed your network forwarding logic in the base class of all properties: this could simplify your code by an order of magnitude.
On method 2:
I think the basic idea is perfectly suitable, except that a complete analysis (or worse, transmission) of all objects would be overkill.
A nice alternative would be to have a marker that is set when a change is made and reset when the change is transmitted. If you transmit only marked objects (and perhaps only the marked properties of those), you minimize the workload of your synchronization thread, and you reduce network overhead by pooling the transmission of several changes affecting the same object.
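A minimal sketch of that marker ("dirty flag") idea, with entirely hypothetical names:

class GameObject {
    float hp_ = 100.0f;
    bool hpDirty_ = false;
public:
    void setHP(float hp) {
        hp_ = hp;
        hpDirty_ = true; // mark the change, but don't send anything yet
    }
    // Called by the network thread once per tick.
    template <typename Sender>
    void flushChanges(Sender&& send) {
        if (hpDirty_) {
            send("HP", hp_);   // transmit only what actually changed
            hpDirty_ = false;  // reset until the next change
        }
    }
};

// e.g. obj.flushChanges([](const char* field, float v) { /* serialize field,v */ });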
Overall conclusion I arrived at: having another call after I update the position is not that bad. It is a line of code longer, but it is better for several reasons:
It is explicit. You know exactly what's happening.
You don't slow down the code by making all kinds of hacks to get it working.
You don't use extra memory.
Methods I've tried:
Having maps for each type, as suggested by @Christophe. The major drawback was that it was error-prone: you could have HP and Hp declared in the same map, and it added another layer of problems and frustrations, such as declaring maps for each type and then prefixing every variable with the map name.
Using something SIMILAR to Valve's engine: it created a separate class for each networked variable you wanted. It then used a template to wrap the basic types you declared (int, float, bool) and also extended operators for that template. It used way too much memory and extra calls for basic functionality.
Using a data mapper that added pointers for each variable in the constructor, and then sent them with an offset. I left the project prematurely when I realised the code started to be confusing and hard to maintain.
Using a struct that is sent every time something changes, manually. This is easily done by using protobuf. Extending structs is also easy.
Every tick, generate a new struct with the data for the classes and send it. This keeps very important stuff always up to date, but eats a lot of bandwidth.
Using reflection with help from Boost. It wasn't a great solution.
In the end, I went with a mix of 4 and 5, and I am now implementing it in my game. One huge advantage of protobuf is the capability of generating structs from a .proto file, while also providing serialisation for the struct. It is blazingly fast.
For those special named variables that appear in subclasses, I have another struct. Alternatively, with help from protobuf, I could have an array of properties as simple as ENUM_KEY_BYTE VALUE, where ENUM_KEY_BYTE is just a byte referencing an enum of properties such as IS_FLYING, IS_UP, IS_POISONED, and VALUE is a string. A sketch of that layout follows.
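A hedged sketch of that property array (the enum values come from the text above; the rest of the names are assumptions):

#include <cstdint>
#include <string>
#include <vector>

// One byte referencing the property enum.
enum PropertyKey : std::uint8_t { IS_FLYING, IS_UP, IS_POISONED };

struct Property {
    PropertyKey key;    // ENUM_KEY_BYTE
    std::string value;  // VALUE, already serialized
};

// Each tick, the changed properties are collected and sent as one batch.
using PropertyBatch = std::vector<Property>;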
The most important thing I've learned from this is to lean on serialization as much as possible: it is better to spend more CPU on both ends than to incur more input/output.
If anyone has any questions, comment and I will do my best helping you out.

Where should the user-defined parameters of a framework be?

I am kind of a newbie and I am creating a framework to evolve objects in C++ with an evolutionary algorithm.
An evolutionary algorithm evolves objects and tests them to get the best solution (for example, evolve the weights of a neural network and test it on sample data, so that in the end you get a network with good accuracy, without having trained it).
My problem is that there are lots of parameters for the algorithm (type of selection/crossover/mutation, probabilities for each of them...) and since it is a framework, the user should be able to easily access and modify them.
CURRENT SOLUTION
For now, I created a header file parameters.h of this form:
// DON'T CHANGE THESE PARAMETERS
//mutation type
#define FLIP 1
#define ADD_CONNECTION 2
#define RM_CONNECTION 3
// USER DEFINED
static const int TYPE_OF_MUTATION = FLIP;
The user modifies the static variable TYPE_OF_MUTATION, and my mutation function then tests the value of TYPE_OF_MUTATION and calls the right mutation function.
This works well, but it has a few drawbacks:
when I change a parameter in this header and then call "make", no change is taken into account; I have to call "make clean" and then "make". From what I saw, this is not a problem in the makefile but in how the build works. And even if it did re-build when I changed a parameter, it would mean re-compiling the whole project, as these parameters are used everywhere; that is definitely not efficient.
if you want to run the genetic algorithm several times with different parameters, you have to run it once, save the results, change the parameters, then run it a second time, and so on.
OTHER POSSIBILITIES
I thought about passing these parameters as arguments to the top-level function. The problem is that the function would then take 20 or so arguments, which doesn't seem very readable...
What I mean about the top-level function is that for now, the evolutionary algorithm is run simply by doing this:
PopulationManager myPop;
myPop.evolveIt();
If I defined the parameters as arguments, we would have something like:
PopulationManager myPop;
myPop.evolveIt(20,10,5,FLIP,9,8,2,3,TOURNAMENT,0,23,4);
You can see how hellish it may be to always define the parameters in the right order!
CONCLUSION
The frameworks I know make you build your algorithm yourself from pre-defined functions, but the user shouldn't have to go through all the code to change parameters one by one.
It may be useful to indicate that this framework will be used internally, for a definite set of projects.
Any input about the best way to define these parameters is welcome !
If the options do not change I usually use a struct for this:
enum class MutationType {
    Flip,
    AddConnection,
    RemoveConnection
};

struct Options {
    // Documentation for mutation_type.
    MutationType mutation_type = MutationType::Flip;
    // Documentation for integer option.
    int integer_option = 10;
};
And then provide a constructor that takes these options.
Options options;
options.mutation_type = MutationType::AddConnection;
PopulationManager population(options);
C++11 makes this really easy, because it allows specifying defaults for the options, so a user only needs to set the options that need to be different from the default.
Also note that I used an enum class for the mutation option; this ensures that the user can only supply valid values.
This is a classic example of polymorphism. In your proposed implementation you're doing a switch-on-constant to decide which mutation algorithm to apply to the parameter. In C++, the corresponding mechanisms are templates (static polymorphism) or virtual functions (dynamic polymorphism) for selecting the appropriate mutation algorithm.
The templates way has the advantage that everything is resolvable at compile time and the resulting mutating algorithm could be inlined entirely, depending on the implementation. What you give up is the ability to dynamically select parameter mutation algorithms at runtime.
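To illustrate, a hedged sketch of the templates way (FlipMutation, PopulationManager, and the bit-flip logic are all placeholders, not the OP's actual design):

#include <vector>

struct FlipMutation {
    void operator()(std::vector<int>& genome) const {
        if (!genome.empty()) genome[0] ^= 1; // placeholder mutation logic
    }
};

template <typename Mutation>
class PopulationManager {
    Mutation mutate_;
public:
    void evolveIt(std::vector<int>& genome) {
        mutate_(genome); // resolved at compile time, freely inlinable
    }
};

// Usage: the mutation strategy is part of the type, fixed at compile time.
// PopulationManager<FlipMutation> myPop;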
The virtual function way has the advantage that you can defer the choice of mutation algorithm until runtime, allowing this to vary based on input from the user or whatnot. The disadvantage is that the mutation algorithm can no longer be inlined and you pay the cost of a virtual function call (an extra level of indirection) when you mutate the parameter.
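And a corresponding sketch of the virtual-function way, with the same caveats:

#include <memory>
#include <vector>

struct Mutation {
    virtual ~Mutation() = default;
    virtual void apply(std::vector<int>& genome) const = 0;
};

struct FlipMutation : Mutation {
    void apply(std::vector<int>& genome) const override {
        if (!genome.empty()) genome[0] ^= 1; // placeholder mutation logic
    }
};

class PopulationManager {
    std::unique_ptr<Mutation> mutate_;
public:
    explicit PopulationManager(std::unique_ptr<Mutation> m) : mutate_(std::move(m)) {}
    void evolveIt(std::vector<int>& genome) { mutate_->apply(genome); } // virtual dispatch
};

// The concrete algorithm can now be picked at runtime, e.g. from user input:
// PopulationManager myPop(std::make_unique<FlipMutation>());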
If you want to see a real example of how "algorithmic mutation" can work, look at evolve.cpp in my Iterated Dynamics repository on GitHub. This is C code converted to C++, so it uses neither templates nor virtual functions; instead it uses function pointers and a switch-on-constant to select the appropriate code. However, the idea is the same.
My recommendation would be to see if you can use static polymorphism (templates) first. From your initial description you were fixing the mutation at compile-time anyway, so you're not giving anything up.
If that was just a prototyping phase and you intended to support switching of mutation algorithms at runtime, then look at virtual functions. As the other answer recommended, please shun C-style coding like #define constants and instead use proper enums.
To solve the "long parameter list" smell, the idea of packing all the parameters into a structure is a good one. You can achieve even more readability on top of that by using the builder pattern to build up the structure of parameters in a more readable way than just assigning a bunch of values into a struct. In this blog post, I applied the builder pattern to the resource description structures in Direct3D. That allowed me to express these "bags of data" more directly, with reasonable defaults, and to reveal my intent whenever I override or replace a default value with a special value.
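A hedged sketch of such a builder over the Options struct from the first answer (OptionsBuilder and its method names are made up for illustration):

struct OptionsBuilder {
    Options opts; // starts out holding the defaults from the struct definition

    OptionsBuilder& mutationType(MutationType t) {
        opts.mutation_type = t;
        return *this; // returning *this allows chaining
    }
    OptionsBuilder& integerOption(int v) {
        opts.integer_option = v;
        return *this;
    }
    Options build() const { return opts; }
};

// Usage reads top-to-bottom and only names what differs from the defaults:
// PopulationManager population(
//     OptionsBuilder()
//         .mutationType(MutationType::AddConnection)
//         .integerOption(42)
//         .build());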

Metaprogramming C/C++ using the preprocessor

So I have this huge tree that is basically a big switch/case with string keys and different function calls on one common object depending on the key and one piece of metadata.
Every entry basically looks like this
} else if (strcmp(key, "key_string") == 0) {
    ((class_name*)object)->do_something();
} else if ( ...
where do_something can have different invocations, so I can't just use function pointers. Also, some keys require object to be cast to a subclass.
Now, if I were to code this in a higher level language, I would use a dictionary of lambdas to simplify this.
It occurred to me that I could use macros to simplify this to something like
case_call("key_string", class_name, do_something());
case_call( /* ... */ )
where case_call would be a macro that would expand this code to the first code snippet.
However, I am very much on the fence about whether that would be considered good style. It would reduce the typing work and improve the DRYness of the code, but it really seems to abuse the macro system somewhat.
Would you go down that road, or rather type out the whole thing? And what would be your reasoning for doing so?
Edit
Some clarification:
This code is used as a glue layer between a simplified scripting API, which accesses several different aspects of a C++ API as simple key-value properties, and the C++ API itself. The properties are implemented in different ways in C++, though: some have getter/setter methods, some are set in a special struct. Scripting actions reference C++ objects cast to a common base class. However, some actions are only available on certain subclasses and have to be cast down.
Further down the road, I may change the actual C++ API, but for the moment, it has to be regarded as unchangeable. Also, this has to work on an embedded compiler, so boost or C++11 are (sadly) not available.
I would suggest you slightly reverse the roles. You are saying that the object is already of some class that knows how to handle a certain situation, so add a virtual void handle(const char* key) to your base class and let each object check in its implementation whether the key applies to it, and do whatever is necessary.
This would not only eliminate the long if-else-if chain, but would also be more type safe and would give you more flexibility in handling those events.
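A rough sketch of that inversion, kept C++98-compatible because of the embedded-compiler constraint (key_string and do_something come from the question; the class names are assumed):

#include <cstring>

class Base {
public:
    virtual ~Base() {}
    virtual bool handle(const char* key) { // returns true if it consumed the key
        return false;                      // the base class handles nothing
    }
};

class Derived : public Base {
public:
    bool handle(const char* key) {
        if (std::strcmp(key, "key_string") == 0) {
            do_something();                // no cast needed: we are already Derived
            return true;
        }
        return Base::handle(key);          // fall back to inherited handlers
    }
private:
    void do_something() {}
};

// The dispatch site shrinks to a single call:
// object->handle(key);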
That seems to me an appropriate use of macros. They are, after all, made for eliding syntactic repetition. However, when you have syntactic repetition, it’s not always the fault of the language—there are probably better design choices out there that would let you avoid this decision altogether.
The general wisdom is to use a table mapping keys to actions:
std::map<std::string, void(Class::*)()> table;
Then look up and invoke the action in one go:
(object->*table[key])();
Or use find to check for failure:
const auto i = table.find(key);
if (i != table.end())
(object->*(i->second))();
else
throw std::runtime_error(...);
But if, as you say, there is no common signature for the functions (i.e., you can't use member function pointers), then what you actually should do depends on the particulars of your project, which I don't know. It might be that a macro is the only way to elide the repetition you're seeing, or there might be a better way of going about it.
Ask yourself: why do my functions take different arguments? Why am I using casts? If you’re dispatching on the type of an object, chances are you need to introduce a common interface.

Putting all code of a module behind 1 interface. Good idea or not?

I have several modules (mainly C) that need to be redesigned (using C++). Currently, the main problems are:
many parts of the application rely on the functions of the module
some parts of the application might want to overrule the behavior of the module
I was thinking about the following approach:
redesign the module so that it has a clear, modern class structure (using interfaces, inheritance, STL containers, ...)
writing a global module interface class that can be used to access any functionality of the module
writing an implementation of this interface that simply maps the interface methods to the correct methods of the correct class in the module
Other modules in the application that currently directly use the C functions of the module, should be passed [an implementation of] this interface. That way, if the application wants to alter the behavior of one of the functions of the module, it simply inherits from this default implementation and overrules any function that it wants.
An example:
Suppose I completely redesign my module so that I have classes like: Book, Page, Cover, Author, ... All these classes have lots of different methods.
I make a global interface, called ILibraryAccessor, with lots of pure virtual methods
I make a default implementation, called DefaultLibraryAccessor, than simply forwards all methods to the correct method of the correct class, e.g.
DefaultLibraryAccessor::printBook(book) calls book->print()
DefaultLibraryAccessor::getPage(book,10) calls book->getPage(10)
DefaultLibraryAccessor::printPage(page) calls page->print()
Suppose my application has 3 kinds of windows
The first one allows all functionality and as an application I want to allow that
The second one also allows all functionality (internally), but from the application I want to prevent printing separate pages
The third one also allows all functionality (internally), but from the application I want to prevent printing certain kinds of books
When constructing the window, the application passes an implementation of ILibraryAccessor to the window
The first window will get the DefaultLibraryAccessor, allowing everything
I will pass a special MyLibraryAccessor to the second window, and in MyLibraryAccessor, I will overrule the printPage method and let it fail
I will pass a special AnotherLibraryAccessor to the third window, and in AnotherLibraryAccessor, I will overrule the printBook method and check the type of book before I will call book->print().
The advantage of this approach is that, as shown in the example, an application can overrule any method it wants. The disadvantage is that I get a rather big interface, and the class structure is completely lost for all modules that want to access this other module.
Good idea or not?
You could represent the class structure with nested interfaces. E.g. instead of DefaultLibraryAccessor::printBook(book), have DefaultLibraryAccessor::Book::print(book). Otherwise it looks like a good design to me.
Maybe look at the design pattern called "Facade". Use one facade per module. Your approach seems good.
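For concreteness, a minimal sketch of the accessor idea (Book, ILibraryAccessor, and DefaultLibraryAccessor come from the question; only printBook is shown, and the restricted variant is a hypothetical name):

#include <iostream>

class Book {
public:
    void print() { std::cout << "printing book\n"; }
};

class ILibraryAccessor {
public:
    virtual ~ILibraryAccessor() = default;
    virtual void printBook(Book& book) = 0;
};

// Default implementation simply forwards to the real classes.
class DefaultLibraryAccessor : public ILibraryAccessor {
public:
    void printBook(Book& book) override { book.print(); }
};

// A window-specific variant overrules just the methods it needs to.
class RestrictedLibraryAccessor : public DefaultLibraryAccessor {
public:
    void printBook(Book&) override {
        std::cout << "printing books is not allowed here\n";
    }
};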
ILibraryAccessor sounds like a known anti-pattern, the "god class".
Your individual windows are probably better off inheriting and overriding at Book/Page/Cover/Author level.
The only thing I'd worry about is a loss of granularity, partly addressed by suszterpatt previously. Your implementations might end up being rather heavyweight and inflexible. If you're sure that you can predict the future use of the module at this point then the design is probably ok.
It occurs to me that you might want to keep the interface fine-grained, but find some way of injecting this kind of display-specific behaviour rather than trying to incorporate it at top level.
If you have n methods in your interface class, and there are m behaviors per method, you get m*(nC1 + nC2 + nC3 + ... + nCn) implementations of your interface (I hope I got my math right :) ). Compare this with the m*n implementations you need if you were to have a single interface per function. And this method has added flexibility, which is more important. So, no - I don't think a single interface would do. But you don't have to be extreme about it.
EDIT: I am sure the math is wrong. :(