Adding a field to a structure without breaking existing code

So I'm working with this huge repository of code and have realized that one of the structs lacks an important field. I looked at the code (which uses the struct) as closely as I could and concluded that adding an extra field isn't going to break it.
Any ideas on where I could've screwed up?
Also: design advice is welcome - what's the best way I can accomplish this?
E.g. (if I wasn't clear):
typedef struct foo
{
int a;
int b;
}
foo;
Now it's:
typedef struct foo
{
int a;
int b;
int c;
}
foo;

If that structure is being serialized/deserialized anywhere, be sure to pay attention to that section of the code.
Double check areas of the code where memory is being allocated.

From what you've written above I can't see anything wrong. Two things I can think of:
Whenever you change code and recompile, you can expose "hidden" bugs. For example, an uninitialized pointer or a stray write that used to be harmless may now corrupt data, because your new data structure is just big enough to sit in its path.
Are you making sure you initialize c before it gets used?
Follow Up:
Since you haven't found the error yet I'd stop looking at your struct. Someone once wrote look for horses first, zebras second. That is, the error is probably not an exotic one. How much coverage do you have in your unit tests? I'm assuming this is legacy code which almost invariably means 0% or at least that's been my experience. Is this accurate?

If you are using sizeof(struct) to allocate memory at all places and are accessing the members using -> or . operators, I don't think you should face any problem. But, it also depends on where you are trying to add the member, it might screw up your structure alignment if you are not careful.
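A small sketch of the placement point above. The struct names (FooV1, FooAppended, FooInserted) are hypothetical and not from the question. The assertions hold on mainstream ABIs, where consecutive int members are not padded; the standard itself does not promise exact offsets.

```cpp
#include <cstddef>  // offsetof

// Original layout.
struct FooV1 { int a; int b; };

// New member appended at the end: a and b keep their offsets.
struct FooAppended { int a; int b; int c; };

// New member inserted in the middle: b moves to a higher offset.
struct FooInserted { int a; int c; int b; };

static_assert(offsetof(FooAppended, a) == offsetof(FooV1, a), "a stays put when c is appended");
static_assert(offsetof(FooAppended, b) == offsetof(FooV1, b), "b stays put when c is appended");
static_assert(offsetof(FooInserted, b) > offsetof(FooV1, b), "b moves when c is inserted before it");
static_assert(sizeof(FooAppended) > sizeof(FooV1), "the struct grows either way");
```

Code that reaches members by name and is recompiled does not care about the offsets. Code that was compiled against the old layout, or that computes offsets by hand, does.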

Any ideas on where I could've screwed up?
Nothing. Everything. It all depends on how, where and why this is used.
Assuming this structure you talk about is a C-style POD and the code is but the simplest, you'll get away with it. But, the moment you are trying something more ambitious, you are dealing with alignment issues (depending on how and where you create objects) and padding at least. If this is C++ and your POD contains custom operators/ctors etc -- you're getting into a lot of trouble. Cross-platform issues may arise, if you rely on the endianness ever etc.

If the code had a robust set of unit tests, it would probably be much easier to track down the problem (you asked for design advice ;) )
I assume you don't need to use the new 'c' variable everywhere in this giant codebase, you're just adding it so you can use it in some code you're adding or modifying? Instead of adding c to foo, you could make a new struct, bar, which contains a foo object and c. Then use bar where it's needed.
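A minimal sketch of that wrapper idea, assuming the new field is only needed by new code. The helper names sum_ab and sum_abc are made up for illustration.

```cpp
// The legacy struct stays exactly as it was.
struct foo {
    int a;
    int b;
};

// New code uses bar, which carries a foo plus the extra field.
struct bar {
    foo base;
    int c;
};

// Stand-in for an existing function that only knows about foo.
constexpr int sum_ab(const foo& f) { return f.a + f.b; }

// New function: reuses the legacy code through the embedded foo.
constexpr int sum_abc(const bar& x) { return sum_ab(x.base) + x.c; }
```

Because foo keeps its size and layout, nothing that already depends on it (allocation, serialization, memcpy lengths) has to be revisited.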
As for the actual bug, it could be anything with so little information to go on, but if I had to guess, I'd say someone used a magic number instead of sizeof() somewhere.

Look for memcpy, memset, memcmp. These functions are not member-wise. If they were used using the previous structure length, you may have problems.
Also search the files for every instance of the struct. There may be functions or methods that do not use the new important field. As others have said, if you find the structure in a #define or typedef, you'll have to search those too.
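To illustrate the memcpy concern, here is a hedged sketch. The constant kLegacyFooBytes and both copy functions are hypothetical stand-ins for what such legacy code might look like.

```cpp
#include <cstddef>  // std::size_t
#include <cstring>  // std::memcpy

struct foo {
    int a;
    int b;
    int c;  // the newly added member
};

// A length that was hard-coded back when foo only held a and b.
constexpr std::size_t kLegacyFooBytes = 2 * sizeof(int);

// The stale constant no longer covers the whole struct.
static_assert(kLegacyFooBytes < sizeof(foo), "legacy length misses the new member");

// Fragile: copies only the leading bytes of src, so on typical layouts
// dst.c keeps whatever value it had before the call.
inline void copy_with_magic_number(foo& dst, const foo& src) {
    std::memcpy(&dst, &src, kLegacyFooBytes);
}

// Robust: the length follows the struct definition automatically.
inline void copy_with_sizeof(foo& dst, const foo& src) {
    std::memcpy(&dst, &src, sizeof(foo));
}
```

Searching for literal sizes next to memcpy, memset, and memcmp calls is usually quicker than reading every use of the struct.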

Since you tagged your question C++:
For the future, Pimpl/d-Pointer is a strategy that allows you much greater freedom in extending or re-designing your classes without breaking compatibility.
For example, if you had originally written
// foo.h
class Foo {
public:
Foo();
Foo(const Foo &);
~Foo();
int a() const;
void a(int);
int b() const;
void b(int);
private:
class FooPrivate *const d;
};
// foo.c
class FooPrivate {
public:
FooPrivate() : a(0), b(0) {}
FooPrivate(const FooPrivate &o) : a(o.a), b(o.b) {}
int a;
int b;
};
Foo::Foo() : d(new FooPrivate()) {}
Foo::Foo(const Foo &o) : d(new FooPrivate(*o.d)) {}
Foo::~Foo() { delete d; }
int Foo::a() const { return d->a; }
void Foo::a(int a) { d->a = a; }
// ...
you can easily extend this to
// foo.h
class Foo {
public:
// ...
int a() const;
void a(int);
int b() const;
void b(int);
int c() const;
void c(int);
// ...
};
// foo.c
class FooPrivate {
// ...
int a;
int b;
int c;
};
// ...
without breaking any existing (compiled!) code using Foo.

If the code is used to transfer data across the network, you could be breaking things.

If adding a structure member anywhere other than as the first member breaks anything, then the code has undefined behaviour and it's wrong. So at least you have someone else (or your earlier self) to blame for the breakage. But yes, undefined behaviour includes "happens to do what we'd like it to do", so as the other guys say, watch out for memory allocation, serialization (network and file IO).
As an aside, I always cringe when I see typedef FOO ... struct FOO, as if one is trying to make C code look like C++. I realize I'm in a minority here :)

It's always safe to add new elements at the end of a C struct, even if that struct is passed to different processes. The code which has been recompiled will see the new struct member, and the code which hasn't been will only be aware of the old struct size and will just read the old members it knows about.
The caveat here is that new member has to be added at the end of the structure and not in the middle.

Related

How to initialize all fields of a big class with two different fields in standard C++

I have a very big class with a bunch of members, and I want to initialize them with a given specific value. The code below is the most naive implementation, but I don't like it since it's inelegant and hard to maintain because I have to list all the members in the constructor.
struct I_Dont_Like_This_Approach {
int foo;
long bar;
unsigned baz;
int a;
int b;
int c;
int d;
SomeStruct and_so_on;
/*...*/
public:
explicit I_Dont_Like_This_Approach(int i) : foo(i), bar(i), baz(i), a(i), b(i), c(i), d(i), and_so_on(i) /*...*/ {}
};
I thought of an alternative implementation using templates.
template <int N>
struct MyBigClass {
int foo{N};
long bar{N};
unsigned baz{N};
int a{N};
int b{N};
int c{N};
int d{N};
SomeStruct and_so_on{N};
/*...*/
};
but I'm not sure if the code below is safe.
MyBigClass<1> all_one;
MyBigClass<2> all_two;
/* Is the following reinterpret_cast safe? */
all_one = reinterpret_cast<decltype(all_one) &>(all_two);
Does the C++ specification have any guarantees about the data layout compatibility of such templated structs? Or is there a more reasonable implementation? (in standard C++, and don't use macros)

I would argue that the first one is much more maintainable, with the right warnings enabled (and a modern compiler), you will see if your initializer list gets out of sync with the class fields at compile time.
As to your alternative.. you're using templates as compiler arguments, which is not what they're meant to be. That brings a whole slew of issues:
instantiated templates get copied in memory, making your executable larger. Though in this case, I'm hoping your compiler is smart enough to see that the field structure is the same and treat it as one type.
your code now works only with constant literal integers, no more run-time variables.
there is indeed no guarantee that the memory structure of those two classes is the same. You can disable optimizations in most compilers (like pack, alignment, etc), but that comes at the cost of disabling optimizations, which isn't actually necessary except to support your specific code.
And related to the last one, if you ever need to consider whether this is ever going to break, you're heading down a very dark road. I mean any sane person can tell you it will "probably work", but the fact that you have no guarantees in the language that pretty much popularized memory corruption and buffer overflows should terrify you. Write constructors.

C++ : Access a sub-object's methods inside an object

I am starting to code bigger objects, having other objects inside them.
Sometimes, I need to be able to call methods of a sub-object from outside the class of the object containing it, from the main() function for example.
So far I was using getters and setters as I learned.
This would give something like the following code:
class SubObject {
public:
bool SetMode(int mode);
int GetMode();
private:
int m_mode = 0;
};
class Object {
public:
bool SetSubMode(int mode);
int GetSubMode();
private:
SubObject subObject;
};
bool Object::SetSubMode(int mode) { return subObject.SetMode(mode); }
int Object::GetSubMode() { return subObject.GetMode(); }
bool SubObject::SetMode(int mode) { m_mode = mode; return true; }
int SubObject::GetMode() { return m_mode; }
This feels very sub-optimal, forces me to write (ugly) code for every method that needs to be accessible from outside. I would like to be able to do something as simple as Object->SubObject->Method(param);
I thought of a simple solution: putting the sub-object as public in my object.
This way I should be able to simply access its methods from outside.
The problem is that when I learned object oriented programming, I was told that putting anything in public besides methods was blasphemy and I do not want to start taking bad coding habits.
Another solution I came across during my research before posting here is to add a public pointer to the sub-object perhaps?
How can I access a sub-object's methods in a neat way?
Is it allowed / a good practice to put an object inside a class as public to access its methods? How to do without that otherwise?
Thank you very much for your help on this.

The problem with both a pointer and public member object is you've just removed the information hiding. Your code is now more brittle because it all "knows" that you've implemented object Car with 4 object Wheel members. Instead of calling a Car function that hides the details like this:
Car.SetRPM(200); // hiding
You want to directly start spinning the Wheels like this:
Car.wheel_1.SetRPM(200); // not hiding! and brittle!
Car.wheel_2.SetRPM(200);
And what if you change the internals of the class? The above might now be broken and need to be changed to:
Car.wheel[0].SetRPM(200); // not hiding!
Car.wheel[1].SetRPM(200);
Also, for your Car you can say SetRPM() and the class figures out whether it is front wheel drive, rear wheel drive, or all wheel drive. If you talk to the wheel members directly that implementation detail is no longer hidden.
Sometimes you do need direct access to a class's members, but one goal in creating the class was to encapsulate and hide implementation details from the caller.
Note that you can have Set and Get operations that update more than one bit of member data in the class, but ideally those operations make sense for the Car itself and not specific member objects.

I was told that putting anything in public besides methods was blasphemy
Blanket statements like this are dangerous. There are pros and cons to each style that you must take into consideration, but an outright ban on public members is a bad idea IMO.
The main problem with having public members is that it exposes implementation details that might be better hidden. For example, let's say you are writing some library:
struct A {
struct B {
void foo() {...}
};
B b;
};
A a;
a.b.foo();
Now a few years down you decide that you want to change the behavior of A depending on the context; maybe you want to make it run differently in a test environment, maybe you want to load from a different data source, etc.. Heck, maybe you just decide the name of the member b is not descriptive enough. But because b is public, you can't change the behavior of A without breaking client code.
struct A {
struct B {
void foo() {...}
};
struct C {
void foo() {...}
};
B b;
C c;
};
A a;
a.c.foo(); // Uh oh, everywhere that uses b needs to change!
Now if you were to let A wrap the implementation:
class A {
public:
void foo() {
if (TESTING) {
b.foo();
} else {
c.foo();
}
}
private:
struct B {
void foo() {...}
};
struct C {
void foo() {...}
};
B b;
C c;
};
A a;
a.foo(); // I don't care how foo is implemented, it just works
(This is not a perfect example, but you get the idea.)
Of course, the disadvantage here is that it requires a lot of extra boilerplate, like you have already noticed. So basically, the question is "do you expect the implementation details to change in the future, and if so, will it cost more to add boilerplate now, or to refactor every call later?" And if you are writing a library used by external users, then "refactor every call" turns into "break all client code and force them to refactor", which will make a lot of people very upset.
Of course instead of writing forwarding functions for each function in SubObject, you could just add a getter for subObject:
SubObject& getSubObject() { return subObject; }
// ...
object.getSubObject().SetMode(0);
Which suffers from some of the same problems as above, although it is a bit easier to work around because the SubObject interface is not necessarily tied to the implementation.
All that said, I think there are certainly times where public members are the correct choice. For example, simple structs whose primary purpose is to act as the input for another function, or who just get a bundle of data from point A to point B. Sometimes all that boilerplate is really overkill.

C++: Are Structs really the same as Classes? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What are the differences between struct and class in C++
This question has been asked and answered a lot but every once in a while I come across something confusing.
To sum up, the difference between C++ structs and classes is the famous default public vs. private access. Apart from that, a C++ compiler treats a struct the same way it would treat a class. Structs can have constructors, copy constructors, virtual functions, etc. And the memory layout of a struct is the same as that of a class. And the reason C++ has structs is for backward compatibility with C.
Now since people get confused about which one to use, struct or class, the rule of thumb is: if you have just plain old data, use a struct. Otherwise use a class. And I have read that structs are good for serialization, but I don't know where this comes from.
Then the other day I came across this article: http://www.codeproject.com/Articles/468882/Introduction-to-a-Cplusplus-low-level-object-model
It says that if we have (directly quoting):
struct SomeStruct
{
int field1;
char field2;
double field3;
bool field4;
};
then this:
void SomeFunction()
{
SomeStruct someStructVariable;
// usage of someStructVariable
...
}
and this:
void SomeFunction()
{
int field1;
char field2;
double field3;
bool field4;
// usage of 4 variables
...
}
are the same.
It says the machine code generated is the same if we have a struct or just write down the variables inside the function. Now of course this only applies if your struct is a POD.
This is where I get confused. In Effective C++ Scott Meyers says that there is no such thing as an empty class.
If we have:
class EmptyClass { };
It is actually laid out by the compiler for example as:
class EmptyClass
{
EmptyClass() {}
~EmptyClass() {}
...
};
So you would not have an empty class.
Now if we change the above struct to a class:
class SomeClass
{
int field1;
char field2;
double field3;
bool field4;
};
does it mean that:
void SomeFunction()
{
SomeClass someClassVariable;
// usage of someClassVariable
...
}
and this:
void SomeFunction()
{
int field1;
char field2;
double field3;
bool field4;
// usage of 4 variables
...
}
are the same in terms of machine instructions? That there is no call to the SomeClass constructor? Or that the memory allocated is the same whether we instantiate a class or define the variables individually? And what about padding? Structs and classes do padding. Would padding be the same in these cases?
I'd really appreciate if somebody can shed some light on to this.

I believe the author of that article is mistaken. Although there is probably no difference between the struct and the non-member variable layout version of the two functions, I don't think this is guaranteed. The only things I can think of that are guaranteed here is that since it's a POD, the address of the struct and the first member are the same...and each member follows in memory after that at some point.
In neither case, since it's a POD (and classes can be too, don't make THAT mistake) will the data be initialized.
I would recommend not making such an assumption anyway. If you wrote code that leveraged it, and I can't imagine why you'd want to, most other developers would find it baffling anyway. Only break out the legal books if you HAVE to. Otherwise prefer to code in ways that people are used to. The only important part of all this that you really should keep in mind is that POD objects are not initialized unless you do so explicitly.
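A short sketch of what "initialize explicitly" can look like for the struct from the question. The helper make_filled is hypothetical.

```cpp
struct SomeStruct {
    int field1;
    char field2;
    double field3;
    bool field4;
};

// Empty braces initialize every member of this aggregate to zero.
constexpr SomeStruct zeroed{};
static_assert(zeroed.field1 == 0 && zeroed.field2 == '\0', "integral members are zero");
static_assert(zeroed.field3 == 0.0 && !zeroed.field4, "double and bool members are zero");

// Without braces the members of a local object start out indeterminate,
// so every member is assigned here before the object is read.
inline SomeStruct make_filled() {
    SomeStruct s;  // no initialization happens on this line
    s.field1 = 1;
    s.field2 = 'x';
    s.field3 = 2.5;
    s.field4 = true;
    return s;
}
```

Reading a member of s before assigning it would be undefined behavior, which is why the example never does so.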

The only difference is that the members of structs are public by default, while the members of classes are private by default (when I say by default, I mean "unless specified otherwise"). Check out this code:
#include <iostream>
using namespace std;
struct A {
int x;
int y;
};
class A obj1;
int main() {
obj1.x = 0;
obj1.y = 1;
cout << obj1.x << " " << obj1.y << endl;
return 0;
}
The code compiles and runs just fine.

There is no difference between structs and classes besides the default access level (note that the default access for base classes also differs). Books and my own 20+ years of experience say so.
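The base-class default mentioned in passing can be checked at compile time. A sketch, with made-up type names (Base, StructDerived, ClassDerived):

```cpp
#include <type_traits>

struct Base {
    int x;
};

// With the struct keyword, the base is public unless stated otherwise.
struct StructDerived : Base {};

// With the class keyword, the base is private unless stated otherwise.
class ClassDerived : Base {};

// Outside code may convert a pointer to a publicly derived type...
static_assert(std::is_convertible<StructDerived*, Base*>::value,
              "struct derives publicly by default");

// ...but not to a privately inherited base.
static_assert(!std::is_convertible<ClassDerived*, Base*>::value,
              "class derives privately by default");
```

Writing `class ClassDerived : public Base {};` removes the difference again.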
Regarding the default empty ctor/dtor: the standard does not ask for this. Nevertheless, some compilers may generate this empty ctor/dtor pair. Every reasonable optimizer would immediately throw them away. If a function that does nothing is called somewhere, how can you detect it? How can it affect anything besides consuming CPU cycles?
MSVC does not generate useless functions. It is reasonable to think that every good compiler will do the same.
Regarding the examples
struct SomeStruct
{
int field1;
char field2;
double field3;
bool field4;
};
void SomeFunction()
{
int field1;
char field2;
double field3;
bool field4;
...
}
The padding rules, order in memory, etc. may be, and most likely will be, completely different. The optimizer may easily throw away an unused local variable. It is much less likely (if possible at all) that the optimizer will remove a data field from the struct. For this to happen the struct would have to be defined in a cpp file, certain flags would have to be set, etc.
I am not sure you will find any docs about padding of local vars on the stack. AFAIK, this layout is 100% up to the compiler. In contrast, the layout of structs/classes is described, and there are #pragma directives and command-line options that control it.
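Struct layout, unlike local-variable layout, can be inspected directly. A sketch for the SomeStruct above; the constant kSumOfMembers is a name introduced here for illustration:

```cpp
#include <cstddef>  // offsetof, std::size_t

struct SomeStruct {
    int field1;
    char field2;
    double field3;
    bool field4;
};

// Bytes the four members would need if they were packed with no padding.
constexpr std::size_t kSumOfMembers =
    sizeof(int) + sizeof(char) + sizeof(double) + sizeof(bool);

// The struct is at least that large; any difference is padding.
static_assert(sizeof(SomeStruct) >= kSumOfMembers, "padding can only add bytes");

// Members appear in declaration order, starting at offset 0.
static_assert(offsetof(SomeStruct, field1) == 0, "first member sits at the start");
static_assert(offsetof(SomeStruct, field2) < offsetof(SomeStruct, field3), "field2 before field3");
static_assert(offsetof(SomeStruct, field3) < offsetof(SomeStruct, field4), "field3 before field4");

// The total size is a multiple of the alignment, so arrays of SomeStruct work.
static_assert(sizeof(SomeStruct) % alignof(SomeStruct) == 0, "size is a multiple of alignment");
```

On a common 64-bit ABI I would expect sizeof(SomeStruct) to be 24 against a member sum of 14, but that is an assumption about the platform, not something the language fixes.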

are the same in terms of machine instructions?
There is no reason for them not to be. But there is no guarantee from the standard.
That there is no call to the SomeClass constructor?
Yes, there is a call to the constructor. But the constructor does no work (all the members are POD, and declaring SomeClass someClassVariable; causes default initialization, which does nothing for POD members). Since there is no work to be done, there is no need to emit any instructions.
Or that the memory allocated is the same as instantiating a class or defining the variables individually?
The class may contain padding that declaring the variables individually does not.
Also I am sure that the compiler will have an easier time optimizing away individual variables.
And what about padding?
Yes there is a possibility of padding in the structure (struct/class).
structs and classes do padding. Would padding be the same in these cases?
Yes. Just make sure you compare apples to apples (ie)
struct SomeStruct
{
int field1;
char field2;
double field3;
bool field4;
};
class SomeClass
{
public: /// Make sure you add this line. Now they are identical.
int field1;
char field2;
double field3;
bool field4;
};

Templating (or somehow autoing) the return value of methods

EDIT: To be clear—right off the bat—this is a question about the linguistic abilities of a modern C++ compiler. Not a question about a specific goal. It's hard to describe such an abstract concept without clarifying this first and I've realized that some of the confusion revolves around what is commonly done rather than what can possibly be done. This is a very abstract question. Nothing here will compile and this is on purpose. Likewise, I'm not asking how to make this specific case work, but I'm asking if there's a way to get C++ to recognize what I would like to do (via templating or some kind of auto->decltype trick most likely if even possible).
I'm not exactly new to C++, but certainly not an expert. This is a fundamental problem that I've been struggling with since I've rediscovered the power of the language. The end goal here is to elegantly (and with as little code as possible) forward proper polymorphic return values based on calling context. For example...
class A {
public:
A& foo() {
// do something mutant fooish
return *this;
};
};
class B: public A {
public:
B& bar() {
// do something mutant barish
return *this;
};
};
int main(int argc, char** argv) {
B yarp;
yarp.foo().bar();
};
Compile error. Makes sense, C++ is designed to assume that you know nothing about what you're doing (which makes it highly optimizable but sometimes a pain... a high-level-mid-level OOP language).
Obviously C++ compilers have gotten to the point where they're not only aware of what you are asking for (the A().foo() works and B().foo() works scenario), but also of the context you're asking for it in (hence auto yarp = B() in C++11, the compiler knows that yarp is an instance of B). Is there a way to leverage this elegantly without having to reproduce a bunch of "using" statements or wrapped methods (which strangely don't get optimized out according to disassembly of gcc binaries)?
So is there a trick here? Something I simply haven't learned online. An auto -> decltype trick or a templating trick? Example:
class A {
public:
template <typename R>
R& foo() {
// do something fooish
return (R&)*this;
};
};
class B: public A {
public:
using A::foo<A>; // << even this would be better than nothing (but no where near optimum)
B& bar() {
// do something barish
return *this;
};
};
Something even simpler? If you expand this concept to operators of a proxy template class meant for reference counting and gc deallocation, it becomes clear how problematic this becomes. Thanks in advance for any help (oh, and first post on stackoverflow, so if I got any formatting wrong or you have suggestions for a better structured post, apologies around and please point them out).

The obvious solution would be to just separate it out into two lines:
yarp.foo();
yarp.bar();
or, alternatively, use a static_cast to get back a B&, so
static_cast<B&>(yarp.foo()).bar();
Agreed, that's a little bit more verbose, but chaining multiple member-function calls across a hierarchy in one line like this is pretty unusual syntax for C++. It just doesn't come up a whole lot, so the language doesn't support that idiom terribly well. I have never run into this issue myself.
If you want to design some chainable functionality, there are other, better idioms you can use. One example is Boost's Range Adaptors that overload operator| to achieve chaining.
EDIT: Another option is to redefine foo() in B so that it returns B&:
class B: public A {
public:
B& foo() { A::foo(); return *this; }
B& bar() {
// do something mutant barish
return *this;
};
};

I don't think there is any automatic type detection for this, since the compiler doesn't even know which classes will inherit from A.
And in your second attempt, C++ forbids a using-declaration that names a template specialization, so that won't compile.
Another trick you could try is to make A a template:
template <typename FinalType>
class A {
public:
FinalType& foo() {
// do something fooish
return static_cast<FinalType&>(*this);
};
};
class B: public A<B> {
public:
B& bar() {
// do something barish
return *this;
};
};
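For completeness, a self-contained sketch of how the templated base above (the curiously recurring template pattern) lets the original call chain compile. The calls counter and the chain_demo function are additions for illustration only.

```cpp
#include <type_traits>
#include <utility>  // std::declval

template <typename FinalType>
class A {
public:
    FinalType& foo() {
        ++calls;
        // Safe as long as FinalType really derives from A<FinalType>.
        return static_cast<FinalType&>(*this);
    }
    int calls = 0;  // counts chained calls, purely for demonstration
};

class B : public A<B> {
public:
    B& bar() {
        ++calls;
        return *this;
    }
};

// foo() called on a B now yields B&, so bar() can follow it directly.
static_assert(std::is_same<decltype(std::declval<B&>().foo()), B&>::value,
              "foo() returns the most derived type");

inline int chain_demo() {
    B yarp;
    yarp.foo().bar();   // the chain from the question
    return yarp.calls;  // both calls were recorded
}
```

The cost is that A is no longer a single type: A<B> and A<C> are unrelated classes, so code that wants to treat all of them uniformly needs another common base or must itself be a template.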

Erm, you declare an instance of class B, which has no method foo, so no wonder there is a compile error. Did you mean
yarp.bar().foo();

Configuration structs vs setters

I recently came across classes that use a configuration object instead of the usual setter methods for configuration. A small example:
struct AConfiguration { int a, b; };
class A {
int a, b;
public:
A(const AConfiguration& conf) { a = conf.a; b = conf.b; }
};
The upsides:
You can extend your object and easily guarantee reasonable default values for new values without your users ever needing to know about it.
You can check a configuration for consistency (e.g. your class only allows some combinations of values)
You save a lot of code by omitting the setters.
You get a default constructor for free: give your Configuration struct a default constructor and use A(const AConfiguration& conf = AConfiguration()).
The downside(s):
You need to know the configuration at construction time and can't change it later on.
Are there more downsides to this that I'm missing? If there aren't: Why isn't this used more frequently?
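The upsides listed above can be sketched roughly as follows. The default values, the extra field c, and the consistency rule in is_consistent are all invented for illustration and are not part of the classes I came across.

```cpp
#include <stdexcept>

struct AConfiguration {
    int a = 0;
    int b = 10;
    int c = 5;  // field added later; callers that never mention it get the default
};

// Example consistency rule (an assumption): a must not exceed b.
constexpr bool is_consistent(const AConfiguration& conf) {
    return conf.a <= conf.b;
}

class A {
public:
    // The defaulted argument doubles as a default constructor.
    explicit A(const AConfiguration& conf = AConfiguration()) : conf_(conf) {
        if (!is_consistent(conf_)) {
            throw std::invalid_argument("AConfiguration: a must not exceed b");
        }
    }
    int a() const { return conf_.a; }
    int b() const { return conf_.b; }
    int c() const { return conf_.c; }

private:
    const AConfiguration conf_;  // fixed after construction, hence no setters
};
```

A caller would fill in only the fields it cares about, for example `AConfiguration conf; conf.b = 20; A obj(conf);`, and an inconsistent configuration is rejected in one place instead of in every setter.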

Whether you pass the data individually or per struct is a question of style and needs to be decided on a case-by-case basis.
The important question is this: Is the object ready and usable after construction, and does the compiler enforce that you pass all necessary data to the constructor? Or do you have to remember to call a bunch of setters after construction, whose number might increase at any time without the compiler giving you any hint that you need to adapt your code? So whether this is
A(const AConfiguration& conf) : a(conf.a), b(conf.b) {}
or
A(int a_, int b_) : a(a_), b(b_) {}
doesn't matter all that much. (There's a number of parameters where everyone would prefer the former, but which number this is - and whether such a class is well designed - is debatable.) However, whether I can use the object like this
A a1(AConfiguration(42,42));
A a2 = AConfiguration(4711,4711);
A a3(7,7);
or have to do this
A urgh;
urgh.setA(13);
urgh.setB(13);
before I can use the object, does make a huge difference. Especially so, when someone comes along and adds another data field to A.

Using this method makes binary compatibility easier.
When the library version changes and the configuration struct contains a version field, the constructor can distinguish whether an "old" or a "new" configuration was passed and avoid an access violation/segfault from reading non-existent fields.
Moreover, the mangled name of constructor is retained, which would have changed if it changed its signature. This also lets us retain binary compatibility.
Example:
//version 1
struct AConfiguration { int version; int a; AConfiguration(): version(1) {} };
//version 2
struct AConfiguration { int version; int a, b; AConfiguration(): version(2) {} };
class A {
int a, b;
public:
A(const AConfiguration& conf) {
switch (conf.version){
case 1: a = conf.a; b = 0; // No access violation for old callers!
break;
case 2: a = conf.a; b = conf.b; // New callers do have b member
break;
}
}
};

The main upside is that the A object can be immutable. I don't know if having the AConfiguration struct actually gives any benefit over just passing a and b as parameters to the constructor.

Using this method makes binary compatibility harder.
If the struct is changed (one new optional field is added), all code using the class might need a recompile. If one new non-virtual setter function is added, no such recompilation is necessary.

I would support the decreased binary compatibility here.
The problem I see comes from the direct access to a struct fields.
struct AConfig1 { int a; int b; };
struct AConfig2 { int a; std::map<int,int> b; };
Since I modified the representation of b, I am screwed, whereas with:
class AConfig1 { public: int getA() const; int getB() const; /* */ };
class AConfig2 { public: int getA() const; int getB(int key = 0) const; /* */ };
The physical layout of the object might have changed, but my getters have not, and neither have the offsets to the functions.
Of course, for binary compatibility, one should check out the PIMPL idiom.
namespace details { class AConfigurationImpl; }
class AConfiguration {
public:
int getA() const;
int getB() const;
private:
details::AConfigurationImpl* m_impl;
};
While you do end up writing more code, you have the guarantee here of backward compatibility of your object as long as you add supplementary methods AFTER the existing ones.
The representation of an instance in memory does not depend on the number of methods, it only depends on:
the presence or absence of virtual methods
the base classes
the attributes
Which is what is VISIBLE (not what is accessible).
And here we guarantee that we won't have any change in the attributes. The definition of AConfigurationImpl might change without any problem and the implementation of the methods might change too.
The extra code means: constructor, copy constructor, assignment operator and destructor, which is a fair amount, and of course the getters and setters. Also note that these methods can no longer be inlined, since their implementations are defined in a source file.
Whether or not it suits you, you're on your own to decide.
