Finding Implicit Communication in Classes - c++

I am currently refactoring a very useful but poorly designed class in C++, and I'm running into a problem with the design: rather passing data around using arguments to methods, the data is passed around by setting private state variables in the class. This makes it very difficult for me to diagram out how data moves through functions. It's my weekend task to try and remove this style of passing data around as much as possible, as makes the program very impossible to understand from just the method signatures, as the signatures only tell a part of the story. I've decided
My current approach to test if a method communicates using private class-level variables is the following:
Edit the method and make it a function rather than a method, which removes its access to the state variables in the class.
Edit all of the calls to the method so that they call the function rather than the method.
Compile, see if anything breaks. Make a list of accessors to add to the original class.
Run the unit tests to see if I've broken anything in a very subtle way.
Is there a better way of doing this, perhaps one that can be easily automated? Is this refactoring a well-known technique that I can cite if I show it to other people?
The only mention of this problem that I've found so far is this quote from Coders at Work via the Object-oriented programming Wikipedia entry:
"The problem with object-oriented languages is they've got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle." - Joe Armstrong
Edit in response to a good question from Oli Charlesworth:
I understand that the point of OOP is to sometimes communicate through state variables of the class. The difficulty with my current case is that there are currently 78 different data members in the class, many of which are key-value pairs of strings to other data types, and there are undocumented implicit dependencies on the order in which they need to be initialized. It's possible that given a sufficiently smart programmer working with this class would be easy, but it's currently very difficult for me. I think that several of these data types could be abstracted into their own classes, but before I can do that I need to understand more clearly how the data members interact with each other.

Given the clarification in the question my "are you sure it's not just that you don't like the other programmer's style" comment dies a death ;)
Personally I'd just refactor normally. That is, with 78 data members and lots of bits that are related but not in a class of their own I'd start by grouping the related data and extracting the functionality that works on it. There's no need, IMHO, to go through a stage where you explicitly pass the data into the functions in the existing class. Just pick a group of related data items, come up with a decent name, extract them and work out where they were used and how you need to move functionality into the new class.
Ideally, I'd start writing unit tests for the main class and the new broken out classes as I went along...

Instead of making all of the method's callers call the function, a smaller intermediate change would be to leave the method in place for all callers, and have it simply delegate by calling the function. Later you can inline the method call so all callers are directly calling the function.
Also, from your description it sounds like you are approaching this with manual testing. You will have better success (easier refactoring with reduced risk of error) with comprehensive unit tests in place, although of course the code you describe would be hard to unit test. Nevertheless, work toward more test automation.

Related

C++ Singleton Design pattern alternatives

I hate to beat a dead horse, that said, I've gone over so many conflicting articles over the past few days in regards to the use of the singleton pattern.
This question isn't be about which is the better choice in general, rather what makes sense for my use case.
The pet project I'm working on is a game. Some of the code that I'm currently working on, I'm leaning towards using a singleton pattern.
The use cases are as follows:
a globally accessible logger.
an OpenGL rendering manager.
file system access.
network access.
etc.
Now for clarification, more than a couple of the above require shared state between accesses. For instance, the logger is wrapping a logging library and requires a pointer to the output log, the network requires an established open connection, etc.
Now from what I can tell it's more suggested that singletons be avoided, so lets look at how we may do that. A lot of the articles simply say to create the instance at the top and pass it down as a parameter to anywhere that is needed. While I agree that this is technically doable, my question then becomes, how does one manage the potentially massive number of parameters? Well what comes to mind is wrapping the different instances in a sort of "context" object and passing that, then doing something like context->log("Hello World"). Now sure that isn't to bad, but what if you have a sort of framework like so:
game_loop(ctx)
->update_entities(ctx)
->on_preupdate(ctx)
->run_something(ctx)
->only use ctx->log() in some freak edge case in this function.
->on_update(ctx)
->whatever(ctx)
->ctx->networksend(stuff)
->update_physics(ctx)
->ctx->networksend(stuff)
//maybe ctx never uses log here.
You get the point... in some areas, some aspects of the "ctx" aren't ever used but you're still stuck passing it literally everywhere in case you may want to debug something down the line using logger, or maybe later in development, you actually want networking or whatever in that section of code.
I feel like the above example would much rather be suited to a globally accessible singleton, but I must admit, I'm coming from a C#/Java/JS background which may color my view. I want to adopt the mindset/best practices of a C++ programmer, yet like I said, I can't seem to find a straight answer. I also noticed that the articles that suggest just passing the "singleton" as a parameter only give very simplistic use cases that anyone would agree a parameter would be the better way to go.
In this game example, you probably wan't to access logging everywhere even if you don't plan on using it immediately. File system stuff may be all over but until you build out the project, it's really hard to say when/where it will be most useful.
So do I:
Stick with using singletons for these use cases regardless of how "evil/bad" people say it is.
Wrap everything in a context object, and pass it literally everywhere. (seems kinda gross IMO, but if that's the "more accepted/better" way of doing it, so be it.)
Something completely else. (Really lost as to what that might be.)
If option 1, from a performance standpoint, should I switch to using namespace functions, and hiding the "private" variables / functions in anonymous namespaces like most people do in C? (I'm guessing there will be a small boost in performance, but then I'll be stuck having to call an "init" and "destroy" method on a few of these rather than being able to just allow the constructor/destructor to do that for me, still might be worth while?)
Now I realize this may be a bit opinion based, but I'm hoping I can still get a relatively good answer when a more complicated/nested code base is in question.
Edit:
After much more deliberation I've decided to use the "Service Locator" pattern instead. To prevent a global/singleton of the Service Locator I'm making anything that may use the services inherit from a abstract base class that requires the Service Locator be passed when constructed.
I haven't implemented everything yet so I'm still unsure if I'll run into any problems with this approach, and would still love feedback on if this is a reasonable alternative to the singleton / global scope dilemma.
I had read that Service Locator is also somewhat of an anti-pattern, that said, many of the example I found implemented it with statics and/or as a singleton, perhaps using it as I've described removes the aspects that cause it to be an anti-pattern?
Whenever you think you want to use a Singleton, ask yourself the following question: Why is it that it must be ensured at all cost that there never exists more than one instance of this class at any point in time? Because the whole point of the Singleton pattern is to make sure that there can never be more than one instance of the Singleton. That's what the term "singleton" is all about: there only being one. That's why it's called the Singleton pattern. That's why the pattern calls for the constructor to be private. The point of the Singleton pattern is not and never was to give you a globally-accessible instance of something. The fact that there is a global access point to the sole instance is just a consequence of the Singleton pattern. It is not the objective the Singleton pattern is meant to achieve. If all you want is a globally accessible instance of something, then use a global variable. That's exactly what global variables are for…
The Singleton pattern is probably the one design pattern that's singularly more often misunderstood than not. Is it an intrinsic aspect of the very concept of a network connection that there can only ever be one network connection at a time, and the world would come to an end if that constraint was ever to be violated? If the answer is no, then there is no justification for a network connection to ever be modeled as a Singleton. But don't take my word for it, convince yourself by checking out page 127 of Design Patterns: Elements of Reusable Object-Oriented Software where the Singleton pattern was originally described…😉
Concerning your example: If you're ending up having to pass a massive number of parameters into some place then that first and foremost tells you one thing: there are too many responsibilities in that place. This fact is not changed by the use of Singletons. The use of Singletons simply obfuscates this fact because you're not forced to pass all stuff in through one door in the form of parameters but rather just access whatever you want directly all over the place. But you're still accessing these things. So the dependencies of your piece of code are the same. These dependencies are just not expressed explicitly anymore at some interface level but creep around in the mists. And you never know upfront what stuff a certain piece of code depends on until the moment your build breaks after trying to take away one thing that something else happened to depend upon. Note that this issue is not specific to the Singleton pattern. This is a concern with any kind of global entity in general…
So rather than ask the question of how to best pass a massive number of parameters, you should ask the question of why the hell does this one piece of code need access to that many things? For example, do you really need to explicitly pass the network connection to the game loop? Should the game loop not maybe just know the physics world object and that physics world object is at the moment of creation given some object that handles the network communication. And that object in turn is upon initialization told the network connection it is supposed to use? The log could just be a global variable (or is there really anything about the very idea of a log itself that prohibits there ever being more than one log?). Or maybe it would actually make sense for each thread to have its own log (could be a thread-local variable) so that you get a log from each thread in the order of the control flow that thread happened to take rather than some (at best) interleaved mess that would be the output from multiple threads for which you'd probably want to write some tool so that you'd at least have some hope of making sense of it at all…
Concerning performance, consider that, in a game, you'll typically have some parent objects that each manage collections of small child objects. Performance-critical stuff would generally be happening in places where something has to be done to all child objects in such a collection. The relative overhead of first getting to the parent object itself should generally be negligible…
PS: You might wanna have a look at the Entity Component System pattern…

Should I seperate model classes or have them as a single unit?

My game logic model consists of multiple connected classes. There are Board, Cell, Character, etc. Character can be placed (and moved) in Cell (1-1 rel).
There are two approaches:
Make each class of model implement interfaces so that they can be mocked and each class can be tested independently. It forces me to make implementation of each class to not rely on another. But in practice it's hard to avoid Board knowing about Cells too much and Characters knowing how Cell storing mechanism works. I have a Character.Cell and Cell.CurrentCharacter properties. In order for setters to work correctly (not go recursively) they should rely on each others implementation. It feels like the model logic should be considered as a single unit.
Make all public members to return interfaces but use exact classes inside (can involve some downcasting). The cons here are such that I should test the whole model as a single and can't use mocking to test different parts independently. Also there is no sense to use dependency injection inside model, only to get another full model implementation from controller.
So what to do?
UPDATE
You can propose other options.
Why are these the only 2 options?
If you intend to have different versions/types of the classes then interfaces/abstract base classes are a good option to enforce shared behaviour and generalize many operations. However the idea of building the classes independently without knowledge of each other is ridiculous.
It is always a good idea to separate class storage/behaviour to the class/layer it belongs. E.g. no business logic code in the data layer, etc. but the classes need to know about each other in order to function properly. If you make everything independent and based on interfaces you run the risk of over generalizing the application and reducing your efficiency.
Basically if you think you would need to ever downcast the incoming objects to more than one type it's a good idea to look at the design and see if you are gaining anything for the performance loss and nasty casting code you are about to write. If you will be required to handle every type of downcast object you have not gained anything and using polymorphism and a base class is a much better way to go.
Using interfaces does not eliminate your trouble in testing. You will still have to instantiate some version of the objects to test most of the functions on the cell/board anyway. Which for full regression testing will require you test each character's interaction with both.
Don't get me wrong, your character class should most likely have a base class or have an interface. All characters will (I'm sure) share many actions and can benefit from this design. E.g. Moving a character on the board is a fairly generic operation and can be made independent of the character except for a few pieces of information (such as how the character moves, if they are allowed to move, etc.) which should be part of said base class/interface.
When it is reasonable, design classes independently so that they can be tested on their own, but do not use testing as a reason to write bad code. Simple stubs or basic testing instances can be created to help with component testing and takes far less time and effort than fixing unnecessarily complex code.
Interfaces have a purpose, but if you will not be treating 2 classes the same... that is not it.
*Using MVC gives you a leg up on testing as well. If done correctly you should be able to swap out any of the layers to ease your testing of a single layer.

Faking a Method of the Object Under Test

Is there a reason why you shouldn't create a partial fake of an object or just fake one method on the object that you are testing of it for the sake of testing another method? This might be helpful to save you from making an entire new mock object, or when there is an external dependency in the method you are faking which you can't reasonably get rid of and would like to keep out of all the other unit tests?
The objects you want to do this for are trying to do too many things. In particular, if you have an external dependency, you would normally create an object to isolate that dependency. The Façade pattern is one example of this. If your objects weren't designed with testability in mind you may have to do some refactoring. Take a look at Michael Feathers' PDF on working with legacy code(PDF). He also has a book by the same title that goes into much more detail.
It is a very bad idea to mock/fake part of a class to test another.
Doing this, you are not testing what the real code does in the conditions under test leading to unreliable test results.
It also increases the maintenance burden of the faked part of the class. If this is in effect for the whole test program, the fake implementation also makes other tests on the faked method harder.
You need to ask yourself why you need to fake out the part under test.
If it is because the method is accessing a file or database, then you should define an interface and pass an instance of that interface to the class constructor or method. This allows you to test different scenarios in the same test application.
If it is because you are using singletons, you should rethink your design to make it more testable: removing singletons will remove implicit dependencies and maintenance nightmares.
If you are using static methods/free-standing functions to access data in a registry or settings file, you should really move that out of the function under test and pass the data as a parameter or provide a settings provider interface. This will make the code more flexible and robust.
If it is to break a dependency for the purpose of testing (e.g. faking out a vector method to test a method in a matrix class) then you should not be faking that -- you should treat the code under test as what is defined by the class under test by its public interface: methods; pre-conditions, post-conditions, invariants, documentation, parameters and exception specifications.
You can use knowledge of the implementation details to test special edge cases, but trigger those through the main API, not by faking an implementation detail.
For example, suppose you faked std::vector::at() but the implementation switched to use operator[] instead. Your test would break or silently pass.
If the method you want to fake is virtual (as in, not static and not final), then you can subclass your object in your test, override the method in the subclass, and exercise the subclass in the test. No mock-object libraries required.
(Ideally you should consider refactoring, this is not a great long-term solution. But it is a way to get legacy code under test so you can start the refactoring process more easily.)
The Extract and Override technique described in Chapter 3 of Roy Osherove's The Art of Unit Testing does seem to be a way to fake part of the class under test (pp. 71-77). Osherove does not address the concerns raised in some of the other answers to this question.
In addition, Michael Feathers discusses this in Working Effectively with Legacy Code. He terms the resulting class a testing subclass (227) and the technique Subclass and Override Method (401). Now, granted, Feathers is not giving an exposition of pristine techniques that are recommended on new code. But he still gives it serious treatment as a potentially helpful technique.
I also asked my former computer professor about this. He is well-read and currently works full-time in the software industry, where he has advanced rapidly. He said that this technique definitely has a good application, and that there are several dozen classes in the codebase at his company that are under test in this way. He said that, like any technique, it can be overused.
I originally wrote the question when I was new to unit testing and knew next to nothing about dependency injection. Now, after some experience with both, I would add that the need to use this testing technique could be a smell. It may be a sign that need to rework your approach to dependencies. If the method that needs to be faked is one that is inherited from a base class, it may mean that you need to take the adage "favor composition over inheritance" more seriously. You should inject your dependencies rather than inheriting them.
There are some really nice packages for facilitating this kind of stuff. For instance, from the Mockito docs:
//You can mock concrete classes, not only interfaces
LinkedList mockedList = mock(LinkedList.class);
//stubbing
when(mockedList.get(0)).thenReturn("first");
does some real magic that's hard to believe at first. When you call
String firstMember = mockedList.get(0);
you'll get back "first", because of what you said in the "when" statement.

How to update old C code? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I have been working on some 10 year old C code at my job this week, and after implementing a few changes, I went to the boss and asked if he needed anything else done. That's when he dropped the bomb. My next task was to go through the 7000 or so lines and understand more of the code, and to modularize the code somewhat. I asked him how he would like the source code modularized, and he said to start putting the old C code into C++ classes.
Being a good worker, I nodded my head yes, and went back to my desk, where I sit now, wondering how in the world to take this code, and "modularize" it. It's already in 20 source files, each with its own purpose and function. In addition, there are three "main" structs. each of these structures has 30 plus fields, many of them being other, smaller structs. It's a complete mess to try to understand, but almost every single function in the program is passed a pointer to one of the structs and uses the struct heavily.
Is there any clean way for me to shoehorn this into classes? I am resolved to do it if it can be done, I just have no idea how to begin.
First, you are fortunate to have a boss who recognizes that code refactoring can be a long-term cost-saving strategy.
I've done this many times, that is, converting old C code to C++. The benefits may surprise you. The final code may be half the original size when you're done, and much simpler to read. Plus, you will likely uncover tricky C bugs along the way. Here are the steps I would take in your case. Small steps are important because you can't jump from A to Z when refactoring a large body of code. You have to go through small, intermediate steps which may never be deployed, but which can be validated and tagged in whatever RCS you are using.
Create a regression/test suite. You will run the test suite each time you complete a batch of changes to the code. You should have this already, and it will be useful for more than just this refactoring task. Take the time to make it comprehensive. The exercise of creating the test suite will get you familiar with the code.
Branch the project in your revision control system of choice. Armed with a test suite and playground branch, you will be empowered to make large modifications to the code. You won't be afraid to break some eggs.
Make those struct fields private. This step requires very few code changes, but can have a big payoff. Proceed one field at a time. Try to make each field private (yes, or protected), then isolate the code which access that field. The simplest, most non-intrusive conversion would be to make that code a friend function. Consider also making that code a method. Converting the code to be a method is simple, but you will have to convert all of the call sites as well. One is not necessarily better than the other.
Narrow the parameters to each function. It's unlikely that any function requires access to all 30 fields of the struct passed as its argument. Instead of passing the entire struct, pass only the components needed. If a function does in fact seem to require access to many different fields of the struct, then this may be a good candidate to be converted to an instance method.
Const-ify as many variables, parameters, and methods as possible. A lot of old C code fails to use const liberally. Sweeping through from the bottom up (bottom of the call graph, that is), you will add stronger guarantees to the code, and you will be able to identify the mutators from the non-mutators.
Replace pointers with references where sensible. The purpose of this step has nothing to do with being more C++-like just for the sake of being more C++-like. The purpose is to identify parameters that are never NULL and which can never be re-assigned. Think of a reference as a compile-time assertion which says, this is an alias to a valid object and represents the same object throughout the current scope.
Replace char* with std::string. This step should be obvious. You might dramatically reduce the lines of code. Plus, it's fun to replace 10 lines of code with a single line. Sometimes you can eliminate entire functions whose purpose was to perform C string operations that are standard in C++.
Convert C arrays to std::vector or std::array. Again, this step should be obvious. This conversion is much simpler than the conversion from char to std::string because the interfaces of std::vector and std::array are designed to match the C array syntax. One of the benefits is that you can eliminate that extra length variable passed to every function alongside the array.
Convert malloc/free to new/delete. The main purpose of this step is to prepare for future refactoring. Merely changing C code from malloc to new doesn't directly gain you much. This conversion allows you to add constructors and destructors to those structs, and to use built-in C++ automatic memory tools.
Replace localize new/delete operations with the std::auto_ptr family. The purpose of this step is to make your code exception-safe.
Throw exceptions wherever return codes are handled by bubbling them up. If the C code handles errors by checking for special error codes then returning the error code to its caller, and so on, bubbling the error code up the call chain, then that C code is probably a candidate for using exceptions instead. This conversion is actually trivial. Simply throw the return code (C++ allows you to throw any type you want) at the lowest level. Insert a try{} catch(){} statement at the place in the code which handles the error. If no suitable place exists to handle the error, consider wrapping the body of main() in a try{} catch(){} statement and logging it.
Now step back and look how much you've improved the code, without converting anything to classes. (Yes, yes, technically, your structs are classes already.) But you haven't scratched the surface of OO, yet managed to greatly simplify and solidify the original C code.
Should you convert the code to use classes, with polymorphism and an inheritence graph? I say no. The C code probably does not have an overall design which lends itself to an OO model. Notice that the goal of each step above has nothing to do with injecting OO principles into your C code. The goal was to improve the existing code by enforcing as many compile-time constraints as possible, and by eliminating or simplifying the code.
One final step.
Consider adding benchmarks so you can show them to your boss when you're done. Not just performance benchmarks. Compare lines of code, memory usage, number of functions, etc.
Really, 7000 lines of code is not very much. For such a small amount of code a complete rewrite may be in order. But how is this code going to be called? Presumably the callers expect a C API? Or is this not a library?
Anyway, rewrite or not, before you start, make sure you have a suite of tests which you can run easily, with no human intervention, on the existing code. Then with every change you make, run the tests on the new code.
This shoehorning into C++ seems to be arbitrary, ask your boss why he needs that done, figure out if you can meet the same goal less painfully, see if you can prototype a subset in the new less painful way, then go and demo to your boss and recommend that you follow the less painful way.
First, tell your boss you're not continuing until you have:
http://www.amazon.com/Refactoring-Improving-Design-Existing-Code/dp/0201485672
and to a lesser extent:
http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052
Secondly, there is no way of modularising code by shoe-horning it into C++ class. This is a huge task and you need to communicate the complexity of refactoring highly proceedural code to your boss.
It boils down to making a small change (extract method, move method to class etc...) and then testing - there is no short cuts with this.
I do feel your pain though...
I guess that the thinking here is that increasing modularity will isolate pieces of code, such that future changes are facilitated. We have confidence in changing one piece because we know it cannot affect other pieces.
I see two nightmare scenarios:
You have nicely structured C code, it will easily transform to C++ classes. In which case it probably already is pretty darn modular, and you've probably done nothing useful.
It's a rats-nest of interconnected stuff. In which case it's going to be really tough to disentangle it. Increasing modularity would be good, but it's going to be a long hard slog.
However, maybe there's a happy medium. Could there be pieces of logic that important and conceptually isolated but which are currently brittle because of a lack of data-hiding etc. (Yes good C doesn't suffer from this, but we don't have that, otherwise we would leave well alone).
Pulling out a class to own that logic and its data, encpaulating that piece could be useful. Whether it's better to do it wih C or C++ is open to question. (The cynic in me says "I'm a C programmer, great C++ a chance to learn something new!")
So: I'd treat this as an elephant to be eaten. First decide if it should be eaten at all, bad elephent is just no fun, well structured C should be left alone. Second find a suitable first bite. And I'd echo Neil's comments: if you don't have a good automated test suite, you are doomed.
I think a better approach could be totally rewrite the code, but you should ask your boss for what purpose he wants you "to start putting the old C code into c++ classes".
You should ask for more details
Surely it can be done - the question is at what cost? It is a huge task, even for 7K LOC. Your boss must understand that it's gonna take a lot of time, while you can't work on shiny new features etc. If he doesn't fully understand this, and/or is not willing to support you, there is no point starting.
As #David already suggested, the Refactoring book is a must.
From your description it sounds like a large part of the code is already "class methods", where the function gets a pointer to a struct instance and works on that instance. So it could be fairly easily converted into C++ code. Granted, this won't make the code much easier to understand or better modularized, but if this is your boss' prime desire, it can be done.
Note also, that this part of the refactoring is a fairly simple, mechanical process, so it could be done fairly safely without unit tests (with hyperaware editing of course). But for anything more you need unit tests to make sure your changes don't break anything.
It's very unlikely that anything will be gained by this exercise. Good C code is already more modular than C++ typically can be - the use of pointers to structs allows compilation units to be independent in the same was as pImpl does in C++ - in C you don't have to expose the data inside a struct to expose its interface. So if you turn each C function
// Foo.h
typedef struct Foo_s Foo;
int foo_wizz (const Foo* foo, ... );
into a C++ class with
// Foo.hxx
class Foo {
// struct Foo members copied from Foo.c
int wizz (... ) const;
};
you will have reduced the modularity of the system compared with the C code - every client of Foo now needs rebuilding if any private implementation functions or member variables are added to the Foo type.
There are many things classes in C++ do give you, but modularity is not one of them.
Ask your boss what the business goals are being achieved by this exercise.
Note on terminology:
A module in a system is a component with a well defined interface which can be replaced with another module with the same interface without effecting the rest of the system. A system composed of such modules is modular.
For both languages, the interface to a module is by convention a header file. Consider string.h and string as defining the interfaces to simple string processing modules in C and C++. If there is a bug in the implementation of string.h, a new libc.so is installed. This new module has the same interface, and anything dynamically linked to it immediately gets the benefit of the new implementation. Conversely, if there is a bug in string handling in std::string, then every project which uses it needs to be rebuilt. C++ introduces a very large amount of coupling into systems, which the language does nothing to mitigate - in fact, the better uses of C++ which fully exploit its features are often a lot more tightly coupled than the equivalent C code.
If you try and make C++ modular, you typically end up with something like COM, where every object has to have both an interface (a pure virtual base class) and an implementation, and you substitute an indirection for efficient template generated code.
If you don't care about whether your system is composed of replaceable modules, then you don't need to perform actions to to make it modular, and can use some of the features of C++ such as classes and templates which, suitable applied, can improve cohesion within a module. If your project is to produce a single, statically linked application then you don't have a modular system, and you can afford to not care at all about modularity. If you want to create something like anti-grain geometry which is beautiful example of using templates to couple together different algorithms and data structures, then you need to do that in C++ - pretty well nothing else widespread is as powerful.
So be very careful what your manager means by 'modularise'.
If every file already has "its own purpose and function" and "every single function in the program is passed a pointer to one of the structs" then the only difference made in changing it into classes would be to replace the pointer to the struct with the implicit this pointer. That would have no effect on how modularised the system is, in fact (if the struct is only defined in the C file rather than in the header) it will reduce modularity.
With “just” 7000 lines of C code, it will probably be easier to rewrite the code from scratch, without even trying to understand the current code.
And there is no automated way to do or even assist the modularization and refactoring that you envisage.
7000 LOC may sound like much but a lot of this will be boilerplate.
Try and see if you can simplify the code before changing it to c++. Basically though I think he just wants you to convert functions into class methods and convert structs into class data members (if they don't contain function pointers, if they do then convert these to actual methods). Can you get in touch with the original coder(s) of this program? They could help you get some understanding done but mainly I would be searching for that piece of code that is the "engine" of the whole thing and base the new software from there. Also, my boss told me that sometimes it is better to simply rewrite the whole thing, but the existing program is a very good reference to mimic the run time behavior of. Of course specialized algorithms are hard to recode. One thing I can assure you of is that if this code is not the best it could be then you are going to have alot of problems later on. I would go up to your boss and promote the fact that you need to redo from scratch parts of the program. I have just been there and I am really happy my supervisor gave me the ability to rewrite. Now the 2.0 version is light years ahead of the original version.
I read this article which is titled "Make bad code good" from http://www.javaworld.com/javaworld/jw-03-2001/jw-0323-badcode.html?page=7 . Its directed at Java users, but all of its ideas our pretty applicable to your case I think. Though the title makes it sound likes it is only for bad code, I think the article is for maintenance engineers in general.
To summarize Dr. Farrell's ideas, he says:
Start with the easy things.
Fix the comments
Fix the formatting
Follow project conventions
Write automated tests
Break up big files/functions
Rewrite code you don't understand
I think after following everyone else's advice this might be a good article to read when you have some free time.
Good luck!

Separate Class vs Method

Quick design question.
ClassA has a method called DoSomething(args)
In DoSomething(), before it can actually do something, it needs to do some preparatory work with args. I believe this should be encapsulated within ClassA, (as opposed to doing the prep work outside and passing it in) as nothing else needs to know that this prep work is required to DoSomething.
However, it's where the actual preparatory work code belongs that is making me think.
The preparatory work in my particular example is to create a list of items, which satisfy a certain condition, from args.
My hunch is that I should create a new class, ListOfStuff, which takes args in its constructor and put this preparatory work here.
From a TDD-perspective, I think this is the right choice. We can then unit test ListOfStuff til our heart is content. If we had put the preparatory work in a private method of ClassA, we'd have only been able to indirectly test it through testing DoSomething().
But is this overkill? Since adopting the TDD and DI approach, I've seen the number of classes that I write multiply - should I be worried?
Ta.
There are a couple of heuristics here.
Is there state on this class that
survives from invocation to
invocation? Does this prep work get
done every time you need to
doSomething(), or is it done and
saved? If so, that argues for the
class.
Does this computation need to happen
in more than once place? If so,
that argues for a class.
Can the details of the
implementation of the doSomething()
method, or the preparation work for
it, change without affecting the
enclosing class? If so, that argues
for a class.
Well, three heuristics. No one expects the Spanish Inquisition.
What is the simplest thing that could possibly work? That's the TDD mantra. Don't try to think too far ahead. If the time comes to create the helper class, you'll know it because you'll be doing all sorts of related work in multiple methods in other classes. Until then, do the work in your method. If it makes the method too long or cumbersome to read, extract the work to its own method. This method can also be tested to your heart's content without the need for another class.
Definately put it in a new class. Its called separation of concerns. You don't want to overload a class and make it do all sorts of other stuff. This is because your class will not be able to be used anywhere else as its so specific to one thing.
Put it in class, then use that class elsewhere. Otherwise you'd have to write this again and again in the future.
To make it extensible, and be able to pass in all sorts of different algorithms, the design pattern your after here is the Strategy pattern. But that is for the future...
Your object model design should be considered on it's own, rather than in the context of your development strategy. If building ListOFStuff and passing to to DoSomething really is how the object model fits together best, do it, regardless of your development strategy.
I think you've answered your own question a little, however, since ListOfStuff makes it easier to unit test, that probably means it's also a cleaner design.
Hope that helps!
Since adopting the TDD and DI
approach, I've seen the number of
classes that I write multiply - should
I be worried?
If your aim is procedural programming, yes. But since you're almost certainly wanting to work in an OO fashion, no.
Most people use too few types, spend too little time thinking about OO design.
Your questions about class responsibility reflect maturation of thinking (imho).
SoC is valid only if the "concern" in question has nothing to do with the class, or you find classes sharing this "concern" (aka, violating DRY). In your case, it seems that the code is intrinsic to the class - so possibly a private function would be more apt.
As tvanfosson said above, you need to balance SoC with YAGNI. Personally - I think you might be mulling over this prematurely (I know! I do it too all the time).