Messing with encapsulation - c++

I know what exactly the encapsulation means.
But this question was asked to me in an interview.
I have a requirement where in i have to create a new class.
if in a team somebody messes up with the encapsulation part of the class but on a whole
the functionality that is required is working fine.and lets suppose its is delivered to the client.
what are the possible problems that a client might face because of that?
i tried to tell that security norms will be violated and we can use the vulnerability to add something and mess up the product.But he said client doesnt know anything regarding enhancing the code.
I had finally given up.
could anybody please help me with some examples?

Bad encapsulation (whatever it means) makes proper use of the class harder.
For instance, if you have two public methods and they should be called in proper order only and otherwise the object state becomes corrupt that's an example of bad encapsulation - the user can't know from the class definition that those methods should be only called in this order and the class doesn't do anything to protect against calling in wrong order and once the user hasn't guessed the right order he is screwed.

Encapsulation is not a property of the program. It is a property of how the source code of the program is written. As such, someone without access to the source code (such as the end user) will not be affected by proper or improper encapsulation. You could write a program with no encapsulation whatsoever, and if the functionality works fine, the end user would never notice.
Of course, those who do have access to the code are affected by the presence of encapsulation, as it usually makes debugging old code and writing new orthogonal code easier. So, in a sense, this does impact the ability to deliver working functionality on time, but those were assumed, in your interview, to be the case.

Given "client doesnt know anything regarding enhancing the code", I assume they're not writing any code themselves that uses the API. The consequences to them are then only through distributed code that directly modified what should have been private parts of the object... that means more libraries and applications may contain code that also need to be modified should the workings of that class change. So, updates to the software may be larger and possibly harder to deploy. (If properly encapsulating member functions are inline anyway, it may still work out the same as calling code still needs recompilation).
BTW, encapsulation via private and protected isn't a run-time security feature... even at compile time, it's intended as a firm reminder to client code that certain things weren't designed to be directly accessed/modified, but can easily be bypassed even then using reinterpret casts, template specialisations and other hackery.

IMO the biggest problem that you may face is that you will underestimate the cost of refactoring or of implementing new features. For example replacing that class with a new one that works as a proxy to a remote may look simple stuff, but then you may face problems because the application ended up using internals that are hard to change for remote use.
May be taking shortcuts saved a little time but you'll end up giving back this time (and more) for free to angry customers because of this underestimation.
Another serious problem is that a program with a clear interface that however has been poked through doing nasty things with internal stuff just stinks. This is a problem in se, but also a problem because whoever is going to see that code will feel that's ok to do similar things for future changes especially when under time pressure.
This in the long term will change the program in a bowl of rotten segfaulting spaghetti that anyone will fear to touch because of unexplainable ripple side effects. A description of this "broken window" effect is in the nice book The Pragmatic Programmer.

Related

C++ Singleton Design pattern alternatives

I hate to beat a dead horse, that said, I've gone over so many conflicting articles over the past few days in regards to the use of the singleton pattern.
This question isn't be about which is the better choice in general, rather what makes sense for my use case.
The pet project I'm working on is a game. Some of the code that I'm currently working on, I'm leaning towards using a singleton pattern.
The use cases are as follows:
a globally accessible logger.
an OpenGL rendering manager.
file system access.
network access.
etc.
Now for clarification, more than a couple of the above require shared state between accesses. For instance, the logger is wrapping a logging library and requires a pointer to the output log, the network requires an established open connection, etc.
Now from what I can tell it's more suggested that singletons be avoided, so lets look at how we may do that. A lot of the articles simply say to create the instance at the top and pass it down as a parameter to anywhere that is needed. While I agree that this is technically doable, my question then becomes, how does one manage the potentially massive number of parameters? Well what comes to mind is wrapping the different instances in a sort of "context" object and passing that, then doing something like context->log("Hello World"). Now sure that isn't to bad, but what if you have a sort of framework like so:
game_loop(ctx)
->update_entities(ctx)
->on_preupdate(ctx)
->run_something(ctx)
->only use ctx->log() in some freak edge case in this function.
->on_update(ctx)
->whatever(ctx)
->ctx->networksend(stuff)
->update_physics(ctx)
->ctx->networksend(stuff)
//maybe ctx never uses log here.
You get the point... in some areas, some aspects of the "ctx" aren't ever used but you're still stuck passing it literally everywhere in case you may want to debug something down the line using logger, or maybe later in development, you actually want networking or whatever in that section of code.
I feel like the above example would much rather be suited to a globally accessible singleton, but I must admit, I'm coming from a C#/Java/JS background which may color my view. I want to adopt the mindset/best practices of a C++ programmer, yet like I said, I can't seem to find a straight answer. I also noticed that the articles that suggest just passing the "singleton" as a parameter only give very simplistic use cases that anyone would agree a parameter would be the better way to go.
In this game example, you probably wan't to access logging everywhere even if you don't plan on using it immediately. File system stuff may be all over but until you build out the project, it's really hard to say when/where it will be most useful.
So do I:
Stick with using singletons for these use cases regardless of how "evil/bad" people say it is.
Wrap everything in a context object, and pass it literally everywhere. (seems kinda gross IMO, but if that's the "more accepted/better" way of doing it, so be it.)
Something completely else. (Really lost as to what that might be.)
If option 1, from a performance standpoint, should I switch to using namespace functions, and hiding the "private" variables / functions in anonymous namespaces like most people do in C? (I'm guessing there will be a small boost in performance, but then I'll be stuck having to call an "init" and "destroy" method on a few of these rather than being able to just allow the constructor/destructor to do that for me, still might be worth while?)
Now I realize this may be a bit opinion based, but I'm hoping I can still get a relatively good answer when a more complicated/nested code base is in question.
Edit:
After much more deliberation I've decided to use the "Service Locator" pattern instead. To prevent a global/singleton of the Service Locator I'm making anything that may use the services inherit from a abstract base class that requires the Service Locator be passed when constructed.
I haven't implemented everything yet so I'm still unsure if I'll run into any problems with this approach, and would still love feedback on if this is a reasonable alternative to the singleton / global scope dilemma.
I had read that Service Locator is also somewhat of an anti-pattern, that said, many of the example I found implemented it with statics and/or as a singleton, perhaps using it as I've described removes the aspects that cause it to be an anti-pattern?
Whenever you think you want to use a Singleton, ask yourself the following question: Why is it that it must be ensured at all cost that there never exists more than one instance of this class at any point in time? Because the whole point of the Singleton pattern is to make sure that there can never be more than one instance of the Singleton. That's what the term "singleton" is all about: there only being one. That's why it's called the Singleton pattern. That's why the pattern calls for the constructor to be private. The point of the Singleton pattern is not and never was to give you a globally-accessible instance of something. The fact that there is a global access point to the sole instance is just a consequence of the Singleton pattern. It is not the objective the Singleton pattern is meant to achieve. If all you want is a globally accessible instance of something, then use a global variable. That's exactly what global variables are for…
The Singleton pattern is probably the one design pattern that's singularly more often misunderstood than not. Is it an intrinsic aspect of the very concept of a network connection that there can only ever be one network connection at a time, and the world would come to an end if that constraint was ever to be violated? If the answer is no, then there is no justification for a network connection to ever be modeled as a Singleton. But don't take my word for it, convince yourself by checking out page 127 of Design Patterns: Elements of Reusable Object-Oriented Software where the Singleton pattern was originally described…😉
Concerning your example: If you're ending up having to pass a massive number of parameters into some place then that first and foremost tells you one thing: there are too many responsibilities in that place. This fact is not changed by the use of Singletons. The use of Singletons simply obfuscates this fact because you're not forced to pass all stuff in through one door in the form of parameters but rather just access whatever you want directly all over the place. But you're still accessing these things. So the dependencies of your piece of code are the same. These dependencies are just not expressed explicitly anymore at some interface level but creep around in the mists. And you never know upfront what stuff a certain piece of code depends on until the moment your build breaks after trying to take away one thing that something else happened to depend upon. Note that this issue is not specific to the Singleton pattern. This is a concern with any kind of global entity in general…
So rather than ask the question of how to best pass a massive number of parameters, you should ask the question of why the hell does this one piece of code need access to that many things? For example, do you really need to explicitly pass the network connection to the game loop? Should the game loop not maybe just know the physics world object and that physics world object is at the moment of creation given some object that handles the network communication. And that object in turn is upon initialization told the network connection it is supposed to use? The log could just be a global variable (or is there really anything about the very idea of a log itself that prohibits there ever being more than one log?). Or maybe it would actually make sense for each thread to have its own log (could be a thread-local variable) so that you get a log from each thread in the order of the control flow that thread happened to take rather than some (at best) interleaved mess that would be the output from multiple threads for which you'd probably want to write some tool so that you'd at least have some hope of making sense of it at all…
Concerning performance, consider that, in a game, you'll typically have some parent objects that each manage collections of small child objects. Performance-critical stuff would generally be happening in places where something has to be done to all child objects in such a collection. The relative overhead of first getting to the parent object itself should generally be negligible…
PS: You might wanna have a look at the Entity Component System pattern…

Removing dependencies from statechart framework

I've got lots of problems with project i am currently working on. The project is more than 10 years old and it was based on one of those commercial C++ frameworks which were very populary in the 90's. The problem is with statecharts. The framework provides quite common implementation of state pattern. Each state is a separate class, with action on entry, action in state etc. There is a switch which sets current state according to received events.
Devil is hidden in details. That project is enormous. It's something about 2000 KLOC. There is definitely too much statecharts (i've seen "for" loops implemented using statecharts). What's more ... framework allows to embed statechart in another statechart so there are many statecherts with seven or even more levels of nesting. Because statecharts run in different threads, and it's possible to send events between statecharts we have lots of synchronization problems (and big mess in interfaces).
I must admit that scale of this problem is overwhelming and I don't know how to touch it. My first idea was to remove as much code as I can from statecharts and put it into separate classes. Then delegate these classes from statechart to do a job. But in result we will have many separate functions, which logically don't have any specific functionality and any change in statechart architecture will need also a change of that classes and functions.
So I asking for help:
Do you know any books/articles/magic artefacts which can help me to fix this ? I would like to at least separate as much code as I can from statechart without introducing any hidden dependencies and keep separated code maintainable, testable and reusable.
If you have any suggestion how to handle this, please let me know.
The statechart pattern is intended to be used specifically to remove switch statements, so this sounds like a horrid abuse. Additionally, states should only change on asynchronous events. If you are processing an event and you change through multiple states (or for loop, etc.), then this is also a horrid abuse of the pattern.
I would start from these two points, as they will solve much of your concurrency issues just fixing them up. What you need to determine is:
What are your external, asynchronous events to the system? These are the only things that should be determining state transitions, not things that happen during event processing. An event may cause 0 or 1 state transitions. Once you have a list of these state transitions, you can reconstruct the actual states of your system. If you are aware of UML State diagrams, this would be a perfect time to sketch one up in a charting program, not just for yourself (though it will help you immensely), but also for everyone in the future that has to return to the project. As you have learned, this happens.
Now that you know what are really states, list what are states in the code that shouldn't be. This usually indicates that something can be "functionally decomposed". Instead of a state object for each of these, likely all that is needed is a separate function. This will cut down on a lot of the overhead of state objects and should clean up the code immensely.
Now it's time to tackle those horrendous switch statements you mentioned. If they were truly based on state, you shouldn't need one at all. Instead, you should be able to call the state machine directly.
Something like:
myStateMachine->myEvent();
and it should work without any switch. But notice, this may be the case even for some of those objects that don't work across asynchronous events. This is also an indication of where you may just use inheritance to get the same effect. If you have:
switch (someTypeIdentifier)
{
case type1:
doSomething();
break;
case type2:
doSomethingElse();
break;
}
usually the correct OOP method to do is to create two actual types Type1, Type2, both derived from an abstract base TypeBase, with a virtual method doSomething() that does what you need. The reason this is useful is because it means you can "close" the handling (in the meaning of the Open/Closed Principle), and still extend the functionality by adding new derived types as needed (leaving it open to extension). This saves bugs like crazy because it gets developers hands out of those switch statements, which can get quite ugly and convoluted, instead encapsulating each separate behavior in separate classes.
4 - Now look to fix up your thread issues. Identify all objects used from multiple threads. Make a list. Now, how are these used? Are some of them always used together? Start making groups. The goal here is to find the level of encapsulation that best works for these objects, separate the objects into individual classes that control their own synchronisation, figure out the atomic level of actual "transactions" for the objects, and make methods of the classes that expose those meaningful transactions, wrapped behind the scenes with the appropriate mutexes, condition variables, etc.
You might be saying "that sounds like a lot of work! Why do all that instead of just writing it all over myself?" Good question! :) The reason is actually straightforward: if you are going to do it all by yourself, those are the steps you should be doing anyway. You should be identifying your states, your dynamic polymorphism, and getting a handle on the multithreaded transactions. But, if you start with the existing code, you also have all of those unspoken business rules that were never documented and may cause all sorts of unexpected bugs down the line. You don't have to bring everything over - if you suspect it's a bug, discuss the logic with the people who have worked with the system in the past (if available), QA, or whoever might identify bugs, and see if it really should be carried over. But you need to actually evaluate what the bugs are either way, or you may not code something that actually needed coding.
In the end, this is a manual process that is a part of software engineering. There are CASE tools that can help draw up the state diagrams and even publish them to code, there are refactoring tools, like those found in many IDEs, that can help move code between functions and classes, and similar tools which can help identify threading needs. However, those things shouldn't be picked up for a single project. They need to be learned throughout your career, picking them up and learning them more deeply over years of work, as they are a part of being a software engineer. They don't do it for you. You still need to know the whys and hows, and they just help get it done more efficiently.
Statecharts (including nested Statecharts) are a powerful way to specify, understand and even simulate/validate complex control flow. But to gain the benefit, you need the statechart model in a suitable tool (I used Statemate way back in the day, not sure if it's still available), plus a reliable mapping from the chart to the code (Statemate used to generate the code) - then you can forget about the state management code (mostly)! In your situation, if you don't have the model, I would try to reverse one from the code - as Ira says, chances are high that the original developers had a model in some form, and you may find the code making a lot of sense as the model emerges. If this works out, you will have a really good spec/model of the code which should make future code edits much easier (even if you don't want to go to automatic code generation, and maintain the code/model mapping manually (but you'll need to be meticulous!!))
Sounds to me like your best bet is (gulp!) likely to start from scratch if it's as horrifically broken as you make out. Is there any documentation? Could you begin to build some saner software based on the docs?
If a complete re-write isn't an option (and they never are in my experience) I'd try some of the following:
If you don't already have it, draw an architectural picture of the whole system. Sketch out how all the bits are supposed to work together and that will help you break the system down into potentially manageable / testable parts.
Do you have any kind of requirements or testing plan in place? If not, can you write one and start to put unit tests in place for the various chunks of code / functionality which exist already? If you can do that, you can start to refactor things without breaking as much of whatever does currently work.
Once you've broken things down a bit, start building your unit tests into integration tests which pull together more of the functionality.
I've not read them myself, but I've heard good things about these books which may have some advice you can use:
Refactoring: Improving the Design of Existing Code (Object Technology Series).
Working Effectively with Legacy Code (Robert C. Martin)
Good luck! :-)

How simple is 'too simple to break'? - explained

In JUnit FAQ you can read that you shouldn't test methods that are too simple to break. While all examples seem logical (getters and setters, delegation etc.), I'm not sure I am able to grasp the "can't break on its own" concept in full. When would you say that the method "can't break on its own"? Anyone care to elaborate?
I think "can't break on its own" means that the method only uses elements of its own class, and does not depend upon the behavior of any other objects/classes, or that it delegates all of its functionality to some other method or class (which presumably has its own tests).
The basic idea is that if you can see everything the method does, without needing to refer to other methods or classes, and you are pretty sure it is correct, then a test is probably not necessary.
There is not necessarily a clear line here. "Too simple to break" is in the eye of the beholder.
Try thinking of it this way. You're not really testing methods. You're describing some behaviour and giving some examples of how to use it, so that other people (including your later self) can come and change that behaviour safely later. The examples happen to be executable.
If you think that people can change your code safely, you don't need to worry.
No matter how simple a method is, it can still be wrong. For example you might have two similarly named variables and access the wrong one. However, these errors will likely be quickly found and once these methods are written correctly, they are going to stay correct and so it is not worthwhile permanently keeping around a test for this. Rather than "too simple to break", I would recommend considering whether it is too simple to be worth keeping a permanent test.
Put it this way, you're building a wood table.
You'll test things that may fail. For instance, putting a jar in the table, or sitting over the table, or pushing the table from one side to another in the room, etc. You're testing table, in a way you know it is somehow vulnerable or at least in a way you know you'll use it.
You don't test though, nails, or one of its legs, because they are "too simple to break on its own".
Same goes for unit testing, you don't test getters/setters, because the only way they may fail, it because the runtime environment fail. You don't test methods that forward the message to other methods, because they are too simple to break on it own, you better test the referenced method.
I hope this helps.
If you are using TDD, that advice is wrong. In the TDD case your function exists because the test failed. It doesn't matter if the function is simple or not.
But if you are adding tests afterwards to some existing code, I can sort of understand the argument that you shouldn't need to test code that cannot break. But I still think that is just an excuse for not having to write tests. Also ask yourself: if that piece of code is not worthy of testing, then maybe that code is not needed at all?
I like the risk based approach of GAMP 5 which basically means (in this context) to first asses the various possible risks of a software and only define tests for the higher-risk parts.
Although this applies to GxP environments, it can be adapted in the way: How likely is a certain class to have erroneous methods, and how big is the impact an error would have? E.g., if a method decides whether to give a user access to a resource, you must of course test that extensively enough.
That means in deterimining where to draw the line between "too simple to break" it can help to take into consideration the possible consequences of a potential flaw.

How to update old C code? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I have been working on some 10 year old C code at my job this week, and after implementing a few changes, I went to the boss and asked if he needed anything else done. That's when he dropped the bomb. My next task was to go through the 7000 or so lines and understand more of the code, and to modularize the code somewhat. I asked him how he would like the source code modularized, and he said to start putting the old C code into C++ classes.
Being a good worker, I nodded my head yes, and went back to my desk, where I sit now, wondering how in the world to take this code, and "modularize" it. It's already in 20 source files, each with its own purpose and function. In addition, there are three "main" structs. each of these structures has 30 plus fields, many of them being other, smaller structs. It's a complete mess to try to understand, but almost every single function in the program is passed a pointer to one of the structs and uses the struct heavily.
Is there any clean way for me to shoehorn this into classes? I am resolved to do it if it can be done, I just have no idea how to begin.
First, you are fortunate to have a boss who recognizes that code refactoring can be a long-term cost-saving strategy.
I've done this many times, that is, converting old C code to C++. The benefits may surprise you. The final code may be half the original size when you're done, and much simpler to read. Plus, you will likely uncover tricky C bugs along the way. Here are the steps I would take in your case. Small steps are important because you can't jump from A to Z when refactoring a large body of code. You have to go through small, intermediate steps which may never be deployed, but which can be validated and tagged in whatever RCS you are using.
Create a regression/test suite. You will run the test suite each time you complete a batch of changes to the code. You should have this already, and it will be useful for more than just this refactoring task. Take the time to make it comprehensive. The exercise of creating the test suite will get you familiar with the code.
Branch the project in your revision control system of choice. Armed with a test suite and playground branch, you will be empowered to make large modifications to the code. You won't be afraid to break some eggs.
Make those struct fields private. This step requires very few code changes, but can have a big payoff. Proceed one field at a time. Try to make each field private (yes, or protected), then isolate the code which access that field. The simplest, most non-intrusive conversion would be to make that code a friend function. Consider also making that code a method. Converting the code to be a method is simple, but you will have to convert all of the call sites as well. One is not necessarily better than the other.
Narrow the parameters to each function. It's unlikely that any function requires access to all 30 fields of the struct passed as its argument. Instead of passing the entire struct, pass only the components needed. If a function does in fact seem to require access to many different fields of the struct, then this may be a good candidate to be converted to an instance method.
Const-ify as many variables, parameters, and methods as possible. A lot of old C code fails to use const liberally. Sweeping through from the bottom up (bottom of the call graph, that is), you will add stronger guarantees to the code, and you will be able to identify the mutators from the non-mutators.
Replace pointers with references where sensible. The purpose of this step has nothing to do with being more C++-like just for the sake of being more C++-like. The purpose is to identify parameters that are never NULL and which can never be re-assigned. Think of a reference as a compile-time assertion which says, this is an alias to a valid object and represents the same object throughout the current scope.
Replace char* with std::string. This step should be obvious. You might dramatically reduce the lines of code. Plus, it's fun to replace 10 lines of code with a single line. Sometimes you can eliminate entire functions whose purpose was to perform C string operations that are standard in C++.
Convert C arrays to std::vector or std::array. Again, this step should be obvious. This conversion is much simpler than the conversion from char to std::string because the interfaces of std::vector and std::array are designed to match the C array syntax. One of the benefits is that you can eliminate that extra length variable passed to every function alongside the array.
Convert malloc/free to new/delete. The main purpose of this step is to prepare for future refactoring. Merely changing C code from malloc to new doesn't directly gain you much. This conversion allows you to add constructors and destructors to those structs, and to use built-in C++ automatic memory tools.
Replace localize new/delete operations with the std::auto_ptr family. The purpose of this step is to make your code exception-safe.
Throw exceptions wherever return codes are handled by bubbling them up. If the C code handles errors by checking for special error codes then returning the error code to its caller, and so on, bubbling the error code up the call chain, then that C code is probably a candidate for using exceptions instead. This conversion is actually trivial. Simply throw the return code (C++ allows you to throw any type you want) at the lowest level. Insert a try{} catch(){} statement at the place in the code which handles the error. If no suitable place exists to handle the error, consider wrapping the body of main() in a try{} catch(){} statement and logging it.
Now step back and look how much you've improved the code, without converting anything to classes. (Yes, yes, technically, your structs are classes already.) But you haven't scratched the surface of OO, yet managed to greatly simplify and solidify the original C code.
Should you convert the code to use classes, with polymorphism and an inheritence graph? I say no. The C code probably does not have an overall design which lends itself to an OO model. Notice that the goal of each step above has nothing to do with injecting OO principles into your C code. The goal was to improve the existing code by enforcing as many compile-time constraints as possible, and by eliminating or simplifying the code.
One final step.
Consider adding benchmarks so you can show them to your boss when you're done. Not just performance benchmarks. Compare lines of code, memory usage, number of functions, etc.
Really, 7000 lines of code is not very much. For such a small amount of code a complete rewrite may be in order. But how is this code going to be called? Presumably the callers expect a C API? Or is this not a library?
Anyway, rewrite or not, before you start, make sure you have a suite of tests which you can run easily, with no human intervention, on the existing code. Then with every change you make, run the tests on the new code.
This shoehorning into C++ seems to be arbitrary, ask your boss why he needs that done, figure out if you can meet the same goal less painfully, see if you can prototype a subset in the new less painful way, then go and demo to your boss and recommend that you follow the less painful way.
First, tell your boss you're not continuing until you have:
http://www.amazon.com/Refactoring-Improving-Design-Existing-Code/dp/0201485672
and to a lesser extent:
http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052
Secondly, there is no way of modularising code by shoe-horning it into C++ class. This is a huge task and you need to communicate the complexity of refactoring highly proceedural code to your boss.
It boils down to making a small change (extract method, move method to class etc...) and then testing - there is no short cuts with this.
I do feel your pain though...
I guess that the thinking here is that increasing modularity will isolate pieces of code, such that future changes are facilitated. We have confidence in changing one piece because we know it cannot affect other pieces.
I see two nightmare scenarios:
You have nicely structured C code, it will easily transform to C++ classes. In which case it probably already is pretty darn modular, and you've probably done nothing useful.
It's a rats-nest of interconnected stuff. In which case it's going to be really tough to disentangle it. Increasing modularity would be good, but it's going to be a long hard slog.
However, maybe there's a happy medium. Could there be pieces of logic that important and conceptually isolated but which are currently brittle because of a lack of data-hiding etc. (Yes good C doesn't suffer from this, but we don't have that, otherwise we would leave well alone).
Pulling out a class to own that logic and its data, encpaulating that piece could be useful. Whether it's better to do it wih C or C++ is open to question. (The cynic in me says "I'm a C programmer, great C++ a chance to learn something new!")
So: I'd treat this as an elephant to be eaten. First decide if it should be eaten at all, bad elephent is just no fun, well structured C should be left alone. Second find a suitable first bite. And I'd echo Neil's comments: if you don't have a good automated test suite, you are doomed.
I think a better approach could be totally rewrite the code, but you should ask your boss for what purpose he wants you "to start putting the old C code into c++ classes".
You should ask for more details
Surely it can be done - the question is at what cost? It is a huge task, even for 7K LOC. Your boss must understand that it's gonna take a lot of time, while you can't work on shiny new features etc. If he doesn't fully understand this, and/or is not willing to support you, there is no point starting.
As #David already suggested, the Refactoring book is a must.
From your description it sounds like a large part of the code is already "class methods", where the function gets a pointer to a struct instance and works on that instance. So it could be fairly easily converted into C++ code. Granted, this won't make the code much easier to understand or better modularized, but if this is your boss' prime desire, it can be done.
Note also, that this part of the refactoring is a fairly simple, mechanical process, so it could be done fairly safely without unit tests (with hyperaware editing of course). But for anything more you need unit tests to make sure your changes don't break anything.
It's very unlikely that anything will be gained by this exercise. Good C code is already more modular than C++ typically can be - the use of pointers to structs allows compilation units to be independent in the same was as pImpl does in C++ - in C you don't have to expose the data inside a struct to expose its interface. So if you turn each C function
// Foo.h
typedef struct Foo_s Foo;
int foo_wizz (const Foo* foo, ... );
into a C++ class with
// Foo.hxx
class Foo {
// struct Foo members copied from Foo.c
int wizz (... ) const;
};
you will have reduced the modularity of the system compared with the C code - every client of Foo now needs rebuilding if any private implementation functions or member variables are added to the Foo type.
There are many things classes in C++ do give you, but modularity is not one of them.
Ask your boss what the business goals are being achieved by this exercise.
Note on terminology:
A module in a system is a component with a well defined interface which can be replaced with another module with the same interface without effecting the rest of the system. A system composed of such modules is modular.
For both languages, the interface to a module is by convention a header file. Consider string.h and string as defining the interfaces to simple string processing modules in C and C++. If there is a bug in the implementation of string.h, a new libc.so is installed. This new module has the same interface, and anything dynamically linked to it immediately gets the benefit of the new implementation. Conversely, if there is a bug in string handling in std::string, then every project which uses it needs to be rebuilt. C++ introduces a very large amount of coupling into systems, which the language does nothing to mitigate - in fact, the better uses of C++ which fully exploit its features are often a lot more tightly coupled than the equivalent C code.
If you try and make C++ modular, you typically end up with something like COM, where every object has to have both an interface (a pure virtual base class) and an implementation, and you substitute an indirection for efficient template generated code.
If you don't care about whether your system is composed of replaceable modules, then you don't need to perform actions to to make it modular, and can use some of the features of C++ such as classes and templates which, suitable applied, can improve cohesion within a module. If your project is to produce a single, statically linked application then you don't have a modular system, and you can afford to not care at all about modularity. If you want to create something like anti-grain geometry which is beautiful example of using templates to couple together different algorithms and data structures, then you need to do that in C++ - pretty well nothing else widespread is as powerful.
So be very careful what your manager means by 'modularise'.
If every file already has "its own purpose and function" and "every single function in the program is passed a pointer to one of the structs" then the only difference made in changing it into classes would be to replace the pointer to the struct with the implicit this pointer. That would have no effect on how modularised the system is, in fact (if the struct is only defined in the C file rather than in the header) it will reduce modularity.
With “just” 7000 lines of C code, it will probably be easier to rewrite the code from scratch, without even trying to understand the current code.
And there is no automated way to do or even assist the modularization and refactoring that you envisage.
7000 LOC may sound like much but a lot of this will be boilerplate.
Try and see if you can simplify the code before changing it to c++. Basically though I think he just wants you to convert functions into class methods and convert structs into class data members (if they don't contain function pointers, if they do then convert these to actual methods). Can you get in touch with the original coder(s) of this program? They could help you get some understanding done but mainly I would be searching for that piece of code that is the "engine" of the whole thing and base the new software from there. Also, my boss told me that sometimes it is better to simply rewrite the whole thing, but the existing program is a very good reference to mimic the run time behavior of. Of course specialized algorithms are hard to recode. One thing I can assure you of is that if this code is not the best it could be then you are going to have alot of problems later on. I would go up to your boss and promote the fact that you need to redo from scratch parts of the program. I have just been there and I am really happy my supervisor gave me the ability to rewrite. Now the 2.0 version is light years ahead of the original version.
I read this article which is titled "Make bad code good" from http://www.javaworld.com/javaworld/jw-03-2001/jw-0323-badcode.html?page=7 . Its directed at Java users, but all of its ideas our pretty applicable to your case I think. Though the title makes it sound likes it is only for bad code, I think the article is for maintenance engineers in general.
To summarize Dr. Farrell's ideas, he says:
Start with the easy things.
Fix the comments
Fix the formatting
Follow project conventions
Write automated tests
Break up big files/functions
Rewrite code you don't understand
I think after following everyone else's advice this might be a good article to read when you have some free time.
Good luck!

How specific to get on design document?

I'm creating a design document for a security subsystem, to be written in C++. I've created a class diagram and sequence diagrams for the major use cases. I've also specified the public attributes, associations and methods for each of the classes. But, I haven't drilled the method definitions down to the C++ level yet. Since I'm new to C++ , as is the other developer, I wonder if it might not make sense go ahead and specify to this level. Thoughts?
edit: Wow - completely against, unanimous. I was thinking about, for example, the whole business about specifying const vs. non-const, passing references, handling default constructor and assigns, and so forth. I do believe it's been quite helpful to spec this out to this level of detail so far. I definitely have gotten a clearer idea of how the system will work. Maybe if I just do a few methods, as an example, before diving into the code?
I wouldn't recommend going to this level, but then again you've already gone past where I would go in a design specification. My personal feeling is that putting a lot of effort into detailed design up-front is going to be wasted as you find out in developing code that your guesses as to how the code will work are wrong. I would stick with a high-level design and think about using TDD (test driven development) to guide the low-level design and implementation.
I would say it makes no sense at all, and that you have gone too far already. If you are new to C++ you are in no position to write a detailed design document for a C++ project. I would recommend you try to implement what you already have in C++, learn by the inevitable mistakes (like public attributes) and then go back and revise it.
Since you're new, it probably makes sense not to drill down.
Reason: You're still figuring out the language and how things are best structured. That means you'll make mistakes initially and you'll want to correct them without constantly updating the documentation.
It really depends on who the design document is targeted at. If it's for a boss who is non-technical, then you are good with what you have.
If it's for yourself, then you are using the tool to help you, so you decide. I create method level design docs when I am creating a project, but it's at a high level so I can figure out what the features of the various classes should be. I've found that across languages, the primary functionalities of a class have little to do with the programming language we are working in. Some of the internal details and functions required certainly vary due to the chosen language, but those are implementation details that I don't bother with during the design phase.
It certainly helps me to know that for instance an authorization class might have an authenticate function that takes a User object as a parameter. I don't really care during design that I might need an internal string md5 function wrapper to accomplish some specific goal. I find out about that while coding.
The goal of initial design is to get organized so you can make progress with clarity and forethought rather than tearing out and reimplementing the same function 4 times because you forgot some scenario due to not planning.
EDIT: I work in PHP a lot, and I actually use PhpDoc to do some of the design docs, by simply writing the method signature with no implementation, then putting a detailed description of what the method should do in the method header comments. This helps anyone that is using my class in the future, because the design IS the documentation. I can also change the documentation if I do need to make some alterations while coding.
I work in php4 a lot, so I don't get to use interfaces. In php5, I create the interface, then implement it elsewhere.
The best way to specify how the code should actually fit together is in code. The design document is for other things that are not easily expressed in code. You should use it for describing the actual need the program fills, How it interacts with users, what the constraints are in terms of hardware and operating systems. Certainly describe the overall architecture of your application in a design document, but, for instance, the API should actually be described in the code that exposes the API.
You have already gone far enough with the documentation part. As you still a beginner in C++, when you would understand the language, you might want to change the structure of your program. Then you would have to do changes in the documentation. I would suggest that you have already gone too far with the documentation. No need to drill more into it
Like everyone else says, you've gone way past where you need to go with the design. Do you have a good set of requirements to the simple true/false statement level that you derived that design from? You can design all day long, but if you don't have requirements that simply say WHAT you're going to do, it doesn't matter how good your design is.