Related
I am doing my own research project, and I am quite struggling regarding the right choice of architectural/design patterns.
In this project, after the "system" start, I need to do something in background (tasks, processing, display data and so on) and at the same be able to interact with the system using, for example, keyboard and send some commands, like "give me status of this particular object" or "what is the data in this object".
So my question is - what software architectural/design patterns can be applied to this particular project? How the interraction between classes/objects should be organized? How should the objects be created?
Can, for example, "event-driven architecture" or "Microkernel" be applied here? Some references to useful resources will be very much appreciated!
Thank you very much in advance!
Careful with design patterns. If you sprinkle them throughout your code hoping that everything will work great, you'll soon have an unreadable, boilerplate full mess. They are recipes, not solutions.
My advice to you is pick a piece of paper and a pencil and start drawing all the entities of your domain, with all their requisites, and see how they relate. If you want to get somewhat serious about it, you can do something like this.
When defining your entities, strive for high cohesion and loose coupling.
High cohesion means that you should keep similar functionalities together. In a very simple example, if you have a class that reads stuff from a file and processes it, the class has low cohesion, since reading and processing are two very distinct functionalities. In this case, you would want a class for each functionality.
As for loose coupling, it means that your entities should be independent of each other. Using the example above, supposed that you are now the proud owner of two highly cohesive classes - one that reads stuff from a file (Reader), and one that processes that stuff (Processor). Now, suppose that the Processor class has an instance of the Reader class, and calls it in order to get its input. In this case, we can say that both classes are tightly coupled, since Processor won't work without Reader. In the OOP world, the solution for this is typically the use of interfaces. You can find a neat example here.
After defining an initial model of your domain and gathering as much knowledge about it as you can, you can now start to think about the implementation's architecture. This is were you can start thinking about the architectural patterns. Event driven architecture, clean architecture, MVP, MVVM... It will all depend on your domain. It is your job to know which pattern will fit best. Spoiler alert: this can be extremely hard to do correctly even for experienced engineers, so don't be afraid to fail.
Finally, leave the design patterns for the implementation stage. Their use completely depends on your implementation problems and decisions. Also, DON'T FORCE THEM. Ideally, you will solve a problem and, IF APPLICABLE, you'll see a pattern emerging. Trust me, the last thing you want is to have a case of design patternitis. Anyway, if you need literature on patterns, I totally recommend this book. It's great no matter your level as an engineer.
Further reading:
SOLID principles
Onion Architecture
Clean architecture
Good luck!
You have a background task, and it can be used for a message pump/event queue indeed. Then your foreground task would send requests to this background thread and asynchronously wait for the result.
Have a look at the book "Patterns for Parallel Programming".
It is much better if you check a book for Design Patterns. I really like this one.
For example, if you need to get some data from a particular object, you may need the Observer Pattern to work for you and as soon as the object has the data, you (or another object) get to know this data and can work with it, with another pattern (strategy might work, it really depends on what you have to do).
If you have to do some things at the same time, check also the Singleton pattern (well, check the most important ones!).
I've got lots of problems with project i am currently working on. The project is more than 10 years old and it was based on one of those commercial C++ frameworks which were very populary in the 90's. The problem is with statecharts. The framework provides quite common implementation of state pattern. Each state is a separate class, with action on entry, action in state etc. There is a switch which sets current state according to received events.
Devil is hidden in details. That project is enormous. It's something about 2000 KLOC. There is definitely too much statecharts (i've seen "for" loops implemented using statecharts). What's more ... framework allows to embed statechart in another statechart so there are many statecherts with seven or even more levels of nesting. Because statecharts run in different threads, and it's possible to send events between statecharts we have lots of synchronization problems (and big mess in interfaces).
I must admit that scale of this problem is overwhelming and I don't know how to touch it. My first idea was to remove as much code as I can from statecharts and put it into separate classes. Then delegate these classes from statechart to do a job. But in result we will have many separate functions, which logically don't have any specific functionality and any change in statechart architecture will need also a change of that classes and functions.
So I asking for help:
Do you know any books/articles/magic artefacts which can help me to fix this ? I would like to at least separate as much code as I can from statechart without introducing any hidden dependencies and keep separated code maintainable, testable and reusable.
If you have any suggestion how to handle this, please let me know.
The statechart pattern is intended to be used specifically to remove switch statements, so this sounds like a horrid abuse. Additionally, states should only change on asynchronous events. If you are processing an event and you change through multiple states (or for loop, etc.), then this is also a horrid abuse of the pattern.
I would start from these two points, as they will solve much of your concurrency issues just fixing them up. What you need to determine is:
What are your external, asynchronous events to the system? These are the only things that should be determining state transitions, not things that happen during event processing. An event may cause 0 or 1 state transitions. Once you have a list of these state transitions, you can reconstruct the actual states of your system. If you are aware of UML State diagrams, this would be a perfect time to sketch one up in a charting program, not just for yourself (though it will help you immensely), but also for everyone in the future that has to return to the project. As you have learned, this happens.
Now that you know what are really states, list what are states in the code that shouldn't be. This usually indicates that something can be "functionally decomposed". Instead of a state object for each of these, likely all that is needed is a separate function. This will cut down on a lot of the overhead of state objects and should clean up the code immensely.
Now it's time to tackle those horrendous switch statements you mentioned. If they were truly based on state, you shouldn't need one at all. Instead, you should be able to call the state machine directly.
Something like:
myStateMachine->myEvent();
and it should work without any switch. But notice, this may be the case even for some of those objects that don't work across asynchronous events. This is also an indication of where you may just use inheritance to get the same effect. If you have:
switch (someTypeIdentifier)
{
case type1:
doSomething();
break;
case type2:
doSomethingElse();
break;
}
usually the correct OOP method to do is to create two actual types Type1, Type2, both derived from an abstract base TypeBase, with a virtual method doSomething() that does what you need. The reason this is useful is because it means you can "close" the handling (in the meaning of the Open/Closed Principle), and still extend the functionality by adding new derived types as needed (leaving it open to extension). This saves bugs like crazy because it gets developers hands out of those switch statements, which can get quite ugly and convoluted, instead encapsulating each separate behavior in separate classes.
4 - Now look to fix up your thread issues. Identify all objects used from multiple threads. Make a list. Now, how are these used? Are some of them always used together? Start making groups. The goal here is to find the level of encapsulation that best works for these objects, separate the objects into individual classes that control their own synchronisation, figure out the atomic level of actual "transactions" for the objects, and make methods of the classes that expose those meaningful transactions, wrapped behind the scenes with the appropriate mutexes, condition variables, etc.
You might be saying "that sounds like a lot of work! Why do all that instead of just writing it all over myself?" Good question! :) The reason is actually straightforward: if you are going to do it all by yourself, those are the steps you should be doing anyway. You should be identifying your states, your dynamic polymorphism, and getting a handle on the multithreaded transactions. But, if you start with the existing code, you also have all of those unspoken business rules that were never documented and may cause all sorts of unexpected bugs down the line. You don't have to bring everything over - if you suspect it's a bug, discuss the logic with the people who have worked with the system in the past (if available), QA, or whoever might identify bugs, and see if it really should be carried over. But you need to actually evaluate what the bugs are either way, or you may not code something that actually needed coding.
In the end, this is a manual process that is a part of software engineering. There are CASE tools that can help draw up the state diagrams and even publish them to code, there are refactoring tools, like those found in many IDEs, that can help move code between functions and classes, and similar tools which can help identify threading needs. However, those things shouldn't be picked up for a single project. They need to be learned throughout your career, picking them up and learning them more deeply over years of work, as they are a part of being a software engineer. They don't do it for you. You still need to know the whys and hows, and they just help get it done more efficiently.
Statecharts (including nested Statecharts) are a powerful way to specify, understand and even simulate/validate complex control flow. But to gain the benefit, you need the statechart model in a suitable tool (I used Statemate way back in the day, not sure if it's still available), plus a reliable mapping from the chart to the code (Statemate used to generate the code) - then you can forget about the state management code (mostly)! In your situation, if you don't have the model, I would try to reverse one from the code - as Ira says, chances are high that the original developers had a model in some form, and you may find the code making a lot of sense as the model emerges. If this works out, you will have a really good spec/model of the code which should make future code edits much easier (even if you don't want to go to automatic code generation, and maintain the code/model mapping manually (but you'll need to be meticulous!!))
Sounds to me like your best bet is (gulp!) likely to start from scratch if it's as horrifically broken as you make out. Is there any documentation? Could you begin to build some saner software based on the docs?
If a complete re-write isn't an option (and they never are in my experience) I'd try some of the following:
If you don't already have it, draw an architectural picture of the whole system. Sketch out how all the bits are supposed to work together and that will help you break the system down into potentially manageable / testable parts.
Do you have any kind of requirements or testing plan in place? If not, can you write one and start to put unit tests in place for the various chunks of code / functionality which exist already? If you can do that, you can start to refactor things without breaking as much of whatever does currently work.
Once you've broken things down a bit, start building your unit tests into integration tests which pull together more of the functionality.
I've not read them myself, but I've heard good things about these books which may have some advice you can use:
Refactoring: Improving the Design of Existing Code (Object Technology Series).
Working Effectively with Legacy Code (Robert C. Martin)
Good luck! :-)
I'm working on developing a fairly robust 2D game engine as a base that other games can be built off of as a for-fun project (I know theres already things that do this, but that's no fun).
I'm trying to figure out a good way to do message-passing between classes within the engine. At first I was thinking about using a heirarchy of exceptions and throwing them whenever something required it. But as I was developing that way, I realized that there was quite a large number of exceptions being thrown, as they were being used for fairly common things (part of subroutines that handle pathfinding and unit locating and things that need to test the state of the game board alot). The exceptions were being used for things like the pathfinding came across a unit in the way and needed to go around it, it would throw a TileOccupied exception and the pathfinding could gracefully handle that and continue. As can be expected, this created a lot of exceptions.
The internet has told me that exceptions are expensive due to all the run-time processing they need to do. But they handle what I need perfectly, which is being able to propogate a message back to the caller even through branching subroutines to indicate when something has happened or something was not as expected.
Is there any clean/efficient way to do this in c++? Or am I structuring things very wrongly if I am using this type of notification? I'm still learning, so any suggestions would be greatly appreciated (and I'm willing to read / learn any references you can throw my way)
Edit
I'm trying to do this in standard c++ btw. I am writing it on linux, and want it to compile and be runnable platform-independent. I'm currently using Boost in it.
Although this requires explicit registration, this sounds like you want callbacks (eased by e.g. Boost.Function) or signals (like Boost.Signals/Signals2).
Is there any reason you haven't used an event queue and/or observer setup?
Exceptions are the wrong way of doing this. Usual suggestion would be direct events/listeners design, but that quickly gets out of hand within any non-trivial system. I'd point you to a whiteboard design for loose communications.
When I start writing code from scratch, I have a bad habit of quickly writing everything in one function, the whole time thinking "I'll make it more modular later". Then when later comes along, I have a working product and any attempts to fix it would mean creating functions and having to figure out what I need to pass.
It gets worst because it becomes extremely difficult to redesign classes when your project is almost done. For example, I usually do some planning before I start writing code, then when my project is done, I realized I could have made the classes more modular and/or I could have used inheritance. Basically, I don't think I do enough planning and I don't get more than one-level of abstraction.
So in the end, I'm stuck with a program with a large main function, one class and a few helper functions. Needless to say, it is not very reusable.
Has anybody had the same problem and have any tips to overcome this? One thing I had in mind was to write the main function with pseduocode (without much detail but enough to see what objects and functions they need). Essentially a top-down approach.
Is this a good idea? Any other suggestions?
"First we make our habits, then they make us."
This seems to apply for both good and bad habits. Sounds like a bad one has taken hold of you.
Practice being more modular up front until it's "just the way I do things."
Yes, the solution is easy, although it takes time to get used to it.
Never claim there will be a "later", where you sit down and just do refactoring. Instead, continue adding functionality to your code (or tests) and during this phase perform small, incremental refactorings. The "later" will basically be "always", but hidden in the phase where you are actually doing something new every time.
I find the TDD Red-Green-Refactor discipline works wonders.
My rule of thumb is that anything longer than 20 LoC should be clean. IME every project stands on a few "just-a-proof-of-concept"s that were never intended to end up in production code. Since this seems inevitable though, even 20 lines of proof-of-concept code should be clear, because they might end up being one of the foundations of a big project.
My approach is top-down. I write
while( obj = get_next_obj(data) ) {
wibble(obj);
fumble(obj);
process( filter(obj) );
}
and only start to write all these functions later. (Usually they are inline and go into the unnamed namespace. Sometimes they turn out to be one-liners and then I might eliminate them later.)
This way I also avoid to have to comment the algorithms: The function names are explanation enough.
You pretty much identified the issue. Not having enough planning.
Spend some time analyzing the solution you're going to develop, break it down into pieces of functionality, identify how it would be best to implement them and try to separate the layers of the application (UI, business logic, data access layer, etc).
Think in terms of OOP and refactor as early as it makes sense. It's a lot cheaper than doing it after everything is built.
Write the main function minimally, with almost nothing in it. In most gui programs, sdl games programs, open gl, or anything with any kind of user interface at all, the main function should be nothing more than an event eating loop. It has to be, or there will always be long stretches of time where the computer seems unresponsive, and the operating system thinks considers maybe shutting it down because it's not responding to messages.
Once you get your main loop, quickly lock that down, only to be modified for bug fixes, not new functionality. This may just end up displacing the problem to another function, but having a monilithic function is rather difficult to do in an event based application anyway. You'll always need a million little event handlers.
Maybe you have a monolithic class. I've done that. Mainly the way to deal with it is to try and keep a mental or physical map of dependencies, and note where there's ... let's say, perforations, fissures where a group of functions doesn't explicitly depend on any shared state or variables with other functions in the class. There you can spin that cluster of functions off into a new class. If it's really a huge class, and really tangled up, I'd call that a code smell. Think about redesigning such a thing to be less huge and interdependant.
Another thing you can do is as you're coding, note that when a function grows to a size where it no longer fits on a single screen, it's probably too big, and at that point start thinking about how to break it down into multiple smaller functions.
Refactoring is a lot less scary if you have good tools to do it. I see you tagged your question as "C++" but the same goes for any language. Get an IDE where extracting and renaming methods, extracting variables, etc. is easy to do, and then learn how to use that IDE effectively. Then the "small, incremental refactorings" that Stefano Borini mentions will be less daunting.
Your approach isn't necessarily bad -- earlier more modular design might end up as over-engineering.
You do need to refactor -- this is a fact of life. The question is when? Too late, and the refactoring is too big a task and too risk-prone. Too early, and it might be over-engineering. And, as time goes on, you will need to refactor again .. and again. This is just part of the natural life-cycle of software.
The trick is to refactor soon, but not too soon. And frequently, but not too frequently. How soon and how frequently? That's why it's a art and not a science :)
How would you go about converting a reasonably large (>300K), fairly mature C codebase to C++?
The kind of C I have in mind is split into files roughly corresponding to modules (i.e. less granular than a typical OO class-based decomposition), using internal linkage in lieu private functions and data, and external linkage for public functions and data. Global variables are used extensively for communication between the modules. There is a very extensive integration test suite available, but no unit (i.e. module) level tests.
I have in mind a general strategy:
Compile everything in C++'s C subset and get that working.
Convert modules into huge classes, so that all the cross-references are scoped by a class name, but leaving all functions and data as static members, and get that working.
Convert huge classes into instances with appropriate constructors and initialized cross-references; replace static member accesses with indirect accesses as appropriate; and get that working.
Now, approach the project as an ill-factored OO application, and write unit tests where dependencies are tractable, and decompose into separate classes where they are not; the goal here would be to move from one working program to another at each transformation.
Obviously, this would be quite a bit of work. Are there any case studies / war stories out there on this kind of translation? Alternative strategies? Other useful advice?
Note 1: the program is a compiler, and probably millions of other programs rely on its behaviour not changing, so wholesale rewriting is pretty much not an option.
Note 2: the source is nearly 20 years old, and has perhaps 30% code churn (lines modified + added / previous total lines) per year. It is heavily maintained and extended, in other words. Thus, one of the goals would be to increase mantainability.
[For the sake of the question, assume that translation into C++ is mandatory, and that leaving it in C is not an option. The point of adding this condition is to weed out the "leave it in C" answers.]
Having just started on pretty much the same thing a few months ago (on a ten-year-old commercial project, originally written with the "C++ is nothing but C with smart structs" philosophy), I would suggest using the same strategy you'd use to eat an elephant: take it one bite at a time. :-)
As much as possible, split it up into stages that can be done with minimal effects on other parts. Building a facade system, as Federico Ramponi suggested, is a good start -- once everything has a C++ facade and is communicating through it, you can change the internals of the modules with fair certainty that they can't affect anything outside them.
We already had a partial C++ interface system in place (due to previous smaller refactoring efforts), so this approach wasn't difficult in our case. Once we had everything communicating as C++ objects (which took a few weeks, working on a completely separate source-code branch and integrating all changes to the main branch as they were approved), it was very seldom that we couldn't compile a totally working version before we left for the day.
The change-over isn't complete yet -- we've paused twice for interim releases (we aim for a point-release every few weeks), but it's well on the way, and no customer has complained about any problems. Our QA people have only found one problem that I recall, too. :-)
What about:
Compiling everything in C++'s C subset and get that working, and
Implementing a set of facades leaving the C code unaltered?
Why is "translation into C++ mandatory"? You can wrap the C code without the pain of converting it into huge classes and so on.
Your application has lots of folks working on it, and a need to not-be-broken.
If you are serious about large scale conversion to an OO style, what
you need is massive transformation tools to automate the work.
The basic idea is to designate groups of data as classes, and then
get the tool to refactor the code to move that data into classes,
move functions on just that data into those classes,
and revise all accesses to that data to calls on the classes.
You can do an automated preanalysis to form statistic clusters to get some ideas,
but you'll still need an applicaiton aware engineer to decide what
data elements should be grouped.
A tool that is capable of doing this task is our DMS Software Reengineering
Toolkit.
DMS has strong C parsers for reading your code, captures the C code
as compiler abstract syntax trees, (and unlike a conventional compiler)
can compute flow analyses across your entire 300K SLOC.
DMS has a C++ front end that can be used as the "back" end;
one writes transformations that map C syntax to C++ syntax.
A major C++ reengineering task on a large avionics system gives
some idea of what using DMS for this kind of activity is like.
See technical papers at
www.semdesigns.com/Products/DMS/DMSToolkit.html,
specifically
Re-engineering C++ Component Models Via Automatic Program Transformation
This process is not for the faint of heart. But than anybody
that would consider manual refactoring of a large application
is already not afraid of hard work.
Yes, I'm associated with the company, being its chief architect.
I would write C++ classes over the C interface. Not touching the C code will decrease the chance of messing up and quicken the process significantly.
Once you have your C++ interface up; then it is a trivial task of copy+pasting the code into your classes. As you mentioned - during this step it is vital to do unit testing.
GCC is currently in midtransition to C++ from C. They started by moving everything into the common subset of C and C++, obviously. As they did so, they added warnings to GCC for everything they found, found under -Wc++-compat. That should get you on the first part of your journey.
For the latter parts, once you actually have everything compiling with a C++ compiler, I would focus on replacing things that have idiomatic C++ counterparts. For example, if you're using lists, maps, sets, bitvectors, hashtables, etc, which are defined using C macros, you will likely gain a lot by moving these to C++. Likewise with OO, you'll likely find benefits where you are already using a C OO idiom (like struct inheritence), and where C++ will afford greater clarity and better type checking on your code.
Your list looks okay except I would suggest reviewing the test suite first and trying to get that as tight as possible before doing any coding.
Let's throw another stupid idea:
Compile everything in C++'s C subset and get that working.
Start with a module, convert it in a huge class, then in an instance, and build a C interface (identical to the one you started from) out of that instance. Let the remaining C code work with that C interface.
Refactor as needed, growing the OO subsystem out of C code one module at a time, and drop parts of the C interface when they become useless.
Probably two things to consider besides how you want to start are on what you want to focus, and where you want to stop.
You state that there is a large code churn, this may be a key to focus your efforts. I suggest you pick the parts of your code where a lot of maintenance is needed, the mature/stable parts are apparently working well enough, so it is better to leave them as they are, except probably for some window dressing with facades etc.
Where you want to stop depends on what the reason is for wanting to convert to C++. This can hardly be a goal in itself. If it is due to some 3rd party dependency, focus your efforts on the interface to that component.
The software I work on is a huge, old code base which has been 'converted' from C to C++ years ago now. I think it was because the GUI was converted to Qt. Even now it still mostly looks like a C program with classes. Breaking the dependencies caused by public data members, and refactoring the huge classes with procedural monster methods into smaller methods and classes never has really taken off, I think for the following reasons:
There is no need to change code that is working and that does not need to be enhanced. Doing so introduces new bugs without adding functionality, and end users don't appreciate that;
It is very, very hard to do refactor reliably. Many pieces of code are so large and also so vital that people hardly dare touching it. We have a fairly extensive suite of functional tests, but sufficient code coverage information is hard to get. As a result, it is difficult to establish whether there are already sufficient tests in place to detect problems during refactoring;
The ROI is difficult to establish. The end user will not benefit from refactoring, so it must be in reduced maintenance cost, which will increase initially because by refactoring you introduce new bugs in mature, i.e. fairly bug-free code. And the refactoring itself will be costly as well ...
NB. I suppose you know the "Working effectively with Legacy code" book?
You mention that your tool is a compiler, and that: "Actually, pattern matching, not just type matching, in the multiple dispatch would be even better".
You might want to take a look at maketea. It provides pattern matching for ASTs, as well as the AST definition from an abstract grammar, and visitors, tranformers, etc.
If you have a small or academic project (say, less than 10,000 lines), a rewrite is probably your best option. You can factor it however you want, and it won't take too much time.
If you have a real-world application, I'd suggest getting it to compile as C++ (which usually means primarily fixing up function prototypes and the like), then work on refactoring and OO wrapping. Of course, I don't subscribe to the philosophy that code needs to be OO structured in order to be acceptable C++ code. I'd do a piece-by-piece conversion, rewriting and refactoring as you need to (for functionality or for incorporating unit testing).
Here's what I would do:
Since the code is 20 years old, scrap down the parser/syntax analyzer and replace it with one of the newer lex/yacc/bison(or anything similar) etc based C++ code, much more maintainable and easier to understand. Faster to develop too if you have a BNF handy.
Once this is retrofitted to the old code, start wrapping modules into classes. Replace global/shared variables with interfaces.
Now what you have will be a compiler in C++ (not quite though).
Draw a class diagram of all the classes in your system, and see how they are communicating.
Draw another one using the same classes and see how they ought to communicate.
Refactor the code to transform the first diagram to the second. (this might be messy and tricky)
Remember to use C++ code for all new code added.
If you have some time left, try replacing data structures one by one to use the more standardized STL or Boost.