I am doing my own research project, and I am struggling to choose the right architectural/design patterns.
In this project, after the "system" starts, I need to do something in the background (tasks, processing, displaying data and so on) and at the same time be able to interact with the system using, for example, the keyboard, and send commands like "give me the status of this particular object" or "what is the data in this object".
So my question is: what software architectural/design patterns can be applied to this particular project? How should the interaction between classes/objects be organized? How should the objects be created?
Can, for example, "event-driven architecture" or "Microkernel" be applied here? Some references to useful resources will be very much appreciated!
Thank you very much in advance!
Careful with design patterns. If you sprinkle them throughout your code hoping that everything will work great, you'll soon have an unreadable, boilerplate-filled mess. They are recipes, not solutions.
My advice to you is pick a piece of paper and a pencil and start drawing all the entities of your domain, with all their requisites, and see how they relate. If you want to get somewhat serious about it, you can do something like this.
When defining your entities, strive for high cohesion and loose coupling.
High cohesion means that you should keep similar functionalities together. In a very simple example, if you have a class that reads stuff from a file and processes it, the class has low cohesion, since reading and processing are two very distinct functionalities. In this case, you would want a class for each functionality.
As for loose coupling, it means that your entities should be independent of each other. Using the example above, suppose that you are now the proud owner of two highly cohesive classes - one that reads stuff from a file (Reader), and one that processes that stuff (Processor). Now, suppose that the Processor class has an instance of the Reader class, and calls it in order to get its input. In this case, we can say that both classes are tightly coupled, since Processor won't work without Reader. In the OOP world, the solution for this is typically the use of interfaces. You can find a neat example here.
After defining an initial model of your domain and gathering as much knowledge about it as you can, you can now start to think about the implementation's architecture. This is where you can start thinking about the architectural patterns. Event-driven architecture, clean architecture, MVP, MVVM... It will all depend on your domain. It is your job to know which pattern will fit best. Spoiler alert: this can be extremely hard to do correctly even for experienced engineers, so don't be afraid to fail.
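A minimal C++ sketch of the interface idea, for illustration only. The names LineSource, StringSource and countNonEmptyLines are assumptions made up for this example; the point is just that Processor depends on an abstract source rather than on a concrete Reader.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Abstraction the Processor depends on; it knows nothing about files.
class LineSource {
public:
    virtual ~LineSource() = default;
    virtual std::vector<std::string> readLines() = 0;
};

// One concrete source: splits an in-memory string into lines.
// A file-backed Reader would implement the same interface.
class StringSource : public LineSource {
public:
    explicit StringSource(std::string text) : text_(std::move(text)) {}
    std::vector<std::string> readLines() override {
        std::vector<std::string> lines;
        std::istringstream in(text_);
        for (std::string line; std::getline(in, line);) lines.push_back(line);
        return lines;
    }
private:
    std::string text_;
};

// Processor only sees the interface, so sources can be swapped freely.
class Processor {
public:
    explicit Processor(LineSource& source) : source_(source) {}
    std::size_t countNonEmptyLines() {
        std::size_t count = 0;
        for (const auto& line : source_.readLines())
            if (!line.empty()) ++count;
        return count;
    }
private:
    LineSource& source_;
};
```

With this shape, a test can hand Processor an in-memory source, and production code can hand it a file-backed one, without Processor changing.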
Finally, leave the design patterns for the implementation stage. Their use completely depends on your implementation problems and decisions. Also, DON'T FORCE THEM. Ideally, you will solve a problem and, IF APPLICABLE, you'll see a pattern emerging. Trust me, the last thing you want is to have a case of design patternitis. Anyway, if you need literature on patterns, I totally recommend this book. It's great no matter your level as an engineer.
Further reading:
SOLID principles
Onion Architecture
Clean architecture
Good luck!
You have a background task, and it can be used for a message pump/event queue indeed. Then your foreground task would send requests to this background thread and asynchronously wait for the result.
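One possible shape for such a message pump in standard C++ is sketched below. BackgroundWorker and post are hypothetical names, and requests are simplified to callables that return a string; a real system would likely use its own command types.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

class BackgroundWorker {
public:
    BackgroundWorker() : thread_([this] { run(); }) {}
    ~BackgroundWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wakeup_.notify_one();
        thread_.join();
    }
    // Called from the foreground; the returned future becomes ready
    // once the background thread has executed the request.
    std::future<std::string> post(std::function<std::string()> request) {
        std::packaged_task<std::string()> task(std::move(request));
        std::future<std::string> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(task));
        }
        wakeup_.notify_one();
        return result;
    }
private:
    void run() {
        for (;;) {
            std::packaged_task<std::string()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wakeup_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                task = std::move(queue_.front());
                queue_.pop();
            }
            task();  // runs outside the lock
        }
    }
    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::queue<std::packaged_task<std::string()>> queue_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so the other members exist before run() starts
};
```

The keyboard handler would call post with a request such as "report the status of object X" and either block on the future or poll it, depending on how responsive the foreground needs to be.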
Have a look at the book "Patterns for Parallel Programming".
It is much better if you check a book for Design Patterns. I really like this one.
For example, if you need to get some data from a particular object, you may need the Observer Pattern to work for you and as soon as the object has the data, you (or another object) get to know this data and can work with it, with another pattern (strategy might work, it really depends on what you have to do).
If you have to do some things at the same time, check also the Singleton pattern (well, check the most important ones!).
I am the developer of some family tree software (written in C++ and Qt). I had no problems until one of my customers mailed me a bug report. The problem is that the customer has two children with their own daughter, and, as a result, he can't use my software because of errors.
Those errors are the result of my various assertions and invariants about the family graph being processed (for example, after walking a cycle, the program states that X can't be both father and grandfather of Y).
How can I resolve those errors without removing all data assertions?
It seems you (and/or your company) have a fundamental misunderstanding of what a family tree is supposed to be.
Let me clarify, I also work for a company that has (as one of its products) a family tree in its portfolio, and we have been struggling with similar problems.
The problem, in our case, and I assume your case as well, comes from the GEDCOM format that is extremely opinionated about what a family should be. However this format contains some severe misconceptions about what a family tree really looks like.
GEDCOM has many issues, such as incompatibility with same-sex relations, incest, etc., which in real life happen more often than you'd imagine (especially when going back in time to the 1700s and 1800s).
We have modeled our family tree to what happens in the real world: Events (for example, births, weddings, engagement, unions, deaths, adoptions, etc.). We do not put any restrictions on these, except for logically impossible ones (for example, one can't be one's own parent, relations need two individuals, etc...)
The lack of validations gives us a more "real world", simpler and more flexible solution.
As for this specific case, I would suggest removing the assertions as they do not hold universally.
For displaying issues (that will arise) I would suggest drawing the same node as many times as needed, hinting at the duplication by lighting up all the copies on selecting one of them.
Relax your assertions.
Not by changing the rules, which are most likely very helpful to 99.9% of your customers in catching mistakes in entering their data.
Instead, change it from an error "can't add relationship" to a warning with an "add anyway".
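As a rough sketch of that idea, a check can return a severity instead of asserting. Severity, CheckResult and checkAddParent are hypothetical names, and the alreadyAncestor flag stands in for whatever graph lookup the real program performs.

```cpp
#include <cassert>
#include <string>

enum class Severity { Ok, Warning, Error };

struct CheckResult {
    Severity severity;
    std::string message;
};

// personId and parentId are hypothetical identifiers; alreadyAncestor says
// whether parentId is already an ancestor of personId through another path.
CheckResult checkAddParent(int personId, int parentId, bool alreadyAncestor) {
    if (personId == parentId)
        return {Severity::Error, "a person cannot be their own parent"};
    if (alreadyAncestor)
        return {Severity::Warning, "parent is already an ancestor; add anyway?"};
    return {Severity::Ok, ""};
}
```

The UI would refuse only on Error and show an "add anyway" prompt on Warning, so the unusual but real case still gets through.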
Here's the problem with family trees: they are not trees. They are directed acyclic graphs or DAGs. If I understand the principles of the biology of human reproduction correctly, there will not be any cycles.
As far as I know, even the Christians accept marriages (and thus children) between cousins, which will turn the family tree into a family DAG.
The moral of the story is: choose the right data structures.
I guess that you have some value that uniquely identifies a person on which you can base your checks.
This is a tricky one. Assuming you want to keep the structure a tree, I suggest this:
Assume this: A has kids with his own daughter.
A adds himself to the program twice, as A and as B: once in the role of father and once in the role of, let's call it, boyfriend.
Add a is_same_for_out() function which tells the output generating part of your program that all links going to B internally should be going to A on presentation of data.
This will make some extra work for the user, but I guess it would be relatively easy to implement and maintain.
Building from that, you could work on code synching A and B to avoid inconsistencies.
This solution is surely not perfect, but is a first approach.
You should focus on what really adds value to your software. Is the time spent on making it work for ONE customer worth the price of the license? Likely not.
I advise you to apologize to this customer, tell him that his situation is out of scope for your software and issue him a refund.
You should have set up the Atreides family (either modern, Dune, or ancient, Oedipus Rex) as a testing case. You don't find bugs by using sanitized data as a test case.
This is one of the reasons why languages like "Go" do not have assertions. They are used to handle cases that you probably didn't think about, all too often. You should only assert the impossible, not simply the unlikely. Doing the latter is what gives assertions a bad reputation. Every time you type assert(, walk away for ten minutes and really think about it.
In your particularly disturbing case, it is both conceivable and appalling that such an assertion would be bogus under rare but possible circumstances. Hence, handle it in your app, if only to say "This software was not designed to handle the scenario that you presented".
Asserting that it is impossible for your great-great-great-grandfather to be your father is a reasonable thing to do.
If I was working for a testing company that was hired to test your software, of course I would have presented that scenario. Why? Every juvenile yet intelligent 'user' is going to do the exact same thing and relish in the resulting 'bug report'.
I hate commenting on such a screwed up situation, but the easiest way to not rejigger all of your invariants is to create a phantom vertex in your graph that acts as a proxy back to the incestuous dad.
So, I've done some work on family tree software. I think the problem you're trying to solve is that you need to be able to walk the tree without getting in infinite loops - in other words, the tree needs to be acyclical.
However, it looks like you're asserting that there is only one path between a person and one of their ancestors. That will guarantee that there are no cycles, but is too strict. Biologically speaking, descendancy is a directed acyclic graph (DAG). The case you have is certainly a degenerate case, but that type of thing happens all the time on larger trees.
For example, if you look at the 2^n ancestors you have at generation n, if there was no overlap, then you'd have more ancestors in 1000 AD than there were people alive. So, there's got to be overlap.
However, you also do tend to get cycles that are invalid, just bad data. If you're traversing the tree, then cycles must be dealt with. You can do this in each individual algorithm, or on load. I did it on load.
Finding true cycles in a tree can be done in a few ways. The wrong way is to mark every ancestor from a given individual, and when traversing, if the person you're going to step to next is already marked, then cut the link. This will sever potentially accurate relationships. The correct way to do it is to start from each individual, and mark each ancestor with the path to that individual. If the new path contains the current path as a subpath, then it's a cycle, and should be broken. You can store paths as vector<bool> (MFMF, MFFFMF, etc.) which makes the comparison and storage very fast.
There are a few other ways to detect cycles, such as sending out two iterators and seeing if they ever collide with the subset test, but I ended up using the local storage method.
Also note that you don't need to actually sever the link, you can just change it from a normal link to a 'weak' link, which isn't followed by some of your algorithms. You will also want to take care when choosing which link to mark as weak; sometimes you can figure out where the cycle should be broken by looking at birthdate information, but often you can't figure out anything because so much data is missing.
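A load-time check along these lines could be sketched as follows. Note that this is not the path-vector method described above: it swaps in the standard three-state depth-first search, which also tolerates several paths to the same ancestor and flags only links that lead back to a descendant. ParentGraph, markAncestors and findCycleLinks are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// parents[i] lists the indices of person i's recorded parents.
using ParentGraph = std::vector<std::vector<int>>;

enum class Mark { Unvisited, InProgress, Done };

void markAncestors(const ParentGraph& parents, int person, std::vector<Mark>& marks,
                   std::vector<std::pair<int, int>>& weakLinks) {
    marks[person] = Mark::InProgress;
    for (int parent : parents[person]) {
        if (marks[parent] == Mark::InProgress) {
            // parent is also a descendant of person: a true cycle.
            weakLinks.emplace_back(person, parent);
        } else if (marks[parent] == Mark::Unvisited) {
            markAncestors(parents, parent, marks, weakLinks);
        }
        // Mark::Done means "reached before via another path", which is fine in a DAG.
    }
    marks[person] = Mark::Done;
}

// Returns the (child, parent) links that close a cycle; these are the
// candidates for being downgraded to 'weak' links.
std::vector<std::pair<int, int>> findCycleLinks(const ParentGraph& parents) {
    std::vector<Mark> marks(parents.size(), Mark::Unvisited);
    std::vector<std::pair<int, int>> weakLinks;
    for (int person = 0; person < static_cast<int>(parents.size()); ++person)
        if (marks[person] == Mark::Unvisited)
            markAncestors(parents, person, marks, weakLinks);
    return weakLinks;
}
```

Which link of a cycle gets reported depends on the traversal order, so a real implementation would still want the birthdate heuristics mentioned above before deciding which one to weaken. The recursion depth equals the longest ancestor chain, which is usually small for genealogy data.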
Another mock serious answer for a silly question:
The real answer is, use an appropriate data structure. Human genealogy cannot fully be expressed using a pure tree with no cycles. You should use some sort of graph. Also, talk to an anthropologist before going any further with this, because there are plenty of other places similar errors could be made trying to model genealogy, even in the most simple case of "Western patriarchal monogamous marriage."
Even if we want to ignore locally taboo relationships as discussed here, there are plenty of perfectly legal and completely unexpected ways to introduce cycles into a family tree.
For example: http://en.wikipedia.org/wiki/Cousin_marriage
Basically, cousin marriage is not only common and expected, it is the reason humans have gone from thousands of small family groups to a worldwide population of 6 billion. It can't work any other way.
There really are very few universals when it comes to genealogy, family and lineage. Almost any strict assumption about norms suggesting who an aunt can be, or who can marry who, or how children are legitimized for the purpose of inheritance, can be upset by some exception somewhere in the world or history.
Potential legal implications aside, it certainly seems that you need to treat a 'node' on a family tree as a predecessor-person rather than assuming that the node can be the-one-and-only person.
Have the tree node include a person as well as the successors - and then you can have another node deeper down the tree that includes the same person with different successors.
A few answers have shown ways to keep the assertions/invariants, but this seems like a misuse of assertions/invariants. Assertions are to make sure something that should be true is true, and invariants are to make sure something that shouldn't change doesn't change.
What you're asserting here is that incestuous relationships don't exist. Clearly they do exist, so your assertion is invalid. You can work around this assertion, but the real bug is in the assertion itself. The assertion should be removed.
Your family tree should use directed relations. This way you won't have a cycle.
Genealogical data is cyclic and does not fit into an acyclic graph, so if you have assertions against cycles you should remove them.
The way to handle this in a view without creating a custom view is to treat the cyclic parent as a "ghost" parent. In other words, when a person is both a father and a grandfather to the same person, then the grandfather node is shown normally, but the father node is rendered as a "ghost" node that has a simple label like ("see grandfather") and points to the grandfather.
In order to do calculations you may need to improve your logic to handle cyclic graphs so that a node is not visited more than once if there is a cycle.
The most important thing is to avoid creating a problem, so I believe that you should use a direct relation to avoid having a cycle.
As #markmywords said, #include "fritzl.h".
Finally I have to say recheck your data structure. Maybe something is going wrong over there (maybe a bidirectional linked list solves your problem).
Assertions don't survive reality
Usually assertions don't survive contact with real-world data. It's part of the process of software engineering to decide which data you want to deal with and which is out of scope.
Cyclic family graphs
Regarding family "trees" (in fact they are full-blown graphs, including cycles), there is a nice anecdote:
I married a widow who had a grown daughter. My father, who often visited us, fell in love with my step-daughter and married her. As a result, my father became my son, and my daughter became my mother. Some time later, I gave my wife a son, who was the brother of my father, and my uncle. My father's wife (who is also my daughter and my mother) got a son. As a result, I got a brother and a grandson in the same person. My wife is now my grandmother, because she is my mother's mother. So I am the husband of my wife, and at the same time the step-grandson of my wife. In other words, I'm my own grandpa.
Things get even more strange, when you take surrogates or "fuzzy fatherhood" into account.
How to deal with that
Define cycles as out-of-scope
You could decide that your software should not deal with such rare cases. If such a case occurs, the user should use a different product. This makes dealing with the more common cases much more robust, because you can keep more assertions and a simpler data model.
In this case, add some good import and export features to your software, so the user can easily migrate to a different product when necessary.
Allow manual relations
You could allow the user to add manual relations. These relations are not "first-class citizens", i.e. the software takes them as-is, doesn't check them and doesn't handle them in the main data model.
The user can then handle rare cases by hand. Your data model will still stay quite simple and your assertions will survive.
Be careful with manual relations. There is a temptation to make them completely configurable and hence create a fully configurable data model. This will not work: Your software will not scale, you will get strange bugs and finally the user interface will become unusable. This anti-pattern is called "soft coding", and "The daily WTF" is full of examples for that.
Make your data model more flexible, skip assertions, test invariants
The last resort would be making your data model more flexible. You would have to skip nearly all assertions and base your data model on a full blown graph. As the above example shows, it is easily possible to be your own grandfather, so you can even have cycles.
In this case, you should extensively test your software. You had to skip nearly all assertions, so there is a good chance for additional bugs.
Use a test data generator to check unusual test cases. There are quick check libraries for Haskell, Erlang or C. For Java / Scala there are ScalaCheck and Nyaya. One test idea would be to simulate a random population, let it interbreed at random, then let your software first import and then export the result. The expectation would be that all connections in the output are also in the input and vice versa.
A case where a property stays the same is called an invariant. In this case, the invariant is the set of "romantic relations" between the individuals in the simulated population. Try to find as many invariants as possible and test them with randomly generated data. Invariants can be functional, e.g.:
an uncle stays an uncle, even when you add more "romantic relations"
every child has a parent
a population with two generations has at least one grand-parent
Or they can be technical:
Your software will not crash on a graph up to 10 billion members (no matter how many interconnections)
Your software scales with O(number-of-nodes) and O(number-of-edges^2)
Your software can save and re-load every family graph up to 10 billion members
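The import/export round-trip idea can be sketched without any test library, using only the standard random number facilities. Everything here is an assumption for illustration: exportLinks and importLinks are stand-ins for the application's real routines, and a population is reduced to a set of (child, parent) links.

```cpp
#include <cassert>
#include <random>
#include <set>
#include <sstream>
#include <string>
#include <utility>

using Link = std::pair<int, int>;  // (child, parent)

// Random population: each person picks parents among earlier indices,
// so inbreeding is allowed but nobody is born before their parents.
std::set<Link> randomPopulation(unsigned seed, int size) {
    std::mt19937 rng(seed);
    std::set<Link> links;
    for (int child = 1; child < size; ++child) {
        std::uniform_int_distribution<int> pick(0, child - 1);
        links.insert({child, pick(rng)});
        links.insert({child, pick(rng)});  // may repeat: then only one known parent
    }
    return links;
}

// Stand-ins for the application's real export and import routines.
std::string exportLinks(const std::set<Link>& links) {
    std::ostringstream out;
    for (const Link& link : links) out << link.first << ' ' << link.second << '\n';
    return out.str();
}

std::set<Link> importLinks(const std::string& text) {
    std::set<Link> links;
    std::istringstream in(text);
    for (int child, parent; in >> child >> parent;) links.insert({child, parent});
    return links;
}

// The invariant under test: export followed by import loses and adds nothing.
bool roundTripHolds(unsigned seed, int size) {
    const std::set<Link> original = randomPopulation(seed, size);
    return importLinks(exportLinks(original)) == original;
}
```

Running roundTripHolds over many seeds and sizes gives a cheap version of the simulated-population test; a failing seed can be replayed exactly because the generator is deterministic for a given seed.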
By running the simulated tests, you will find lots of strange corner cases. Fixing them will take a lot of time. Also you will lose a lot of optimizations, your software will run much slower. You have to decide, if it is worth it and if this is in the scope of your software.
Instead of removing all assertions, you should still check for things like a person being his/her own parent or other impossible situations and present an error. Maybe issue a warning if it is unlikely so the user can still detect common input errors, but it will work if everything is correct.
I would store the data in a vector with a permanent integer for each person and store the parents and children in person objects where the said int is the index of the vector. This would be pretty fast to go between generations (but slow for things like name searches). The objects would be in order of when they were created.
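A minimal sketch of that layout, assuming hypothetical names Person and FamilyStore. Ids are simply positions in the vector, so they stay valid as long as people are never erased.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Person {
    std::string name;
    std::vector<int> parents;   // ids (vector indices) of recorded parents
    std::vector<int> children;  // ids (vector indices) of recorded children
};

class FamilyStore {
public:
    // Returns the permanent id of the new person (its index in the vector).
    int addPerson(const std::string& name) {
        people_.push_back(Person{name, {}, {}});
        return static_cast<int>(people_.size()) - 1;
    }
    void addParent(int child, int parent) {
        // Out-of-range ids are a programming error, not unusual user data.
        assert(valid(child) && valid(parent));
        people_[child].parents.push_back(parent);
        people_[parent].children.push_back(child);
    }
    const Person& get(int id) const { return people_.at(id); }
private:
    bool valid(int id) const { return id >= 0 && id < static_cast<int>(people_.size()); }
    std::vector<Person> people_;
};
```

Nothing in this structure prevents the same id from appearing as both parent and grandparent of a person, which is exactly the case from the question.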
Duplicate the father (or use symlink/reference).
For example, if you are using hierarchical database:
$ # each person node has two nodes representing its parents.
$ mkdir Family
$ mkdir Family/Son
$ mkdir Family/Son/Daughter
$ mkdir Family/Son/Daughter/Father
$ ln -s Family/Son/Daughter/Father Family/Son/Father
$ mkdir Family/Son/Daughter/Wife
$ tree Family
Family
└── Son
    ├── Daughter
    │   ├── Father
    │   └── Wife
    └── Father -> Family/Son/Daughter/Father

4 directories, 1 file
I was just playing an old SNES RPG (Secret of Mana, if anyone cares) and was wondering a few general things about game programming.
Sorry for some of the brain-dead questions, I'm really a beginner. :)
These questions are quite general, but use SNES-style RPGs as a "template" to get an idea of what I mean:
How do games keep track of all the objects, triggered events, etc in its "world"? For example, how does it keep track of which treasure chests have already been opened, which doors are locked, which story events have already triggered?
Does it basically create an array of elements each corresponding to a chest/door/event/etc and "mark" each (change its value from 0 to 1) when it has been opened/triggered? If there are multiple approaches, what are they?
How are "variable lists" handled? Ie, if you have a game when you can have a huge inventory of objects (ie: armor, swords) and have X of each object, how is this done?
My guess: have a struct that has a big array with a spot for every possible object (an array of X ints, where X is the number of possible objects to own) where each element's value represents how many of that object you have, and then have a giant enum of every object so that an object is matched to a corresponding index, and access it like: numberOfSwords = inventory[SWORDS] where SWORDS is part of an enum and has some integer number associated with it. How close am I?
How about the case where the number of objects can vary? Ie, if I have a game where I have some amount of enemies on the screen and they can get killed / give birth to new enemies at any point, it would seem to me like I would need an array of "enemy" objects to loop through and process, but that the number of elements would vary at any one time. What is the usual approach to this? Linked lists?
Any help / hint / pointers are really appreciated.
At a very basic level your guesses are not too far off; things could be done the way you describe. However, space and processing power can come into play, so instead of an array of bools to track which treasure chests are open or how far along the chain of events you are, you may want to slim it down to individual bits and use the bitwise operators with masks to see where you are in the storyline, or whether to draw the treasure chest you are about to display as open or closed.
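A small sketch of the bit-flag approach; the flag names and the WorldState struct are invented for the example, and a real game would likely need more than 32 flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical event ids; each one owns a single bit.
enum WorldFlag : std::uint32_t {
    ChestForestOpened = 1u << 0,
    CastleDoorUnlocked = 1u << 1,
    BossDefeated = 1u << 2,
};

struct WorldState {
    std::uint32_t flags = 0;
    void set(WorldFlag f) { flags |= f; }                       // turn the bit on
    bool isSet(WorldFlag f) const { return (flags & f) != 0; }  // mask and test
};
```

Saving the game then amounts to writing out one integer per group of 32 flags.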
For inventories, instead of tracking how many of each item a player has it may be better to have a base item for everything a player can pick up; weapons, armor and even money. Then you could make a linked list of just the items the player has. Use the Enum for the item as you mentioned and then the quantity of that item. This would allow for sorting of things and would also only keep in memory the items the player's character(s) actually have/has. You could extend this data structure to also track if the item is equipped. You could likely keep more generic what the item does sort of information in an items table.
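One way this could look, using std::list as the linked list. ItemId, InventoryEntry and Inventory are hypothetical names, and the item enum is cut down to three entries.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

enum class ItemId { Sword, Armor, Potion };  // hypothetical item ids

struct InventoryEntry {
    ItemId id;
    int quantity;
    bool equipped;
};

class Inventory {
public:
    void add(ItemId id, int count) {
        assert(count > 0);  // callers must never add zero or negative amounts
        for (auto& e : entries_)
            if (e.id == id) { e.quantity += count; return; }
        entries_.push_back({id, count, false});
    }
    // Removes up to `count` items; drops the entry when none are left.
    void remove(ItemId id, int count) {
        for (auto it = entries_.begin(); it != entries_.end(); ++it) {
            if (it->id != id) continue;
            it->quantity -= count;
            if (it->quantity <= 0) entries_.erase(it);
            return;
        }
    }
    int quantityOf(ItemId id) const {
        for (const auto& e : entries_)
            if (e.id == id) return e.quantity;
        return 0;
    }
    std::size_t distinctItems() const { return entries_.size(); }
private:
    std::list<InventoryEntry> entries_;  // only items the player actually holds
};
```

Lookups are linear here, which is fine for a handful of items; a map keyed by ItemId would be the usual next step if the inventory grows.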
The enemies would likely be a bit more complicated as you need to do a few more things with them. A linked list here though is still likely your best bet. That way removal of an object from the list would be a mite bit easier (you can simply remove the link when a player kills them, or add a new enemy wherever needed in the list).
Honestly there is no one answer and it can depend on quite a few things. The best way is really to simply try it out. For a simple 'what if I do this for this' it really does not end up taking all that long to give it a whirl and see how far you get. If you start running into issues you can start to consider other options :)
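A sketch of one update pass over such a list, assuming a made-up Enemy struct whose spawnsChild flag stands for "gives birth this frame".

```cpp
#include <cassert>
#include <list>

struct Enemy {
    int health;
    bool spawnsChild;  // hypothetical: this enemy gives birth when updated
};

// One pass over the enemies: dead ones are unlinked, newborns are appended.
void updateEnemies(std::list<Enemy>& enemies) {
    std::list<Enemy> newborns;
    for (auto it = enemies.begin(); it != enemies.end();) {
        if (it->health <= 0) {
            it = enemies.erase(it);  // erase returns the next valid iterator
            continue;
        }
        if (it->spawnsChild) {
            newborns.push_back({10, false});
            it->spawnsChild = false;
        }
        ++it;
    }
    enemies.splice(enemies.end(), newborns);  // relinks nodes, no copies
}
```

Collecting newborns in a separate list keeps them from being processed in the same pass in which they were created.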
Hope this helps.
Edit: Just wanted to add in a link to www.codesampler.com. It is generally a more DirectX-oriented tutorial site, but for a beginner it can start you thinking or give you a set of places to start. As an added bonus, a lot of the DirectX SDK examples/samples started to be formatted very much like this site's tutorials. It can help ease you into the whole thing.
This is a pretty advanced question for a beginner.
I'd like to echo In Silico's response that you should learn C++ language basics before you tackle this subject.
To give you a place to start, you should know about container classes (Linked Lists, Vectors, HashTables/Dictionaries, Queues etc) and how they work. Since the Standard Template Library (STL) is pretty standardized, it would be a good place for a beginner to start.
You should also know about inheritance and how to build a hierarchy of classes.
For example, you asked about inventory in a role playing game:
I'd start by defining an InventoryItem class that defines or sets up an interface for all of the code necessary for an item to participate in your inventory system.
Something like:
#include <string>

class InventoryItem
{
private:
    std::string description; // A description of the item
    bool inInventory;        // True if in the player's inventory, false if on the ground etc...
    int weight;              // How much the item weighs
    int size;                // How much space the item takes in inventory
    // etc...
};
In the InventoryItem class you'd also define the member functions and data needed for InventoryItem to be placed in your container class of choice.
The same sort of thing holds true for triggered items, things on the ground etc. They're typically kept in a container class of some sort.
The STL containers will take care of the variable sizes of the containers mentioned in the last part of your question(s).
vector is a good place to start for a general list of items.
HashTables/Dictionaries are good for looking things up with a key.
I hope this is enough to get you started. Good luck.
In addition to James' excellent post, some keywords for you to google for
Data structures
linked list
doubly linked list
queue
for the theory of dynamic memory management.
Also, let me share my standard recommended links for people asking for aid on basic c++:
Full scale tutorial on c++
C++ Language Reference (including STL)
ANSI C Language reference for all those pesky C stuff that C++ keeps using
Your question is not specific for "games programming". You are asking how arbitrary data is organized and stored in bigger programs. There is no definite answer to this, there are lots of different approaches. A common general approach is to make a data model of all the things you want to store. In C++ (or any other languages with object oriented capabilities), one can create an object oriented class model for this purpose. This introduction to C++ contains a complete tutorial on object oriented modeling in C++.
Lots of applications in general use a layered approach - the data model is one layer in your application, separated from other layers like a presentation layer or application ("game") logic.
Your data modeling approach will have to deal with persistency (that means, you want to store all your data on disk and reload it later). This question was asked earlier here on SO. This fact will give you some restrictions, for example, on the use of pointers.
EDIT: if your data model reaches a certain complexity, you might consider using a (lightweight) database, like SQLite, which has a C/C++ API.
Finally, here is a link that might give you a good start, seems to fit exactly on your question:
http://www.dreamincode.net/forums/topic/26590-data-modeling-for-games-in-c-part-i/
Regarding Question #1, I concur with James and others on using a database that stores the persistent state of your game objects.
Regarding questions #2 and #3, about variable numbers of objects and objects that need frequent updating: I'd suggest maintaining a registry of objects that need updating for each game "cycle" (most games operate on cycles -- loops, if you like, though a modern game uses many loops spawned as separate threads).
For instance, every time you introduce a new enemy or other object that needs to be updated to respond to the current situation or behave in a certain way, you register that object in a list. Each cycle, you iterate through your current list of updateables (probably based on some priority scheduling mechanism), and update them accordingly.
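A bare-bones version of such a registry might look like this. Updatable, UpdateRegistry and Walker are hypothetical names, there is no priority scheduling, and the registry does not own the objects it updates.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

class Updatable {
public:
    virtual ~Updatable() = default;
    virtual void update(double dtSeconds) = 0;
};

class UpdateRegistry {
public:
    void add(Updatable* obj) { objects_.push_back(obj); }
    void remove(Updatable* obj) {
        objects_.erase(std::remove(objects_.begin(), objects_.end(), obj), objects_.end());
    }
    // One game cycle. Iterates over a copy so that update() may register
    // new objects without invalidating the loop.
    void tick(double dtSeconds) {
        const std::vector<Updatable*> snapshot = objects_;
        for (Updatable* obj : snapshot) obj->update(dtSeconds);
    }
private:
    std::vector<Updatable*> objects_;  // non-owning
};

// Hypothetical enemy that walks right at a fixed speed.
class Walker : public Updatable {
public:
    double x = 0.0;
    void update(double dtSeconds) override { x += 2.0 * dtSeconds; }
};
```

An object removed during a cycle is still updated once more from the snapshot, so objects should be destroyed only after the cycle that removed them has finished.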
But the particular data structure you use will depend on your program. Linked lists are a valid foundation structure, but in all likelihood you'll want to use a custom compound structure that meets your particular needs. Your approach may combine any number of classic data structures to achieve the best result in performance and effect.
Considering this, I can't emphasize enough the importance of studying advanced data structures before you tackle any sort of serious programming project. There are scores of great books on the topic and you'd do well to study them. Here's a link to a tolerable overview of the classic data structures: http://randu.org/tutorials/c/ads.php
So, I've come back to ask, once more, a patterns-related question. This may be too generic to answer, but my problem is this (I am programming and applying concepts that I learn as I go along):
I have several structures within structures (note, I'm using the word structure in the general sense, not in the strict C struct sense (whoa, what a tongue twister)), and quite a bit of complicated inter-communications going on. Using the example of one of my earlier questions, I have Unit objects, UnitStatistics objects, General objects, Army objects, Soldier objects, Battle objects, and the list goes on, some organized in a tree structure.
After researching a little bit and asking around, I decided to use the mediator pattern because the interdependencies were becoming a trifle too much, and the classes were starting to appear too tightly coupled (yes, another term which I just learned and am too happy about not to use it somewhere). The pattern makes perfect sense and it should straighten some of the chaotic spaghetti that I currently have boiling in my project pot.
But well, I guess I haven't learned yet enough about OO design. My question is this (finally. PS, I hope it makes sense): should I have one central mediator that deals with all communications within the program, and is it even possible? Or should I have, say, an abstract mediator and one subclassed mediator per structure type that deals with communication of a particular set of classes, e.g. a concrete mediator per army which helps out the army, its general, its units, etc.
I'm leaning more towards the second option, but I really am no expert when it comes to OO design. So the third question is, what should I read to learn more about this kind of subject? (I've looked at Head First's Design Patterns and the GoF book, but they're more of a "learn the vocabulary" kind of book than a "learn how to use your vocabulary" kind of book, which is what I need in this case.)
As always, thanks for any and all help (including the witty comments).
I don't think you've provided enough info above to be able to make an informed decision as to which is best.
From looking at your other questions it seems that most of the communication occurs between components within an Army. You don't mention much occurring between one Army and another. In which case it would seem to make sense to have each Mediator instance coordinate communication between the components comprising a single Army - i.e. the Generals, Soldiers etc. So if you have 10 Armies then you will have 10 ArmyMediators.
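A rough sketch of the one-mediator-per-army arrangement. The class shapes are guesses, since the question does not show its real Army, General or Soldier code; ArmyMember and broadcast are names invented here.

```cpp
#include <cassert>
#include <string>
#include <vector>

class ArmyMediator;

// Common base for everything that talks through a mediator.
class ArmyMember {
public:
    explicit ArmyMember(ArmyMediator& mediator) : mediator_(mediator) {}
    virtual ~ArmyMember() = default;
    virtual void receive(const std::string& order) = 0;
protected:
    ArmyMediator& mediator_;
};

// One instance per army; members know the mediator, not each other.
class ArmyMediator {
public:
    void enlist(ArmyMember* member) { members_.push_back(member); }
    void broadcast(const ArmyMember* sender, const std::string& order) {
        for (ArmyMember* m : members_)
            if (m != sender) m->receive(order);
    }
private:
    std::vector<ArmyMember*> members_;  // non-owning
};

class General : public ArmyMember {
public:
    using ArmyMember::ArmyMember;
    void command(const std::string& order) { mediator_.broadcast(this, order); }
    void receive(const std::string&) override {}  // generals ignore orders in this sketch
};

class Soldier : public ArmyMember {
public:
    using ArmyMember::ArmyMember;
    std::string lastOrder;
    void receive(const std::string& order) override { lastOrder = order; }
};
```

Because each army has its own mediator instance, an order issued in one army never reaches the members of another.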
If you really want to learn O-O Design you're going to have to try things out and run the risk of getting it wrong from time to time. I think you'll learn just as much, if not more, from having to refactor a design that doesn't quite model the problem correctly into one that does, as you will from getting the design right the first time around.
Often you just won't have enough information up front to be able to choose the right design from the go anyway. Just choose the simplest one that works for now, and improve it later when you have a better idea of the requirements and/or the shortcomings of the current design.
Regarding books, personally I think the GoF book is more useful if you focus less on the specific set of patterns they describe, and focus more on the overall approach of breaking classes down into smaller reusable components, each of which typically encapsulates a single unit of functionality.
I can't answer your question directly, because I have never used that design pattern. However, whenever I have this problem of message passing between various objects, I use the signal-slot pattern. Usually I use Qt's, but my second option is Boost's. They both solve the problem by having a single, global message passing handler. They are also both type-safe and quite efficient, both in terms of CPU cycles and in productivity. Because they are so flexible, i.e. any object can emit any kind of signal, and any other object can receive any signal, you'll end up solving, I think, what you describe.
Sorry if I just made things worse by not choosing either of the 2 options, but instead adding a 3rd!
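To make the signal-slot idea concrete, here is a minimal sketch in plain Python. It is not the Qt or Boost API, just a hand-rolled stand-in; the Signal class, its connect and emit methods, and the General and Soldier classes are all assumptions made for illustration.

```python
class Signal:
    """A minimal signal: holds a list of connected callables (slots)."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        # Call every connected slot with the emitted arguments.
        for slot in self._slots:
            slot(*args)


class General:
    def __init__(self):
        self.order_issued = Signal()

    def issue(self, order):
        # The general does not know who, if anyone, is listening.
        self.order_issued.emit(order)


class Soldier:
    def __init__(self):
        self.orders = []

    def on_order(self, order):  # the slot
        self.orders.append(order)


general = General()
soldiers = [Soldier(), Soldier()]
for s in soldiers:
    general.order_issued.connect(s.on_order)
general.issue("hold position")
```

The emitter and the receivers are wired together from the outside, so neither class needs a reference to the other's type.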
In order to use Mediator you need to determine:
(1) What does the group of objects, which need mediation, consist of?
(2) Among these, which are the ones that have a common interface?
The Mediator design pattern relies on the group of objects that are to be mediated to have a "common interface"; i.e., same base class: the widgets in the GoF book example inherit from same Widget base, etc.
So, for your application:
(1) Which are the structures (Soldier, General, Army, Unit, etc.) that need mediation between each other?
(2) Which ones of those (Soldier, General, Army, Unit, etc.) have a common base?
This should help you determine, as a first step, an outline of the participants in the Mediator design pattern. You may find out that some structures in (1) fall outside of (2). Then, you may need to force them to adhere to a common interface, too, if you can change that or if you can afford to make that change... (it may turn out to be too much redesigning work, and it violates the Open-Closed principle: your design should be, as much as possible, open to adding new features but closed to modifying existing ones).
If you discover that (1) and (2) above result in a partition of separate groups, each with its own mediator, then the number of these partitions dictates the number of different types of mediators. Now, should these different mediators have a common interface of their own? Maybe, maybe not. Polymorphism is a way of handling complexity by grouping different entities under a common interface such that they can be handled as a group rather than individually. So, would there be any benefit to grouping all these supposedly different types of mediators under a common interface (like the DialogDirector in the GoF book example)? Possibly, if:
(a) You may have to use a heterogeneous collection of mediators;
or
(b) You envision that these mediators will evolve in the future (and they probably will). Hence providing an abstract interface allows you to derive more evolved versions of mediators without affecting existing ones or their colleagues (the clients of the mediators).
So, without knowing more, I'd have to guess that, yes, it's probably better to use abstract mediators and to subclass them, for each group partition, just to prepare yourself for future changes without having to redesign your mediators (remember the Open-Closed principle).
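A minimal Python sketch of that arrangement follows, under stated assumptions: Colleague is a hypothetical common base for the mediated objects, Mediator is the abstract interface, and ArmyMediator and UnitMediator are two invented subclasses with deliberately different routing rules. None of these names or rules come from the question.

```python
from abc import ABC, abstractmethod


class Colleague:
    """Common base for everything that is mediated (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def handle(self, event):
        self.log.append(event)


class Mediator(ABC):
    """Abstract interface shared by all mediator types."""

    def __init__(self):
        self.colleagues = []

    def add(self, colleague):
        self.colleagues.append(colleague)

    @abstractmethod
    def notify(self, sender, event):
        ...


class ArmyMediator(Mediator):
    def notify(self, sender, event):
        # Army-wide events reach every other colleague.
        for c in self.colleagues:
            if c is not sender:
                c.handle(event)


class UnitMediator(Mediator):
    def notify(self, sender, event):
        # Within a unit, only the first registered colleague (the leader) is told.
        leader = self.colleagues[0]
        if leader is not sender:
            leader.handle(event)


army, unit = ArmyMediator(), UnitMediator()
general, soldier_a, soldier_b = Colleague("general"), Colleague("a"), Colleague("b")
for c in (general, soldier_a, soldier_b):
    army.add(c)
for c in (soldier_a, soldier_b):
    unit.add(c)

# Case (a): a heterogeneous collection handled through the common interface.
for mediator in (army, unit):
    mediator.notify(soldier_b, "enemy sighted")
```

A later, more evolved mediator would be one more subclass of Mediator; the colleagues and the loop over mediators stay as they are.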
Hope this helps.
I have inherited a monster.
It is masquerading as a .NET 1.1 application that processes text files conforming to Healthcare Claim Payment (ANSI 835) standards, but it's a monster. The information being processed relates to healthcare claims, EOBs, and reimbursements. These files consist of records that have an identifier in the first few positions and data fields formatted according to the specs for that type of record. Some record ids are Control Segment ids, which delimit groups of records relating to a particular type of transaction.
To process a file, my little monster reads the first record, determines the kind of transaction that is about to take place, then begins to process other records based on what kind of transaction it is currently processing. To do this, it uses a nested if. Since there are a number of record types, there are a number of decisions that need to be made. Each decision involves some processing and 2-3 other decisions that need to be made based on previous decisions. That means the nested if has a lot of nests. That's where my problem lies.
This one nested if is 715 lines long. Yes, that's right. Seven-Hundred-And-Fif-Teen Lines. I'm no code analysis expert, so I downloaded a couple of freeware analysis tools and came up with a McCabe Cyclomatic Complexity rating of 49. They tell me that's a pretty high number. High as in pollen count in the Atlanta area where 100 is the standard for high and the news says "Today's pollen count is 1,523". This is one of the finest examples of the Arrow Anti-Pattern I have ever been privileged to see. At its highest, the indentation goes 15 tabs deep.
My question is, what methods would you suggest to refactor or restructure such a thing?
I have spent some time searching for ideas, but nothing has given me a good foothold. For example, substituting a guard condition for a level is one method. I have only one of those. One nest down, fourteen to go.
Perhaps there is a design pattern that could be helpful. Would Chain of Command be a way to approach this? Keep in mind that it must stay in .NET 1.1.
Thanks for any and all ideas.
I just had some legacy code at work this week that was similar to (although not as dire as) what you are describing.
There is no one thing that will get you out of this. The state machine might be the final form your code takes, but that's not going to help you get there, nor should you decide on such a solution before untangling the mess you already have.
The first step I would take is to write a test for the existing code. This test isn't to show that the code is correct but to make sure you have not broken something when you start refactoring. Get a big wad of data to process, feed it to the monster, and get the output. That's your litmus test. If you can do this with a code coverage tool, you will see what your test does not cover. If you can, construct some artificial records that will also exercise this code, and repeat. Once you feel you have done what you can with this task, the output data becomes your expected result for your test.
Refactoring should not change the behavior of the code. Remember that. This is why you have known input and known output data sets to validate you are not going to break things. This is your safety net.
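Here is a minimal sketch of that safety net in Python (the original is .NET 1.1; Python is used only to keep the example short). The function legacy_process is a hypothetical stand-in for the monster, and the record ids and output strings are invented for illustration, not the real 835 logic.

```python
def legacy_process(records):
    """Stand-in for the monster; hypothetical, not the real 835 logic."""
    output = []
    for record in records:
        if record.startswith("CLP"):
            output.append("claim:" + record[3:])
        else:
            if record.startswith("SVC"):
                output.append("service:" + record[3:])
            else:
                output.append("skipped")
    return output


def refactored_process(records):
    """The same behavior after flattening the nesting."""
    prefixes = {"CLP": "claim:", "SVC": "service:"}
    output = []
    for record in records:
        prefix = prefixes.get(record[:3])
        output.append(prefix + record[3:] if prefix else "skipped")
    return output


# Step 1: capture what the existing code does today, bugs and all.
sample_input = ["CLP001", "SVC77", "XYZ", "CLP"]
golden_output = legacy_process(sample_input)


# Step 2: after each refactoring step, the output must still match.
def check_against_golden(process):
    return process(sample_input) == golden_output
```

In practice the sample input would be a large real file plus artificial records, and the golden output would be saved to disk once and compared on every run.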
Now Refactor!
A couple of things I did that I found useful:
Invert if statements
A huge problem I had was just reading the code when I couldn't find the corresponding else statement. I noticed that a lot of the blocks looked like this:
if (someCondition)
{
    // 100+ lines of code
    {
        ...
    }
}
else
{
    // simple statement here
}
By inverting the if I could see the simple case first and then move on to the more complex block, already knowing what the other branch did. Not a huge change, but it helped me understand the code.
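The inversion described above can be sketched as follows, in Python for brevity. Both functions and the "*"-separated record format are hypothetical. The second version also returns early from the simple branch, which goes one small step beyond merely swapping the two branches.

```python
def describe_before(record):
    if record is not None:
        # Imagine 100+ lines here.
        fields = record.split("*")
        result = "segment " + fields[0] + " with " + str(len(fields) - 1) + " fields"
    else:
        result = "empty"
    return result


def describe_after(record):
    # Inverted: the simple case is handled first and is out of the way.
    if record is None:
        return "empty"
    # The long block now sits at the top indentation level.
    fields = record.split("*")
    return "segment " + fields[0] + " with " + str(len(fields) - 1) + " fields"
```

Applied repeatedly, each inversion of this kind removes one level of nesting from the code that follows it.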
Extract Method
I used this a lot. Take some complex multi-line block, grok it, and shove it aside in its own method. This allowed me to more easily see where there was code duplication.
Now, hopefully, you haven't broken your code (the test still passes, right?), and you have more readable and better understood procedural code. Look, it's already improved! But that test you wrote earlier isn't really good enough: it only tells you that you are duplicating the functionality (bugs and all) of the original code, and that's only for the lines you had coverage on. I'm sure you will find blocks of code that you can't figure out how to hit or that can never be hit (I've seen both in my work).
The big changes, where all the big-name patterns come into play, start when you look at how you can refactor this in a proper OO fashion. There is more than one way to skin this cat, and it will involve multiple patterns. Not knowing details about the format of these files you're parsing, I can only toss around some helpful suggestions that may or may not be the best solutions.
Refactoring to Patterns is a great book for explaining patterns that are helpful in these situations.
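A small Python sketch of Extract Method exposing duplication, with invented names (total_before, total_after, parse_amount) and an invented amount format; it only illustrates the mechanics and is not the original code.

```python
def total_before(lines):
    paid = 0
    for line in lines:
        # Parse an amount field (the same three lines appear again below).
        text = line.strip()
        amount = float(text) if text else 0.0
        paid += amount
    adjusted = 0
    for line in lines:
        text = line.strip()
        amount = float(text) if text else 0.0
        adjusted += amount * 0.5
    return paid, adjusted


def parse_amount(line):
    """The extracted block, now with a name."""
    text = line.strip()
    return float(text) if text else 0.0


def total_after(lines):
    paid = sum(parse_amount(line) for line in lines)
    adjusted = sum(parse_amount(line) * 0.5 for line in lines)
    return paid, adjusted
```

Once the block has a name, the second copy is easy to spot and replace with a call.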
You're trying to eat an elephant, and there's no other way to do it but one bite at a time. Good luck.
A state machine seems like the logical place to start, ideally using WF (Windows Workflow Foundation) if you can swing it (sounds like you can't).
You can still implement one without WF; you just have to do it yourself. However, thinking of it as a state machine from the start will probably give you a better implementation than creating a procedural monster that checks internal state on every action.
Diagram out your states and what causes each transition. The actual code to process a record should be factored out, and called when the state executes (if that particular state requires it).
So State1's execute calls your "read a record", then based on that record transitions to another state.
The next state may read multiple records and call record processing instructions, then transition back to State1.
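The two states described above can be sketched as small classes, here in Python for brevity. The class names ReadHeader and ReadClaimLines, the execute method, and the "CLAIM"/"END" record markers are all assumptions made for the example; real 835 control segments would replace them.

```python
class ReadHeader:
    """State1: read one record and decide which state handles what follows."""

    def execute(self, records, results):
        record = next(records, None)
        if record is None:
            return None  # no more input: stop the machine
        if record.startswith("CLAIM"):
            return ReadClaimLines(record)
        return self  # unknown header: stay here and try the next record


class ReadClaimLines:
    """Reads records until the group ends, then hands control back to State1."""

    def __init__(self, header):
        self.header = header

    def execute(self, records, results):
        lines = []
        for record in records:
            if record == "END":
                break
            lines.append(record)
        results.append((self.header, lines))
        return ReadHeader()


def run(raw_records):
    records = iter(raw_records)  # one shared iterator, consumed by all states
    results = []
    state = ReadHeader()
    while state is not None:
        state = state.execute(records, results)
    return results


parsed = run(["CLAIM 1", "a", "b", "END", "JUNK", "CLAIM 2", "c", "END"])
```

Each state returns the next state, so the driver loop contains no record-specific decisions at all.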
One thing I do in these cases is to use the 'Composed Method' pattern. See Jeremy Miller's Blog Post on this subject. The basic idea is to use the refactoring tools in your IDE to extract small meaningful methods. Once you've done that, you may be able to further refactor and extract meaningful classes.
I would start with uninhibited use of Extract Method. If you don't have it in your current Visual Studio IDE, you can either get a 3rd-party addin, or load your project in a newer VS. (It'll try to upgrade your project, but you will carefully ignore those changes instead of checking them in.)
You said that you have code indented 15 levels. Start about 1/2-way out, and Extract Method. If you can come up with a good name, use it, but if you can't, extract anyway. Split in half again. You're not going for the ideal structure here; you're trying to break the code into pieces that will fit in your brain. My brain is not very big, so I'd keep breaking & breaking until it doesn't hurt any more.
As you go, look for any new long methods that seem to be different from the rest; make these into new classes. Just use a simple class that has only one method for now. Heck, making the method static is fine. Not because you think they're good classes, but because you are so desperate for some organization.
Check in often as you go, so you can checkpoint your work, understand the history later, be ready to do some "real work" without needing to merge, and save your teammates the hassle of hard merging.
Eventually you'll need to go back and make sure the method names are good, that the set of methods you've created make sense, clean up the new classes, etc.
If you have a highly reliable Extract Method tool, you can get away without good automated tests. (I'd trust VS in this, for example.) Otherwise, make sure you're not breaking things, or you'll end up worse than you started: with a program that doesn't work at all.
A pairing partner would be helpful here.
Judging by the description, a state machine might be the best way to deal with it. Have an enum variable to store the current state, and implement the processing as a loop over the records, with a switch or if statements to select the action to take based on the current state and the input data. You can also easily dispatch the work to separate functions based on the state using function pointers, too, if it's getting too bulky.
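A minimal sketch of that enum-plus-loop approach, in Python for brevity. The State values, the handler functions, and the "CLAIM"/"END" markers are hypothetical; the dictionary of handlers plays the role that function pointers (or delegates in .NET) would play.

```python
from enum import Enum, auto


class State(Enum):
    EXPECT_HEADER = auto()
    IN_CLAIM = auto()


def on_expect_header(record, claims):
    if record.startswith("CLAIM"):
        claims.append([record])
        return State.IN_CLAIM
    return State.EXPECT_HEADER  # ignore anything before a header


def on_in_claim(record, claims):
    if record == "END":
        return State.EXPECT_HEADER
    claims[-1].append(record)
    return State.IN_CLAIM


# Dispatch table: one handler per state instead of one giant nested if.
HANDLERS = {
    State.EXPECT_HEADER: on_expect_header,
    State.IN_CLAIM: on_in_claim,
}


def process(records):
    state = State.EXPECT_HEADER
    claims = []
    for record in records:
        state = HANDLERS[state](record, claims)
    return claims


claims = process(["x", "CLAIM 1", "a", "END", "CLAIM 2", "END"])
```

Adding a record group then means adding one enum value and one handler, rather than another level of nesting.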
There was a pretty good blog post about it at Coding Horror. I've only come across this anti-pattern once, and I pretty much just followed his steps.
Sometimes I combine the state pattern with a stack.
It works well for hierarchical structures; a parent element knows what state to push onto the stack to handle a child element, but a child doesn't have to know anything about its parent. In other words, the child doesn't know what the next state is, it simply signals that it is "complete" and gets popped off the stack. This helps to decouple the states from each other by keeping dependencies uni-directional.
It works great for processing XML with a SAX parser (the content handler just pushes and pops states to change its behavior as elements are entered and exited). EDI should lend itself to this approach too.
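A minimal sketch of the state-plus-stack idea, in Python. The token format ("open:name", "close:", anything else is data) and the GroupState class are invented for the example, and the input is assumed to be balanced.

```python
class GroupState:
    """Handles one group; knows which child state to push, not its parent."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.items = []

    def handle(self, token, stack):
        kind, _, value = token.partition(":")
        if kind == "open":
            child = GroupState(value)
            self.children.append(child)
            stack.append(child)  # the parent chooses the child's state
        elif kind == "close":
            stack.pop()  # the child only signals that it is complete
        else:
            self.items.append(token)


def parse(tokens):
    root = GroupState("root")
    stack = [root]
    for token in tokens:
        # Whatever state is on top of the stack handles the next token.
        stack[-1].handle(token, stack)
    return root


root = parse(["open:file", "h1", "open:claim", "c1", "close:", "h2", "close:"])
file_group = root.children[0]
claim_group = file_group.children[0]
```

The child never refers to its parent; popping itself off the stack is enough to hand control back.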