Now that I know some of the basics of C++, I must admit that I still find it very hard to deal with code that others have written in C++. This may be inherent to the language, as C++ allows for complex object hierarchies that are, at least to me, very hard to grasp when one is just handed a C++ project without any further comments or instructions.
So my question is aimed at the more experienced C++ programmers among you: how can someone understand a large C++ project written by others?
I easily lose my way and can be lost for weeks if I try to understand how a large project of, for example, 10,000 lines of code is written. Functions of classes are pointers to functions of different classes that may or may not be overloaded and may or may not be inherited by other classes, and so on without end.
Are there any practical tips that may speed up my ability to read and understand large C++ projects? Is there perhaps a tutorial with such tips? Please, elaborate! :)
I've been programming professionally for some time now, and as such I have repeatedly been handed down codebases written by others before me. Understanding is never easy, especially when the code is inconsistent.
The first thing to realize, though, is that learning your way around a new codebase is not so different from re-discovering a codebase you had not touched for a while. Thus, whether it was written by your old self or by others does not matter much; and since you probably manage to cope with re-discovering codebases you had worked on before, you should be able to discover new codebases as well. Don't lose hope.
The second thing to realize is that understanding is a vague term, and there are certainly different degrees. Often, nobody asks you to understand the ins and outs completely; more likely you will be asked to understand a portion of the codebase in which either there is a bug or some new functionality should be developed. Therefore, as time passes, you will gradually gain an understanding of various portions, and you will inevitably have a deeper knowledge of the portions you worked on the most, while others can remain relatively abstract or even completely obscure. It's okay; it's been a long time since human beings stopped trying to learn everything there was to learn.
With that said, there are several axes of understanding you can try:
you should look for the architecture: a good start is to trace the library dependencies (the Makefile/project files should help here); this will give you the coarse technical blocks out of which the application is built. Executables are normally leaves of the dependency tree.
you should look for the data flow: what is the trigger of the application (is it called directly or as a callback)? What are the steps the data goes through (roughly, just a sketch)? Do not hesitate to focus on a specific narrow use case and use the debugger to trace things, and do not try to dig too deep at first; just get a feel for things.
There are also other axes that may help you gain some understanding of the domain the application was written for. An understanding of the domain is useful because it gives you key insight into what should happen, and it also helps you decipher the comments/function names.
user documentation: what is this used for? If you can arrange for a demo it is generally very helpful; otherwise maybe you can try playing with it yourself (in a test environment).
tests: what is tested? What is exposed to the user?
persistent data: what is serialized? What is saved in a database? Persistent data is accessed at some point, so it helps if you understand when it is read/written.
If it is a working product (that runs) and you can "debug" it, start by looking at just one particular feature.
Learn how it is working from the user's point of view (UI, behaviour, inputs, outputs, ...).
Once you know the feature from the outside, look for the code for that feature (only that feature); the starting point might be a handler for a menu item, a dialog, or a mouse/pointer event.
From there, manually trace the code for one action or sub-feature; skip deep internal libraries (treat them as black boxes for now) and learn how it works.
Once you know that section of code, dig deeper into the library APIs that were called from the upper-level code.
Take your time.
Do not try to understand everything at once.
Draw up a schematic (pen and paper) of the dependencies (stay high level; no class dependencies at the beginning).
Good luck.
The problem you are mentioning does not have a clear and simple answer. Nevertheless, here are some tips:
At the beginning, try to remember everything you come across, even at random: names of directories, classes, template parameters, and so on. As much as you can. This sounds pointless but still makes sense.
While working with the code, always ask yourself, "Have I looked at this function/parameter/etc. before?" If the answer is yes, spend more time with that piece of code. If not, just get a basic grasp and move on.
As time goes on, you will find that more and more of it seems clear and easier to grasp.
It is impossible to give any exact figures, because the size and complexity of projects vary greatly. Do not expect simple and immediate results.
Other points:
You definitely need a source code browser. Spend time learning how to use it. A good example is http://sourceinsight.com/ (not my site).
If you see a function that is called 500 times, knowledge about that function is 500 times more likely to be useful than knowledge about a function that is called only once.
The best thing is to grasp the architecture of the project. While trying to do this, remember that the project may have no architecture at all.
While studying the code, keep your task in mind. A typical situation: you need to modify something or fix a bug. If so, look for the relevant part of the code and focus your effort on it.
I am thinking about the following scheduling problem:
I have X people.
I have Y meeting slots with Z meeting roles available in every meeting.
For some roles, the same person may combine two of them in a single meeting, but most are one person = one role.
For each person x in X, I know a set of facts about them:
a) The last date they attended a meeting and had a specific role (historical data);
b) Their availability for any meeting y in Y;
c) Their specific preference for the roles z in Z or a set of roles (no specific dates) for the group of meetings.
I'd like to build a scheduler with the following objectives in mind:
a) All meeting roles are filled.
b) Preferences are accommodated if possible;
c) Distribution of people / roles should be uniform (i.e. if one person is scheduled every meeting and other just for one meeting once in a while -- it's unacceptable; if one person is scheduled for the same role over, and over, and over again -- it's unacceptable).
Now, I have a gut feeling that the task is not easy at all :), so my specific questions are:
What language would be best suited for the task? (Somehow I feel Prolog can deal with it, but I am not entirely sure.)
What is the proper approach to solving this task, and how close can I get to the objectives listed above?
Any good read on the kind of problem I am looking to solve?
Thank you!
P.S. If you are curious, the use case is scheduling a roster for a set of Toastmasters meetings (example) (I am too lazy to do it by hand and I'd like the computer to help me with this task, at least partially).
A rule engine, like Drools Expert or Prolog, is good for defining the constraints (the score function). However, it's terrible at finding the best solution.
Since your problem is probably NP-complete (especially if the meetings need to be put into a timeslot and/or one person can't attend two meetings at the same time), you need to use a planning optimization algorithm on top of that, such as construction heuristics and metaheuristics. Take a look at the curriculum course example in Drools Planner (Java, open source, ASL).
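As a rough illustration of what a construction heuristic looks like (a hand-rolled C++ sketch, not Drools Planner; the data model and the fairness rule are invented for the example), you could greedily fill each role with the available person who has gone longest without serving in it:

    #include <limits>
    #include <string>
    #include <vector>

    // Hypothetical data model for the roster problem (names are invented).
    struct Person {
        std::string name;
        std::vector<bool> available;   // available[meeting] == true if the person is free
        std::vector<int>  lastServed;  // lastServed[role] = index of the last meeting in
                                       // which they held that role (-1 = never)
    };

    struct Assignment {
        int meeting;
        int role;
        int person;                    // index into the people vector
    };

    // Greedy construction heuristic: for every (meeting, role) slot, pick the
    // available person who has waited longest since last holding that role.
    // NOTE: for brevity this ignores the "one person = one role per meeting"
    // rule and the preferences; a real score function would enforce/penalize those.
    std::vector<Assignment> buildInitialRoster(std::vector<Person>& people,
                                               int meetings, int roles) {
        std::vector<Assignment> plan;
        for (int m = 0; m < meetings; ++m) {
            for (int r = 0; r < roles; ++r) {
                int best = -1;
                int bestLast = std::numeric_limits<int>::max();
                for (int p = 0; p < static_cast<int>(people.size()); ++p) {
                    if (!people[p].available[m]) continue;
                    if (people[p].lastServed[r] < bestLast) {
                        bestLast = people[p].lastServed[r];
                        best = p;
                    }
                }
                if (best >= 0) {
                    plan.push_back({m, r, best});
                    people[best].lastServed[r] = m;   // update history for fairness
                }
                // If best == -1 the role stays unfilled; the score function
                // should penalize that heavily.
            }
        }
        return plan;
    }

A metaheuristic (tabu search, simulated annealing, ...) would then take this initial roster and keep swapping assignments to improve an explicit score: hard penalties for unfilled roles and double-booking, soft penalties for unmet preferences and uneven distribution.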
From my point of view, the language you are going to program in doesn't really matter that much: for simple problems the choice of language is more a matter of personal preference than an exact science. If you like/want to learn Python, use that. If you "feel like" Prolog today, use that.
What will be a factor in your choice, though, is how you want to preserve and present your data. From your question it is clear that you need the following:
A database (or at least a persistent resource) to store your available participants and roles, past and future meetings with the roles of every participant, and some way to record availability.
Some way to present your data (command line, GUI, or website).
Some business logic that describes the way of assigning roles, criteria for the attendance and such.
You will want to use third-party components for most of these, since your time should be spent on the added value of your product; creating a shiny ORM or GUI toolkit is not your goal here. So the programming language you choose should have proper support for these items (especially the first two). I can't speak for Prolog, but Python has you fully covered in these areas. I think it goes beyond the scope of this question to suggest specific toolkits, so I'll leave it at that for now.
After this step, you analyze your problem, which you seem to have done quite nicely already. So, start implementing it. To be able to verify your specific use cases, it sounds like you could benefit from some Test- or Behavior-Driven Development, so you may want to read up on that.
For learning the language, just search StackOverflow for "[language] tutorial": there are already plenty of answers linking to very nice resources for getting started with any language you will choose.
Final advice: perseverance is the hardest part, so try to set yourself some goals or milestones, or try to involve other people in one way or another. That way you'll enlarge the possibility of following through with creating a nice piece of software.
Even though I'm a Python fan, I'd heartily suggest Prolog for this task. I'm familiar with Prolog, and this is definitely solved more nicely with Prolog. But it depends on how you will use the program. Base your choice on whether installing Python or Prolog is easier for you (if you just run it on your local PC, it doesn't matter that much, I guess), or on whatever other requirements you have.
It's fairly simple with Prolog, if you know Prolog. Once you have learnt Prolog, you can solve this with some thinking and without much trouble, I guess (if you really understood Prolog!).
Basically, you should start with Prolog, of course. I'd suggest using SWI-Prolog; it's one of the most commonly used Prolog implementations. Also, there is a nice tutorial for it: http://www.learnprolognow.org/
It seems to me, though I'm not 100% sure, that you are not familiar with Prolog yet. You will need time to learn Prolog first, so it also depends on how soon you need your program. It's possible to get through the tutorial in less than a month, as far as I remember. Of course this depends heavily on how much time you invest per day - you can do it in less time or more.
Prolog is based on rules. Each of your requirements can be expressed as a rule. Once you have your set of rules, you can ask which combinations (of persons and meetings) conform to all those rules. For the historical data of the different persons, you could use a small database.
This sounds like an optimization problem, and I agree with Geoffrey that it is probably NP-complete. I recently developed a scheduling algorithm for a university that does final exam scheduling. I used a genetic algorithm with domain-specific heuristics to solve that problem. My implementation performed nicely with a student count of 3000+ and a course count of 500; it took about 2 hours to find a near-optimal solution.
I agree with the people who suggest Prolog for this task; I would suggest taking a look at ECLiPSe (besides being a Prolog implementation, it is a constraint programming language with more powerful problem-solving capabilities than plain Prolog).
ECLiPSe now has a very nice, to-the-point introduction with many examples, available as a free PDF, written by Antoni Niederlinski: http://www.anclp.pl/
Among the examples on ECLiPSe site, I found the following which seems to be relevant: http://eclipseclp.org/examples/roster.ecl.txt.
ECLiPSe is thoroughly documented and, according to this documentation, can also be integrated with C++/Java.
When having a new C++ project passed along to you, what is the standard way of stepping through it and becoming acquainted with the entire codebase? Do you just start at the top file and start reading through all x-hundred files? Do you use a tool to generate information for you? If so, which tool?
I use change requests/bug reports to guide my learning of a new project. It never makes much sense to me to try to consume the entirety of something all at once. A change order or bug report gives me a reason to focus on one tendril of the system, tracing its activity through the code.
After a reasonable number of these, I can get a good understanding of the fundamentals of the project.
Here's my general process:
Start by understanding what the application does and how it's used. (I see way too many developers completely skip this critical step.)
Search for any developer documentation related to the project. (However, realize this will nearly always be wrong and out of date - it will just have helpful clues.)
Try to figure out the logic behind the organization. How is the main architecture defined? What large-scale patterns are used (e.g. MVC, MVP, IoC)?
Try to figure out the main classes related to the "large" objects in the project. This helps for the point above.
Slowly start refactoring and cleaning up as you try to maintain the project.
Usually, that will get me at least somewhat up to speed. However, I usually end up being given a project like this because something has to be fixed or enhanced, and the timing isn't always realistic, in which case I often just have to jump in and pray.
Start working on it, perhaps by adding a small feature.
Step through application startup in the debugger.
You could try running it through doxygen to at least give you a browsable set of documentation - but basically the only way is a debugger, some trace/std::cerr messages, and a lot of coffee.
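For the trace messages, a throwaway macro is usually enough; this is just a sketch of the kind of thing I mean (the loadConfiguration function is made up for the example), not a library:

    #include <iostream>

    // Quick-and-dirty trace macro for exploring an unfamiliar codebase:
    // prints the file, line and function every time control passes through.
    // Sprinkle TRACE(); into the paths you are trying to understand and
    // delete it again once you have your bearings.
    #define TRACE() \
        (std::cerr << __FILE__ << ":" << __LINE__ << " " << __func__ << "\n")

    void loadConfiguration() {   // stand-in for some function you are tracing
        TRACE();
        // ... existing code ...
    }

    int main() {
        TRACE();
        loadConfiguration();
    }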
The suggestion to write test cases is the basis of Working Effectively with Legacy Code and the point of the CppUnit test library. Whether you can take this approach depends on your team and your setup - if you are the new junior you can't really rewrite the app to support testing.
Try writing unit tests for the various classes.
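If the project already uses CppUnit (mentioned above), a characterization test can be as small as the sketch below; the PriceCalculator class is invented purely for illustration, and the idea is to pin down what the existing code does before you change it:

    #include <cppunit/TestFixture.h>
    #include <cppunit/extensions/HelperMacros.h>
    #include <cppunit/extensions/TestFactoryRegistry.h>
    #include <cppunit/ui/text/TestRunner.h>

    // Imaginary legacy class whose behaviour we want to pin down before changing it.
    class PriceCalculator {
    public:
        double totalWithTax(double net) const { return net * 1.2; }
    };

    class PriceCalculatorTest : public CppUnit::TestFixture {
        CPPUNIT_TEST_SUITE(PriceCalculatorTest);
        CPPUNIT_TEST(testTaxIsAdded);
        CPPUNIT_TEST_SUITE_END();
    public:
        // Characterization test: record what the code does today, so that
        // later refactoring cannot silently change it.
        void testTaxIsAdded() {
            PriceCalculator calc;
            CPPUNIT_ASSERT_DOUBLES_EQUAL(120.0, calc.totalWithTax(100.0), 1e-9);
        }
    };
    CPPUNIT_TEST_SUITE_REGISTRATION(PriceCalculatorTest);

    int main() {
        CppUnit::TextUi::TestRunner runner;
        runner.addTest(CppUnit::TestFactoryRegistry::getRegistry().makeTest());
        return runner.run() ? 0 : 1;
    }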
There is one tool I know of that may help you, currently in beta, called CppDepend; it will help you understand the relations between the classes and the projects in the solution.
Other than that you can try to understand the code by reading it:
Start with the header (.h/.hpp) files; reading them helps you understand the "interfaces" between the classes.
If the solution has several projects, try to understand the responsibility of each project.
Find someone who is familiar with the project who can give you an overview; 5 minutes with the right person can save you an hour with the debugger.
Understanding how the code is used is usually very helpful.
If this is a library, look at client code and unit tests. If there aren't any unit tests, write some.
If this is an application, understand how it works - in detail. Again read & write unit tests.
Essentially, it's all about the interfaces. Understand the interfaces and you'll go a long way towards understanding how the code works. By interface I mean the API if it's a library, the UI if it's a graphical application, or the content of the inbound and outbound messages if it's a server.
Firstly, how large is large?
I don't think you can answer this without knowing the other half of the scenario. What is the requirement for changing the code?
Are you just supporting/fixing it when it goes wrong? Developing new functionality? Porting the code to a new platform? Upgrading the code for a new C++ compiler?
Depending on what your requirement is I would start in different ways.
Here's how I approach the problem
Start by fixing easy bugs. Be extremely diligent with these bugs and use the debugger heavily to find the problem.
Code review every change that goes into the system. On an unbelievably large system, pick a smaller subset and review all of these changes
And most importantly: Ask a lot of questions!
Things to do:
Look at what the sales brochure tells you it does, set the scope of your expectations
Install it, what options do you have in the installer, read the quick start/install guide
Find out what it does, does it even execute, do you have multiple executables
Is there a developer setup guide/wiki, pointers to VCS
Get the code and make your build environment work; document the SDKs and build tools you need, if this isn't documented already.
Look at the build process and project dependencies; is there a build machine/CI service?
Look at generated doc output (if there is any!)
Find an interesting piece of the solution and see how it works: what are the entry points, how does it work, and what are the main classes and interfaces?
Replicate bugs, stop at interesting features in the program to get an overview and work down to tracing code.
Start to fix things, but ensure you are really fixing them by having appropriate unit tests that show it is broken now and will show when it is fixed.
I have been incorporating source code from some mid-sized projects. The most important lesson I have learned from this process is that before going into the source code, you must be sure which part of it interests you most. You can then get into that piece by grepping for logging/warning messages or looking at class/function names. To understand the source code, run it in a debugger or insert your own warning messages. In all, focus on the things you are interested in. The last thing you want is to read all the source code.
Try generating documentation using Doxygen or something similar, if it wasn't done already.
Walk through the API and see if there is anything that is unclear to you, and look at the code; if you still don't get it, ask a developer who has worked on it before.
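Doxygen picks up specially formatted comments, so even adding a handful like the ones below to the classes you touch makes the generated HTML far more useful (the FrameParser class here is made up just to show the markup):

    /// \brief Parses raw sensor frames into timestamped samples.
    ///
    /// (Hypothetical class, shown only to illustrate Doxygen markup.)
    /// The brief/param/return tags below end up in the generated HTML
    /// pages and are linked from the class list and the call graphs.
    class FrameParser {
    public:
        /// \brief Decodes one frame.
        /// \param rawFrame The bytes as read from the device.
        /// \return The decoded sample value, or -1 on a malformed frame.
        int decode(const unsigned char* rawFrame);
    };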
Always examine whatever you have to work on first.
Take a look at whatever UML documents you've got, if you don't have any:
Smack the developer/s who worked on it. It's a shame they didn't do something as basic as UML class diagrams.
Try to generate them from the code. They will not be accurate, but they will give you a head start.
If there is something specific that you don't understand or think is wrong, ask the team who developed it. They will probably know better.
Fixing bugs works just fine for any project, not just C++ ones.
Browse around the file hierarchy with Total Commander and try to get an overview of the structure. Try to identify where the main header files are located. Also find the file where the main() function is located.
Ask a person who is already familiar with the codebase to outline the basic concepts that were used during development.
He doesn't need to explain every detail, but should give you a rough idea of how the software works and how the individual modules are connected with each other.
Additionally, what I've found useful in the past was to first setup a working development environment before starting to think about the code.
Read the documentation. If possible, speak with the former maintainer. Then, check out the code bases from the first commit and the first release from the VCS and spend some time looking at them. Don't go for full understanding yet, just skim and understand which are the major components and what they do. Then read the change logs and the release notes for each of the major releases. Then start breaking everything and see what breaks what. Do some bug fixes. Review the test suite and understand which component each test is focused on. Add some tests. Step through the code in a debugger. Repeat.
As already said, grab doxygen and build HTML documentation for the source code.
If the code is well designed, you'll easily see a nice class hierarchy, clear call graphs and many other things that would otherwise take ages to uncover. When a certain part's behavior appears unclear, look at the unit tests or write your own.
However, if the structure appears to be flat, or messy, or both together, you may find yourself in some sort of trouble.
I'm not sure there is a standard way. There are some for-pay tools that will do C++ class diagrams/call graphs and provide some kind of code-level view. doxygen is a good free one. My low-tech approach is to find the top-level file and start to sort through what it provides and how...taking notes if needed.
In C++, the most common problem is that a lot of energy and time is wasted on low-level tasks, such as memory management.
Things that are no-brainers in managed languages are a pain to do in C++.
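A small example of the kind of ceremony meant here; the Widget class is invented for illustration, and the modern smart-pointer style is worth recognizing when you read a codebase that still does everything with raw new/delete:

    #include <memory>
    #include <utility>
    #include <vector>

    struct Widget { int id = 0; };   // stand-in for some resource-owning class

    // Old style: every exit path must remember to delete, or the memory leaks.
    void oldStyle() {
        Widget* w = new Widget;
        // ... work with *w; an early return or an exception here would leak it ...
        delete w;
    }

    // RAII style: ownership is explicit and cleanup is automatic.
    void modernStyle() {
        auto w = std::make_unique<Widget>();          // freed when w goes out of scope
        std::vector<std::unique_ptr<Widget>> pool;    // containers of owners work too
        pool.push_back(std::move(w));
    }   // everything is released here, even if an exception is thrown

    int main() {
        oldStyle();
        modernStyle();
    }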
When I was in school, it was often a requirement to flowchart, line for line, the little programs that we wrote.
Those flow charts tended, due to the size of the pictures, to be very large and were often tedious to draw.
It was always to such detail that you were essentially writing code anyway.
I use flowchart/UML style techniques to develop higher level things but when it gets down to actual loops and what not it seems like overkill.
I will often pseudo-code more detailed algorithms, but still not down to the super-fine-grained level.
Is this just one of those things where, in school, the programs were so tiny there would be nothing else to flowchart, so they had us do the minutiae?
Flowcharts, no -- Sequence diagrams, yes. I try to keep them at a very high-level to communicate the idea to someone quickly. I do not try to get every detail in. I might supplement with another diagram to show an edge case, if it seems important.
It's great for communication as a sketch -- I think it's not right for a specification (but would be a good intro to a detailed section)
Flow charts for the ifs and whiles of real code I have never (in 30 years) found useful.
As a discussion aid for eliciting requirements ("so when we're here, what could happen?", "how would you decide?", "what would you do if it's > 95%?") they can be helpful. A certain kind of user finds such diagrams on the whiteboard easy to talk about.
To be absolutely honest, I am extremely glad I was required to accompany every assignment with flowcharts. They made me think structurally, something I was lacking (and, perhaps, am still lacking to a certain extent).
So don't be quick to jump on the "I'm off to play the grand piano" bandwagon; flowcharts really do work.
More than once I have found myself in a bit of a bind over non-trivial logic. After laying out the logic in flowchart form on a sheet of paper (it takes a couple of minutes), it all inevitably becomes clear to me.
Yeah, I agree. The point was to get you to understand flow charts, not to imply that you should use them for line-by-line code coverage.
I don't know why you'd even waste time with pseudo-code except for demonstration, honestly, unless it's some really low-level programming.
I don't use them myself. But a coworker who only programs every once in a while does. It's a very handy way for him to remember the nuts-and-bolts of that program he wrote a year ago. He doesn't do much programming, so it's not worth it for him to learn sequence diagrams and things like that.
It's also the type of diagram that other people who hardly ever program will be able to read easily, which in his position is a plus.
I don't find great detail in a flowchart to be very helpful. I use UML-style techniques on a sheet of paper. Use a whiteboard in a group setting. Mid-level class diagrams and sequence diagrams can be extremely helpful to organize your ideas and communicate your design intentions.
Sometimes on a whiteboard to describe a process, but never in actual design or documentation. I'd describe them more as "flowchart-like" since I'm not always particular about the shapes.
We have an in-house application that has some fairly complex workflow in it. A flowchart forms a big part of the spec of this part of the system. So yes, Flowcharts are a useful tool for spec'ing a system. They are also normally understood by non-technical people which is useful if they are part of a user requirements. No, I would not normally use them at a very low level, nor would I expect part of a system to only be spec'ed or documented by flowcharts.
TDD is a good option for nuts and bolts code, if you are so inclined, and it comes with a lot of other benefits.
We have a large, organically grown application with very little documentation, so using flow charts to document components within the application has proven very useful to the business side of the operation, as even they don't understand how everything fits together.
They aren't at the line-by-line level, but they do cover all branches of the business logic with the subsequent processing and outputs (although not following strict flow chart rules - some blocks describe multiple processes).
As I've increasingly absorbed Agile thinking into the way I work, yagni ("you aren't going to need it") seems to become more and more important. It seems to me to be one of the most effective rules for filtering out misguided priorities and deciding what not to work on next.
Yet yagni seems to be a concept that is barely whispered about here at SO. I ran the obligatory search, and it only shows up in one question title - and then in a secondary role.
Why is this? Am I overestimating its importance?
Disclaimer. To preempt the responses I'm sure I'll get in objection, let me emphasize that yagni is the opposite of quick-and-dirty. It encourages you to focus your precious time and effort on getting the parts you DO need right.
Here are some off-the-top ongoing questions one might ask.
Are my Unit Tests selected based on user requirements, or framework structure?
Am I installing (and testing and maintaining) Unit Tests that are only there because they fall out of the framework?
How much of the code generated by my framework have I never looked at (but still might bite me one day, even though yagni)?
How much time am I spending working on my tools rather than the user's problem?
When pair-programming, the value of the observer's role often lies in "yagni".
Do you use a CRUD tool? Does it allow (nay, encourage) you to use it as an _RU_ tool, or a C__D tool, or are you creating four pieces of code (plus four unit tests) when you only need one or two?
TDD has subsumed YAGNI in a way. If you do TDD properly, that is, only write those tests that result in required functionality, then develop the simplest code to pass the test, then you are following the YAGNI principle by default. In my experience, it is only when I get outside the TDD box and start writing code before tests, tests for things that I don't really need, or code that is more than the simplest possible way to pass the test that I violate YAGNI.
In my experience the latter is my most common faux pas when doing TDD -- I tend to jump ahead and start writing code to pass the next test. That often results in me compromising the remaining tests by having a preconceived idea based on my code rather than the requirements of what needs to be tested.
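As a toy illustration of that loop (the greet function and its test are invented for the example): the test states the required behaviour first, and the implementation is only as much code as it takes to pass it - anything beyond that is YAGNI territory.

    #include <cassert>
    #include <string>

    // Step 2: the simplest code that passes the test below - no configurable
    // separators, no locale handling, nothing we don't have a test for yet.
    std::string greet(const std::string& name) {
        return "Hello, " + name + "!";
    }

    // Step 1 (written first): a test stating the required behaviour.
    void testGreetingMentionsTheName() {
        assert(greet("Ada") == "Hello, Ada!");
    }

    int main() {
        testGreetingMentionsTheName();
        return 0;
    }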
YMMV.
Yagni and KISS (keep it simple, stupid) are essentially the same principle. Unfortunately, I see KISS mentioned about as often as I see "yagni".
In my part of the wilderness, the most common cause of project delays and failures is poor execution of unnecessary components, so I agree with your basic sentiment.
The freedom to change drives YAGNI. In a waterfall project, the mantra is "control scope". Scope is controlled by establishing a contract with the customer. Consequently, the customer stuffs everything they can think of into the scope document, knowing that changes to scope will be difficult once the contract has been signed. As a result, you end up with applications that have a laundry list of features, not a set of features that have value.
With an agile project, the product owner builds a prioritized product backlog. The development team builds features based on priority, i.e., value. As a result, the most important stuff gets built first. You end up with an application that has features that are valued by the users. The stuff that is not important falls off the list or doesn't get done. That is YAGNI.
While YAGNI is not a practice, it is a result of the prioritized backlog. The business partner values the flexibility afforded to the business, given that they can change and reprioritize the product backlog from iteration to iteration. It is enough to explain that YAGNI is the benefit gained when we readily accept change, even late in the process.
The problem I find is that people tend to bucket even writing factories or using DI containers (unless you already have them in your codebase) under YAGNI. I agree with JB King there. For many people I've worked with, YAGNI seems to be a license to cut corners and write sloppy code.
For example, I was writing a PinPad API for abstracting multiple models/manufacturers of PIN pads. I found that unless I had the overall structure, I couldn't even write my unit tests. Maybe I'm not a very seasoned practitioner of TDD. I'm sure there will be differing opinions on whether what I did was YAGNI or not.
I have seen a lot of posts on SO referencing premature optimization which is a form of yagni, or at least ydniy (you don't need it yet).
I don't see YAGNI as the opposite of quick-and-dirty, really. It is doing just what is needed and no more, and not planning as if the software someone writes has to last 50 years. It may come up rarely because there aren't really that many questions to ask around it, at least to my mind. It is similar to the "don't repeat yourself" and "keep it simple, stupid" rules that have become common but aren't necessarily dissected and analyzed in 101 ways. Some things are simple enough that they are usually picked up soon after a little practice. Some things develop behind the scenes, and if you turn around and look you may notice them - which may be another way of stating the same thing.