As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Note: For this question I will mainly be referring to C++, though this may apply to other languages.
Note: Please assume there is no recursion.
People often say (if you have an exceptionally large function) to "break up" a function into several smaller functions, but is this logical? What if I know for a fact that I will never reuse one of those smaller functions? That seems like a waste of memory and performance, and you may have to jump around the code more when reading it.
Also, what if you are only going to use a (hypothetically large) function once? Should you just insert the function body into the place where it would be called, for the same reasons as before (memory, performance, and jumping around the code while reading)? So... to make a function or not to make a function, that is the question.
EDIT (to all)
I am still going through all the answers, but from what I have read so far I have formed a hypothesis. Would it be correct to say: split functions up during development, but do what I suggest in the question before deployment? Likewise, make single-use functions during development, but inline their bodies before deployment?
This really depends on the context.
When we say the size of a function, we actually mean the semantic distance of the lines inside the function. We prefer that a function should do only one thing. If your function does only one thing and the semantic distance inside it is small, then it is OK for it to be large.
However, it is not good practice to make a function do many things; it is better to refactor such a function into a few smaller ones with good names and good placement of the code, so that readers do not need to jump around.
Don't worry too much about performance and memory. Your compiler should take care of the bulk of that for you, especially for very thin functions.
My goal is typically to ensure that the given function call can be replaced entirely in the reader's memory--the developer can treat the abstraction purely. Take this:
// Imagine here that these are real variable/function names as written by a
// lazy coder. I have seen code like this in the wild.
void someFunc(int arg1, int arg2) {
int val3 = doFirstPart(arg1, field1);
int val4 = doSecondPart(arg2, val3);
queue.push(val4);
}
The refactoring of doFirstPart and doSecondPart buys you very little, and likely makes things harder to understand. The problem here isn't method extraction, though: The problem is poor naming and abstraction! You will have to read doFirstPart and doSecondPart or the point of the whole function is lost.
Consider this, instead:
void pushLatestRateAndValue(int rate, int value) {
int rateIndex = calculateRateIndex(rate, latestRateTable);
int valueIndex = calculateValueIndex(rateIndex, value);
queue.push(valueIndex);
}
In this contrived example, you don't have to read calculateRateIndex or calculateValueIndex unless you really want to dig deep--you know exactly what it does just by reading it.
Aside from that, it may be a matter of personal style. I know that some coders prefer to extract every business "statement" into a different function, but I find that a little hard to read. My personal preference is to look for an opportunity to extract a function from any function longer than one "screenful" (~25 lines), which has the advantage of keeping the entire function visible at once, and also because 25 lines happens to be my personal mental limit of short-term memory and temporary understanding.
There are many good arguments for not making a routine longer than roughly what will fit on one page. One that most people don't always think about is that, unless you deploy debug symbols (which most people don't do), a stack trace coming in from the field is a lot easier to analyze and turn into a hypothesis about a cause when the routines it refers to are small than when the error turns out to be occurring somewhere in that 2,000-line whale of a method that you never got around to splitting up.
The question of understanding a large code base has previously been well answered, but I feel I should ask it again to describe the problems I have been facing.
I have just started a student job. I am a beginner programmer and only learned about classes two months ago. At the job, though, I have been handed code that is part of a big piece of software. I understand what the code is supposed to do (read a file). But after spending a few weeks trying to understand it and modify it to achieve our desired results, I have come to the conclusion that I need to understand each line of it. The code is about 1300 lines.
Now when I start reading the code, I find that, for example, a variable is defined as:
VarType VarName
Now VarType is not a built-in type like int or float. It is a user-defined type, so I have to go to the class to see what this type is.
In the next line, I see a function being called, like points.interpolate(x);
Now I have to go into another class and see what the interpolate function does.
This happens a lot, which means that even to understand a small part of the code, I have to go to 3 or 4 different classes and keep them all in mind at once without losing sight of the main objective, and that is tough.
I may not be a skilled programmer, but I want to be able to do this. Can I have some suggestions on how I should approach this?
Also (I will sound really stupid when I ask this) what is a debugger? I hope this gives you an idea of where I stand (and the need to ask this question again). :(
With any luck, those functions and classes should have at least some documentation to describe what they do. You do not need to know how they work to understand what they do. When you see the use of interpolate, don't start looking at how it works, otherwise you end up in a deep depth-first search through the code base. Instead, read its documentation, and that should tell you everything you need to know to understand the code that uses it.
If there is no documentation, I feel for you. I can suggest two tips:
Make general assumptions about what a function or class will do from its name, return type and arguments and the surrounding code that uses it until something happens that contradicts those assumptions. I can make a pretty good guess about what interpolate does without reading how it works. This only works when the names of the functions or classes are sufficiently self-documenting.
If you need a deep understanding of how some code works, start from the bottom and work upwards. Doing this means that you won't end up having to remember where you were in some high level code as you search through the code base. Get a good understanding of the low level fundamental classes before you attempt to understand the high level application of those types.
This also means that you will understand the functions and classes in a generic sense, rather than in the context of the code that led you to them. When you find points.interpolate(x), instead of wondering what interpolate does to these specific points with this specific x argument, find out what it does in general. Later, you will be able to apply your new-found knowledge to any code that uses the same function.
Nonetheless, I wouldn't worry about 1300 lines of code. That's basically a small project. It's only larger than examples and college assignments. If you take these tips into account, that amount of code should be easily manageable.
A debugger is a program that helps you debug your code. Common features of debuggers allow you to step through your code line-by-line and watch as the values of variables change. You can also set up breakpoints in your code that are of interest and the debugger will let you know when it's hit them. Some debuggers even let you change code while executing. There are many different debuggers that all have different sets of features.
Try making assumptions about what the code does based on its name. For example, assume that the interpolate function correctly interpolates your points; only go digging in that bit of code if the output looks suspicious.
First, consider getting an editor/IDE that has the following features:
parens/brackets/braces matching
collapsing/uncollapsing of blocks of code between curly braces
type highlighting (in tooltips)
macro expansion (in tooltips or in a separate window/panel)
function prototype expansion (in tooltips or in a separate window/panel)
quick navigation to types, functions and classes and back
opening the same file in multiple windows/panels at different positions
search for all mentions/uses of a specific type, variable, function or class and presentation of that as a list
call tree/graph construction/navigation
regex search in addition to simple search
bookmarks
Source Insight is one such tool. There must be others.
Second, consider annotating the code as you go through it. While doing this, note (write down) the following:
invariants (what's always true or must always be true)
assumptions (what may not be true, e.g. missing checks/validations or unwarranted expectations), think "what if"
objectives (the what) of a piece of code
peculiarities/details of implementation (the how; e.g. whether exceptions are thrown and which, which error codes are returned and when)
a simplified call tree/graph to see the code flow
do the same for data flow
Draw diagrams (in ASCII or on paper/board); I sometimes photograph my papers or the board. Specifically, draw block diagrams and state machines.
Work with code at different levels of abstraction/detail. Zoom in to see the details, zoom out to see the structure. Collapse/uncollapse blocks of code and branches of the call tree/graph.
Also, have a checklist of what you are going to do. Check the items you've done. Add more as necessary. Assign priorities to work items, if it's appropriate.
A debugger is a program that lets you execute your program step by step and examine its state (variables). It also lets you modify the state and that may be useful at times too.
You may use a debugger to understand your code if you're not very well familiar with it or with the programming language.
Another thing that may come in handy is writing tests or input data test sets for your program. They may reveal problems and limitations in terms of logic and performance.
Also, don't neglect documentation and people! If there's something or someone that can give you more information about the project/code, use that something or someone. Ask for advice.
I know this sounds like a lot, but you'll end up doing some of this at some point anyway. Just wait for a big enough project. :)
Basically, you need to understand the functionality of a called function first, then understand its inputs and outputs. Only if you really need to understand how interpolate works should you go into the details. Usually, function names are self-explanatory; you can get a feeling for what a function does from its name if the code is well written.
Another thing you may want to try is to run some toy examples through the code; you can use a debugger or an IDE that helps you navigate the code. Understanding large-scale code takes time and experience. Just be patient.
"Try the Debugger Approach"
[Update: A debugger is a special program that lets you pause a running program to examine its state (variable values, which function is running, which function called it, etc.)]
The way I do it is by step-debugging the code, for the use case I want to understand.
If you are using an advanced/modern IDE, then setting breakpoints at the entry point (like main(), or another point of interest) is fairly easy. From there on, just step into the functions you want to examine and step over the rest.
To give you a step-by-step approach:
Set up a breakpoint at the starting expression of the main() method (the entry point).
Run the program with debugging active.
The program will break at the breakpoint.
Now step over until you come across a function/expression that seems interesting (say, your points.interpolate(x) call).
Step into the function, and examine the program state, like the variables and the function stack, live.
Avoid complex system libraries; just step over/step out of them. (Example: avoid something like MathLib.boringComputation().)
Repeat until the program exits.
I have found that this way of learning is very rapid and gives you a quick understanding of any complex/large piece of software.
Use Eclipse, or if you can't, then try GDB if it's C/C++. Every popular programming language has a decent debugger.
Understanding the basic debugging operations will be a benefit:
Setting up a breakpoint.
Stopping at a breakpoint.
Examining/watching variables.
Examining the function stack (the hierarchy of function calls).
Single-stepping to the next line in the code.
Stepping into a function.
Stepping out of a function.
Stepping over a function.
Jumping to the next breakpoint (point of interest).
Hope, it helps!
Many great answers have already been given. I thought I would add my understanding as a former student (not too long ago) and what I learned that helped me understand code. This particularly helped me when I began a project to convert a database I wrote in Java many years ago to C++.
1. **Code Reading** - Do not underestimate this important task. The ability to write code does not always translate into the ability to read it, and reading it can be more frustrating than writing it. Take your time and carefully discover what each line of the code does. This will certainly help you avoid making assumptions, unless you come across code that you are familiar with and can gloss over.
2. Don't hesitate to follow references, locate declarations, and uncover definitions of the code elements you are reading. What you learn about how a particular variable, method call, or class is defined all contributes to learning, and ultimately to you being able to perform your task. This is particularly important because effective detective work is an essential part of being able to understand the small parts of the code, so that you can, in the future, grasp the larger parts with less difficulty.
Others have already posted information about what a debugger is, and you will find it an invaluable asset for tracking down code errors; I think it also helps with code reading, knowledge gain, and understanding, so you can be a successful programmer.
Here is a link to a debugger tutorial utilizing Visual Studio that may give you a strong understanding of at least the process at hand.
This is just a general help question. I'm trying to find out: what is the advantage of having a set of small functions in a C++ application's code over having one long, complex function containing all the statements necessary to solve a problem?
Edit: Credit for this mnemonic goes to the commenters in the OP.
Breaking big functions up into several smaller ones can lead to MURDER! Which, in this case, might be a good thing. :)
M - Maintainability. Smaller, simpler functions are easier to maintain.
U - Understandability. Simpler functions are easier to understand.
R - Reusability. Encourages code reuse by moving common operations into a separate function.
D - Debuggability. It's easier to debug simple functions than complex ones.
E - Extensibility. Code reuse and maintainability lead to functions that are easier to refactor in 6 months.
R - Regression. Reuse and modularization lead to more effective regression testing.
There are a few potential benefits to breaking big functions up into smaller functions. In the order they fell out of my brain:
It encourages code reuse. Often in large functions you have to do more or less the same thing many times. By generalizing this into a single common function, you can use that one block of code in multiple places.
Code-reuse can aid in robustness and maintainability by isolating potential bugs to one place rather than several.
It is easier to understand the semantics of the function when there are fewer lines of code and a bunch of calls to well-named functions.
If you are opposed to functions with multiple return points, breaking big functions up can help reduce them.
It helps to identify and isolate (potential) problems with subtle data dependencies that are otherwise hard to notice.
It's important to note however that you take the good with the bad with this. There are also a few potential drawbacks to breaking big functions up:
If the big function worked before, trying to modularize it may create defects.
In multithreaded applications, you might introduce deadlocks and race conditions if your synchronization policies are subtle or just plain wrong.
You might introduce a performance hit from the function calls.
Clearer code, which is easier to understand and maintain.
One big complex function is just that: complex.
Dividing your code into separate functions makes your code much easier to work with. First, when you look for the part of code that performs a particular task, it will be easier to find if it's in its own function.
Second, making changes to a function is much easier when it's simple--you don't need to understand a large amount of code to modify that function.
Also, you may find it easier to reuse that code in another project when it is divided up into smaller functions that can likely be used for more purposes than just the single large function could be.
The advantages of splitting a program up into multiple functions are:
Most programs have some basic functionality that is needed in multiple places. Having that functionality in a separate function makes it more obvious when the same functionality is being used, and you also only have to fix problems with it once.
By splitting a program into functions, you can introduce multiple levels of abstraction in the code. In the main function you get a broad overview of what the program does, and each level down the call tree reveals more details of how certain aspects are realized.
In medical device systems, breaking code into smaller pieces reduces the need for regression testing and narrows the effects of change to a smaller scope.
For example, let's assume we have 15 functions for 3 themes in one file.
If I change one of the functions in the file, everything needs to be rebuilt and retested.
If I split the file into 3 separate files of 5 functions each, I only need to rebuild 5 functions and retest 5 functions. Testing 5 functions requires less testing time than 15 functions.
Also, when teams of people are working on the same code base, dividing the code reduces the probability of two or more people working on the same file. Multiple people working on the same file causes many conflicts, such as one person's code being accidentally removed during a check-in.
I know good programming practices always help in the "long run" of a project, but sometimes they just seem to cost a lot of time. For instance, it's suggested that I maintain a header file and a cpp file for each class that I make, keeping only the declarations in the headers and the definitions in the cpp files. Even with 10-12 classes, this process becomes very cumbersome. Updating the makefile each time a new class is added, dependencies and everything, takes a lot of time...
While I am busy doing all this, others would just write everything in a single file, issue a single compile command, and run their programs... why should I not do the same? It's fast and it works.
Even trying to come up with short, meaningful names for variables and functions takes a lot of time; otherwise you end up typing 30-character-long names, completely unmanageable without auto-complete.
Edit:
Okay, let me put it a little differently: let's say I am working on a small-to-medium-sized project that is never going to require any maintenance by a different developer (or even me). It's basically one-time development. Are programming practices worth following in such a case? I guess my question is: do good programming practices actually help during development, or do they just pay off during maintenance?
I haven't been working in the field for long, but not slacking off, documenting as I go, defining variables with useful names, and so on definitely saves time in the long run. It saves time when other developers or I go back to the code for maintenance.
No one is stuck wondering what this variable means, why I did that, and so on! :)
Laziness may pay off right now, but it will only pay off once. Taking the time to do it right doesn't pay off immediately, but it will do so multiple times and for a longer period of time.
Also, there is nothing wrong with really long variable and method names, unless you subscribe to the naive view that most of the time you spend programming is used on typing and not solving problems.
Addendum: If something is hard to name succinctly, it probably needs to be broken down into more modular units. Methods or variables that are hard to name are a definite code smell.
It's all about long-term supportability. Clearly you have either not been coding on a team or not had to look at code you wrote years ago. I can open code I wrote 15 years ago and modify it with a very small relearning curve if I gave things meaningful variable names, while if I did not, it will take some time to figure out what I was doing with that X and that H, and why T should not be more than 4.
Try sharing code with 10 people on a team and have each of them just put code in any place they like... I have worked with people like that. If lynchings still had public support, I would have led many. Picture this... I know I need to modify the signature on Foo.SetFoos(int FoosInFooVille), but I looked for Foo.h and it was not found. Well, now I just look for Foo.cpp, right? Oops, to save... time?... they jammed Foo.cpp into Chew.cpp... so I look there... it's not at the top of the file! Do I find Foo in that file and see if it's above that... sure... nope, not found... it's in Chew.h. Now I am ready to check the SVN log and target my USB-powered missile launcher at that jerk next time he passes by.
The downside of the ad-hoc approach comes in the long run, when it comes to maintenance (especially when the maintenance coders are people other than yourself). Such techniques might be OK for quick proof-of-concepts, but will cause more problems in the future if you don't build things properly.
Yes, it's worth doing it "right", i.e. well, because basically it's pay me now or pay me later, and you're not the only person who will ever see the code.
If it takes you 15 minutes now to do it well, how long will it take you 6 months (or more) from now to figure out what was meant, in your own code?
Now, you could use Martin Fowler's 3 strikes idea for refactoring.
The first time in the code to fix something, you notice it could be refactored, but you're too busy and let it go. The second time back in the same code, same thing. The third time: refactor the code.
The effectiveness of programming practices doesn't seem to be your problem here. What you should be concerned about are the tools you're using to develop. There are plenty of IDEs and other options for keeping your makefiles automatically up to date, for example.
I am developing a project in C++. I realised that my program is not OO.
I have a main.cpp and several headers for different purposes. Each header is basically a collection of related functions with some global variables to retain data. I also have a windowing.h for managing windows. This contains winMain() and winProc(). It calls the functions that reside in my main.cpp when events happen (like clicking a button) or when it needs information (like 'how big should this window be?'). These functions are declared in a separate .h file included in windowing.h.
Is it worth changing this to be OO? Is it worth the work? Is there any better way I can structure the program without too many changes?
All feedback is welcome; thank you for taking the time to read this.
No, I think if it ain't broke, don't fix it.
Any windowing system is inherently OO to a degree. You have a handle to a window managed by the OS, and you can perform certain operations on it. Whether you use window->resize() or resize(window) is immaterial. There is clearly no value in such syntactic rearrangement.
However, as the application grows, you will likely find that many windows are mostly similar and subtly different. The best implementation is boilerplate basic functionality with special functions attached. And the way to do that is with a base class and polymorphism.
So, if you can refactor the program to be more elegant with OO, go for it. If it grows into the OO paradigm with natural evolution, follow best practices and let it be so. But don't just try to be buzzword-compliant.
Two things you need to think about: cost/benefit analysis and opportunity cost.
What is the cost of changing your code to be OO? What is the benefit? If the latter outweighs the former, then I'd tend towards changing it.
Costs include traditional costs such as time spent, money spent and so on. Benefits include a cleaner implementation, leading to easier maintenance in future. Whatever other costs and benefits there are depend really upon your own situation.
But one thing often overlooked is the opportunity cost. This is a cost that should be factored in to your analysis. It's an economic term meaning foregone opportunities.
In other words, if you do convert your code, your cost includes your inability to do something else with that time.
Classic example. If you do the conversion and a customer decides to not buy your software because you're not adding the feature they want, that lost sales opportunity is a cost.
It depends on what you want to accomplish with the project. If not using the OO features of C++ works for you and there are no good reasons to change, then keep going the way you're going. If, on the other hand, you would like to learn more about OOP and you have the time to apply to it, refactoring it to be more OO style will provide you with a great learning opportunity.
I would follow the best practices for working with whatever window manager you are using. Most use an OO style, which you'll automatically inherit (!) as you follow its usage patterns.
Can someone please tell me an approach for finding security flaws in a given piece of code, for example in a given socket program? Any good examples or book recommendations are welcome.
Thanks & Regards,
Mousey
The lowest-hanging fruit in this category would be to simply search the source for functions which are commonly misused or are difficult to use safely, such as:
strcpy
strcat
sprintf
gets
Then start looking at ones that are not inherently too bad, but could be misused. In particular, anything that writes to a buffer can be hazardous if misused:
memcpy
memmove
recv/read
send/write
the entire printf family should always have a constant for the format string
NOTE: all of these (except gets) can be used correctly, so don't think it's a flaw just because the function appears; instead, take a look at how it is used. Also note that gets is always a flaw.
NOTE2: this list is not exhaustive, do a little research about commonly misused functions and how they can be avoided.
As far as tools go, I recommend things like valgrind and splint.
One major topic that wasn't covered in Evan's answer is integer overflows. Here are some examples:
wchar_t *towcs(const char *s)
{
    size_t l = strlen(s)+1;
    mbstate_t mbs = {0};
    wchar_t *w = malloc(l*sizeof *w), *w2;
    if (!w || (l=mbsrtowcs(w, &s, l, &mbs))==(size_t)-1) {
        free(w);
        return 0;
    }
    return (w2=realloc(w, (l+1)*sizeof *w)) ? w2 : w;
}
Here, a giant string (>1 GiB on 32-bit) will make the multiplication by the element size (I'm assuming 4) overflow, resulting in a tiny allocation and subsequent writes past the end of it.
Another more common example:
uint32_t cnt;
fread(&cnt, 1, 4, f);
cnt=ntohl(cnt);
struct record *buf = malloc(cnt * sizeof *buf);
This sort of code turns up in reading file/network data quite a lot, and it's subject to the same sort of overflows.
Basically, any arithmetic performed on values obtained from an untrusted source, which will eventually be used as an allocation size or array offset, needs to be checked. You can either do it the cheap way (impose arbitrary limits on the value read that keep it significantly below the range which could overflow), or you can test for overflow at each step. Instead of:
foo = malloc((x+1)*sizeof *foo);
You need to do:
if (x<=SIZE_MAX-1 && x+1<=SIZE_MAX/sizeof *foo) foo = malloc((x+1)*sizeof *foo);
else goto error;
A simple grep for malloc/realloc with arithmetic operators in its argument will find many such errors (but not ones where the overflow already occurred a few lines above, etc.).
Here's a book recommendation: Writing Secure Code. Demonstrates not only how to write secure code, but also common pitfalls and practices that expose security holes. It's slightly dated (my copy says it was published in 2002), but the security concepts it teaches are still quite applicable even 8 years later.
Some source code constructs you can keep an eye out for are:
Functions that don't do bounds checking. Evan covered it pretty well.
Input validation & sanitization, or lack thereof.
NULL pointer dereferencing
fork()s, execve()s, pipe()s, system() called with non-static parameters (or worse, with user input).
Objects shared between threads with inappropriate storage durations (pointers to automatic variables or even "dead" objects in thread-local storage).
When dealing with file manipulation, make sure correct variable types are used for the return results of functions. Make sure they're checked for errors. Make no assumptions about the implementation - permissions of created files, uniqueness of filenames, etc.
Poor sources of randomness (for encryption, communication, etc.) should be avoided.
Simple or obvious mistakes (perhaps out of carelessness) should be fixed anyway. You never know what's exploitable until it is.
Also, are the data protected? Well, if you don't care, that's fine. :-)
Some tools that you can consider are:
valgrind : exposes memory flaws, which in large applications are usually critical.
splint : a static checker
fuzzing frameworks
RATS : a free, open-source tool. Its authors' company was acquired by Fortify.
I took a security class where we used a commercial product called Fortify 360, which did static analysis of C++ code. We ran it against an old-old-old version of OpenSSL, and it found loads of stuff, and provided guidance to rectify the flaws (which, by the way, the latest version of OpenSSL had resolved).
At any rate, it is a useful commercial tool.
Some of the OpenBSD folk just recently published a presentation on their coding practices.