Related
I was wondering if it is good practice to use the member operator . like this:
someVector = (segment.getFirst() - segment.getSecond()).normalize().normalCCW();
Just made that to show the two different things I was wondering, namely if using (expressions).member/function() and foo.getBar().getmoreBar() were in keeping with the spirit of readability and maintainability. In all the c++ code and books I learned from, I've never seen it used in this way, yet its intoxicatingly easy to use it as such. Don't want to develop any bad habits though.
Probably more (or less) important than that, I was also wondering if there would be any performance gains/losses by using it in this fashion, or unforeseen pitfalls that would introduce bugs into the program.
Thank you in advance!
or unforeseen pitfalls that would introduce bugs into the program
Well, the possible pitfalls would be
Harder to debug. You won't be able to view the results of each function call, so if one of them is returning something unexpected you will need to break it up into smaller segments to see what is going on. Also, any call in the chain may fail completely, so again, you may have to break it up to find out which call is failing.
Harder to read (sometimes). Chaining function calls can make the code harder to read. It depends on the situation, there's no hard and fast rule here. If the expression is even somewhat complex it can make things hard to follow. I don't have any problem reading your specific example.
It ultimately comes down to personal preference. I don't strive to fit as much as possible on a single line, and I have been bitten enough times by chaining where I shouldn't that I tend to break things up a bit. However, for simple expressions which are not likely to fail, chaining is fine.
Yes, this is perfectly acceptable and in fact would be completely unreadable in a lot of contexts if you were to NOT do this.
It's called method chaining.
There MIGHT be some performance gain in that you're not creating temporary variables. But any competent compiler will optimise it anyway.
it is perfectly valid to use it the way you showed. It is used in the named parameter idiom described in C++ faq lite for example.
One reason it is not always used is when you have to store intermediate result for performance reasons (if normalize is costly and you have to use it more than one time, it is better to store the result in a variable) or readability.
my2c
Using a variable to hold intermediate results can sometimes enhance readability, especially if you use good variable names. Excessive chaining can make it hard to understand what is happening. You have to use your judgement to decide if it's worthwhile to break down chains using variables. The example you present above is not excessive to me. Performance shouldn't differ much one way or the other if you enable optimization.
someVector = (segment.getFirst() - segment.getSecond()).normalize().normalCCW();
Not an answer to your question, but I should tell you that
the behavior of the expression (segment.getFirst() - segment.getSecond()) is not well-defined as per the C++ Standard. The order in which each operand is evaluated is unspecified by the Standard!
Also, see this related topic : Is this code well-defined?
I suppose what you are doing is less readable, however on the other hand, too many temporary variables can also become unreadable.
As far performance I'm sure there is a little overhead when making temporary variables but the compiler could optimize that out.
There's no big problem with using it in this way- some APIs benefit greatly from method chaining. Plus, it's misleading to create a variable, and then only use it once. When someone reads the next line, they don't have to think about all those variables that you now didn't keep.
It depends of what you're doing.
For readability you should try to use intermediate variables.
Assign calculation results to pointers, and then use them.
I face a situation where we have many very long methods, 1000 lines or more.
To give you some more detail, we have a list of incoming high level commands, and each generates results in a longer (sometime huge) list of lower level commands. There's a factory creating an instance of a class for each incoming command. Each class has a process method, where all the lower level commands are generated added in sequence. As I said, these sequences of commands and their parameters cause quite often the process methods to reach thousands of lines.
There are a lot of repetitions. Many command patterns are shared between different commands, but the code is repeated over and over. That leads me to think refactoring would be a very good idea.
On the contrary, the specs we have come exactly in the same form as the current code. Very long list of commands for each incoming one. When I've tried some refactoring, I've started to feel uncomfortable with the specs. I miss the obvious analogy between the specs and code, and lose time digging into newly created common classes.
Then here the question: in general, do you think such very long methods would always need refactoring, or in a similar case it would be acceptable?
(unfortunately refactoring the specs is not an option)
edit:
I have removed every reference to "generate" cause it was actually confusing. It's not auto generated code.
class InCmd001 {
OutMsg process ( InMsg& inMsg ) {
OutMsg outMsg = OutMsg::Create();
OutCmd001 outCmd001 = OutCmd001::Create();
outCmd001.SetA( param.getA() );
outCmd001.SetB( inMsg.getB() );
outMsg.addCmd( outCmd001 );
OutCmd016 outCmd016 = OutCmd016::Create();
outCmd016.SetF( param.getF() );
outMsg.addCmd( outCmd016 );
OutCmd007 outCmd007 = OutCmd007::Create();
outCmd007.SetR( inMsg.getR() );
outMsg.addCmd( outCmd007 );
// ......
return outMsg;
}
}
here the example of one incoming command class (manually written in pseudo c++)
Code never needs refactoring. The code either works, or it doesn't. And if it works, the code doesn't need anything.
The need for refactoring comes from you, the programmer. The person reading, writing, maintaining and extending the code.
If you have trouble understanding the code, it needs to be refactored. If you would be more productive by cleaning up and refactoring the code, it needs to be refactored.
In general, I'd say it's a good idea for your own sake to refactor 1000+ line functions. But you're not doing it because the code needs it. You're doing it because that makes it easier for you to understand the code, test its correctness, and add new functionality.
On the other hand, if the code is automatically generated by another tool, you'll never need to read it or edit it. So what'd be the point in refactoring it?
I understand exactly where you're coming from, and can see exactly why you've structured your code the way it is, but it needs to change.
The uncertainty you feel when you attempt to refactor can be ameliorated by writing unit tests. If you've tests specific to each spec, then the code for each spec can be refactored until you're blue in the face, and you can have confidence in it.
A second option, is it possible to automatically generate your code from a data structure?
If you've a core suite of classes that do the donkey work and edge cases, you can auto-generate the repetitive 1000 line methods as often as you wish.
However, there are exceptions to every rule.
If the methods are a literal interpretation of the spec (very little additional logic), and the specs change infrequently, and the "common" portions (i.e. bits that happen to be the same right now) of the specs change at different times, and you're not going to be asked to get a 10x performance gain out of the code anytime soon, then (and only then) . . . you may be better off with what you have.
. . . but on the whole, refactor.
Yes, always. 1000 lines is at least 10x longer than any function should ever be, and I'm tempted to say 100x, except that when dealing with input parsing and validation it can become natural to write functions with 20 or so lines.
Edit: Just re-read your question and I'm not clear on one point - are you talking about machine generated code that no-one has to touch? In which case I would leave things as they are.
Refectoring is not the same as writing from scratch. While you should never write code like this, before you refactor it, you need to consider the costs of refactoring in terms of time spent, the associated risks in terms of breaking code that already works, and the net benefits in terms of future time saved. Refactor only if the net benefits outweigh the associated costs and risks.
Sometimes wrapping and rewriting can be a safer and more cost effective solution, even if it appears expensive at first glance.
Long methods need refactoring if they are maintained (and thus need to be understood) by humans.
As a rule of thumb, code for humans first. I don't agree with the common idea that functions need to be short. I think what you need to aim at is when a human reads your code they grok it quickly.
To this effect it's a good idea to simplify things as much as possible--but not more than that. It's a good idea to delegate roughly one task for each function. There is no rule as for what "roughly one task" means: you'll have to use your own judgement for that. But do recognize that a function split into too many other functions itself reduces readability. Think about the human being who reads your function for the first time: they would have to follow one function call after another, constantly context-switching and maintaining a stack in their mind. This is a task for machines, not for humans.
Find the balance.
Here, you see how important naming things is. You will see it is not that easy to choose names for variables and functions, it takes time, but on the other hand it can save a lot of confusion on the human reader's side. Again, find the balance between saving your time and the time of the friendly humans who will follow you.
As for repetition, it's a bad idea. It's something that needs to be fixed, just like a memory leak. It's a ticking bomb.
As others have said before me, changing code can be expensive. You need to do the thinking as for whether it will pay off to spend all this time and effort, facing the risks of change, for a better code. You will possibly lose lots of time and make yourself one headache after another now, in order to possibly save lots of time and headache later.
Take a look at the related question How many lines of code is too many?. There are quite a few tidbits of wisdom throughout the answers there.
To repost a quote (although I'll attempt to comment on it a little more here)... A while back, I read this passage from Ovid's journal:
I recently wrote some code for
Class::Sniff which would detect "long
methods" and report them as a code
smell. I even wrote a blog post about
how I did this (quelle surprise, eh?).
That's when Ben Tilly asked an
embarrassingly obvious question: how
do I know that long methods are a code
smell?
I threw out the usual justifications,
but he wouldn't let up. He wanted
information and he cited the excellent
book Code Complete as a
counter-argument. I got down my copy
of this book and started reading "How
Long Should A Routine Be" (page 175,
second edition). The author, Steve
McConnell, argues that routines should
not be longer than 200 lines. Holy
crud! That's waaaaaay to long. If a
routine is longer than about 20 or 30
lines, I reckon it's time to break it
up.
Regrettably, McConnell has the cheek
to cite six separate studies, all of
which found that longer routines were
not only not correlated with a greater
defect rate, but were also often
cheaper to develop and easier to
comprehend. As a result, the latest
version of Class::Sniff on github now
documents that longer routines may not
be a code smell after all. Ben was
right. I was wrong.
(The rest of the post, on TDD, is worth reading as well.)
Coming from the "shorter methods are better" camp, this gave me a lot to think about.
Previously my large methods were generally limited to "I need inlining here, and the compiler is being uncooperative", or "for one reason or another the giant switch block really does run faster than the dispatch table", or "this stuff is only called exactly in sequence and I really really don't want function call overhead here". All relatively rare cases.
In your situation, though, I'd have a large bias toward not touching things: refactoring carries some inherent risk, and it may currently outweigh the reward. (Disclaimer: I'm slightly paranoid; I'm usually the guy who ends up fixing the crashes.)
Consider spending your efforts on tests, asserts, or documentation that can strengthen the existing code and tilt the risk/reward scale before any attempt to refactor: invariant checks, bound function analysis, and pre/postcondition tests; any other useful concepts from DBC; maybe even a parallel implementation in another language (maybe something message oriented like Erlang would give you a better perspective, given your code sample) or even some sort of formal logical representation of the spec you're trying to follow if you have some time to burn.
Any of these kinds of efforts generally have a few results, even if you don't get to refactor the code: you learn something, you increase your (and your organization's) understanding of and ability to use the code and specifications, you might find a few holes that really do need to be filled now, and you become more confident in your ability to make a change with less chance of disastrous consequences.
As you gain a better understanding of the problem domain, you may find that there are different ways to refactor you hadn't thought of previously.
This isn't to say "thou shalt have a full-coverage test suite, and DBC asserts, and a formal logical spec". It's just that you are in a typically imperfect situation, and diversifying a bit -- looking for novel ways to approach the problems you find (maintainability? fuzzy spec? ease of learning the system?) -- may give you a small bit of forward progress and some increased confidence, after which you can take larger steps.
So think less from the "too many lines is a problem" perspective and more from the "this might be a code smell, what problems is it going to cause for us, and is there anything easy and/or rewarding we can do about it?"
Leaving it cooking on the backburner for a bit -- coming back and revisiting it as time and coincidence allows (e.g. "I'm working near the code today, maybe I'll wander over and see if I can't document the assumptions a bit better...") may produce good results. Then again, getting royally ticked off and deciding something must be done about the situation is also effective.
Have I managed to be wishy-washy enough here? My point, I think, is that the code smells, the patterns/antipatterns, the best practices, etc -- they're there to serve you. Experiment to get used to them, and then take what makes sense for your current situation, and leave the rest.
I think you first need to "refactor" the specs. If there are repetitions in the spec it also will become easier to read, if it makes use of some "basic building blocks".
Edit: As long as you cannot refactor the specs, I wouldn't change the code.
Coding style guides are all made for easier code maintenance, but in your special case the ease of maintenance is achieved by following the spec.
Some people here asked if the code is generated. In my opinion it does not matter: If the code follows the spec "line by line" it makes no difference if the code is generated or hand-written.
1000 thousand lines of code is nothing. We have functions that are 6 to 12 thousand lines long. Of course those functions are so big, that literally things get lost in there, and no tool can help us even look at high level abstractions of them. the code is now unfortunately incomprehensible.
My opinion of functions that are that big, is that they were not written by brilliant programmers but by incompetent hacks who shouldn't be left anywhere near a computer - but should be fired and left flipping burgers at McDonald's. Such code wreaks havok by leaving behind features that cannot be added to or improved upon. (too bad for the customer). The code is so brittle that it cannot be modified by anyone - even the original authors.
And yes, those methods should be refactored, or thrown away.
Do you ever have to read or maintain the generated code?
If yes, then I'd think some refactoring might be in order.
If no, then the higher-level language is really the language you're working with -- the C++ is just an intermediate representation on the way to the compiler -- and refactoring might not be necessary.
Looks to me that you've implemented a separate language within your application - have you considered going that way?
It has been my understanding that it's recommended that any method over 100 lines of code be refactored.
I think some rules may be a little different in his era when code is most commonly viewed in an IDE. If the code does not contain exploitable repetition, such that there are 1,000 lines which are going to be referenced once each, and which share a significant number of variables in a clear fashion, dividing the code into 100-line routines each of which is called once may not be that much of an improvement over having a well-formatted 1,000-line module which includes #region tags or the equivalent to allow outline-style viewing.
My philosophy is that certain layouts of code generally imply certain things. To my mind, when a piece of code is placed into its own routine, that suggests that the code will be usable in more than one context (exception: callback handlers and the like in languages which don't support anonymous methods). If code segment #1 leaves an object in an obscure state which is only usable by code segment #2, and code segment #2 is only usable on a data object which is left in the state created by #1, then absent some compelling reason to put the segments in different routines, they should appear in the same routine. If a program puts objects through a chain of obscure states extending for many hundreds of lines of code, it might be good to rework the design of the code to subdivide the operation into smaller pieces which have more "natural" pre- and post- conditions, but absent some compelling reason to do so, I would not favor splitting up the code without changing the design.
For further reading, I highly recommend the long, insightful, entertaining, and sometimes bitter discussion of this topic over on the Portland Pattern Repository.
I've seen cases where it is not the case (for example, creating an Excel spreadsheet in .Net often requires a lot of line of code for the formating of the sheet), but most of the time, the best thing would be to indeed refactor it.
I personally try to make a function small enough so it all appears on my screen (without affecting the readability of course).
1000 lines? Definitely they need to be refactored. Also not that, for example, default maximum number of executable statements is 30 in Checkstyle, well-known coding standard checker.
If you refactor, when you refactor, add some comments to explain what the heck it's doing.
If it had comments, it would be much less likely a candidate for refactoring, because it would already be easier to read and follow for someone starting from scratch.
Then here the question: in general, do
you think such very long methods would
always need refactoring,
if you ask in general, we will say Yes .
or in a
similar case it would be acceptable?
(unfortunately refactoring the specs
is not an option)
Sometimes are acceptable, but is very unusual, I will give you a pair of examples:
There are some 8 bit microcontrollers called Microchip PIC, that have only a fixed 8 level stack, so you can't nest more than 8 calls, then care must be taken to avoid "stack overflow", so in this special case having many small function (nested) is not the best way to go.
Other example is when doing optimization of code (at very low level) so you have to take account the jump and context saving cost. Use it with care.
EDIT:
Even in generated code, you could need to refactorize the way its generated, for example for memory saving, energy saving, generate human readable, beauty, who knows, etc..
There has been very good general advise, here a practical recommendation for your sample:
common patterns can be isolated in plain feeder methods:
void AddSimpleTransform(OutMsg & msg, InMsg const & inMsg,
int rotateBy, int foldBy, int gonkBy = 0)
{
// create & add up to three messages
}
You might even improve that by making this a member of OutMsg, and using a fluent interface, such that you can write
OutMsg msg;
msg.AddSimpleTransform(inMsg, 12, 17)
.Staple("print")
.AddArtificialRust(0.02);
which can be an additional improvement under circumstances.
I don't mean external tools. I think of architectural patterns, language constructs, habits. I am mostly interested in C++
Automated Unit Testing .
There's an oft-unappreciated technique that I like to call The QA Team that can do wonders for weeding out bugs before they reach production.
It's been my experience (and is often quoted in textbooks) that programmers don't make the best testers, despite what they may think, because they tend to test to behaviour they already know to be true from their coding. On top of that, they're often not very good at putting themelves in the shoes of the end user (if it's that kind of app), and so are likely to neglect UI formatting/alignment/usability issues.
Yes, unit testing is immensely important and I'm sure others can give you better tips than I on that, but don't neglect your system/integration testing. :)
..and hey, it's a language independent technique!
Code Review, Unit Testing, and Continuous Integration may all help.
I find the following rather handy.
1) ASSERTs.
2) A debug logger that can output to the debug spew, console or file.
3) Memory tracking tools.
4) Unit testing.
5) Smart pointers.
Im sure there are tonnes of others but I can't think of them off the top of my head :)
RAII to avoid resource leakage errors.
Strive for simplicity and conciseness.
Never leave cases where your code behavior is undefined.
Look for opportunities to leverage the type system and have the compiler check as much as possible at compile time. Templates and code generation are your friends as long as you keep your common sense.
Minimize the number of singletons and global variables.
Use RAII !
Use assertions !
Automatic testing of some nominal and all corner cases.
Avoid last minute changes like the plague.
I use thinking.
Reducing variables scope to as narrow as possible. Less variables in outer scope - less chances to plant and hide an error.
I found that, the more is done and checked at compile time, the less can possibly go wrong at run-time. So I try to leverage techniques that allow stricter checking at compile-time. That's one of the reason I went into template-meta programming. If you do something wrong, it doesn't compile and thus never leaves your desk (and thus never arrives at the customer's).
I find many problems before i start testing at all using
asserts
Testing it with actual, realistic data from the start. And testing is necessary not only while writing the code, but it should start early in the design phase. Find out what your worst use cases will be like, and make sure your design can handle it. If your design feels good and elegant even against these use cases, it might actually be good.
Automated tests are great for making sure the code you write is correct. However, before you get to writing code, you have to make sure you're building the right things.
Learning functional programming helps somehow.
HERE
Learn you a haskell for great good.
Model-View-Controller, and in general anything with contracts and interfaces that can be unit-tested automatically.
I agree with many of the other answers here.
Specific to C++, the use of 'const' and avoiding raw pointers (in favor of references and smart pointers) when possible has helped me find errors at compile time.
Also, having a "no warnings" policy helps find errors.
Requirements.
From my experience, having full and complete requirements is the number one step in creating bug-free software. You can't write complete and correct software if you don't know what it's supposed to do. You can't write proper tests for software if you don't know what it's supposed to do; you'll miss a fair amount of stuff you should test. Also, the simple process of writing the requirements helps you to flesh them out. You find so many issues and problems before you ever write the first line of code.
I find peer progamming tends to help avoid a lot of the silly mistakes, and al ot of the time generates discussions which uncover flaws. Plus with someone free to think about the why you are doing something, it tends to make everything cleaner.
Code reviews; I've personally found lots of bugs in my colleagues' code and they have found bugs in mine.
Code reviews, early and often, will help you to both understand each others' code (which helps for maintenance), and spot bugs.
The sooner you spot a bug the easier it is to fix. So do them as soon as you can.
Of course pair programming takes this to an extreme.
Using an IDE like IntelliJ that inspects my code as I write it and flags dodgy code as I write it.
Unit Testing followed by Continious Integration.
Book suggestions: "Code Complete" and "Release it" are two must-read books on this topic.
In addition to the already mentioned things I believe that some features introduced with C++0x will help avoiding certain bugs. Features like strongly-typed enums, for-in loops and deleteing standard functions of objects come to mind.
In general strong typing is the way to go imho
Coding style consistency across a project.
Not just spaces vs. tab issues, but the way that code is used. There is always more than one way to do things. When the same thing gets done differently in different places, it makes catching common errors more difficult.
It's already been mentioned here, but I'll say it again because I believe this cannot be said enough:
Unnecessary complexity is the arch nemesis of good engineering.
Keep it simple. If things start looking complicated, stop and ask yourself why and what you can do to break the problem down into smaller, simpler chunks.
Hire someone that test/validate your software.
We have a guy that use our software before any of our customer. He finds bugs that our automated tests processes do not find, because he thinks as a customer not as a software developper. This guy also gives support to our customers, because he knows very well the software from the customer point of view. INVALUABLE.
all kinds of 'trace'.
Something not mentioned yet - when there's even semi-complex logic going on, name your variables and functions as accurately as you can (but not too long). This will make incongruencies in their interactions with each other, and with what they're supposed to be doing stand out better. The 'meaning', or language-parsing part of your brain will have more to grab on to. I find that with vaguely named things, your brain sort of glosses over what's really there and sees what is /supposed to/ be happening rather than what actually is.
Also, make code clean, it helps to keep your brain from getting fuzzy.
Test-driven development combined with pair programming seems to work quite well on keeping some bugs down. Getting the tests created early helps work out some of the design as well as giving some confidence should someone else have to work with the code.
Creating a string representation of class state, and printing those out to console.
Note that in some cases single line-string won't be enough, you will have to code small printing loop, that would create multi-line representation of class state.
Once you have "visualized" your program in such a way you can start to search errors in it. When you know which variable contained wrong value in the end, it's easy to place asserts everywhere where this variable is assigned or modified. This way you can pin point the exact place of error, and fix it without using the step-by-step debugging (which is rather slow way to find bugs imo).
Just yesterday found a really nasty bug without debugging a single line:
vector<string> vec;
vec.push_back("test1");
vec.push_back(vec[0]); // second element is not "test1" after this, it's empty string
I just kept placing assert-statements and restarting the program, until multi-line representation of program's state was correct.
One of my developers has started using RegexBuddy for help in interpreting legacy code, which is a usage I fully understand and support. What concerns me is using a regex tool for writing new code. I have actually discouraged its use for new code in my team. Two quotes come to mind:
Some people, when confronted with a
problem, think "I know, I’ll use
regular expressions." Now they have
two problems. - Jamie Zawinski
And:
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as
cleverly as possible, you are, by
definition, not smart enough to debug
it. - Brian Kernighan
My concerns are (respectively:)
That the tool may make it possible to solve a problem using a complicated regular expression that really doesn't need it. (See also this question).
That my one developer, using regex tools, will start writing regular expressions which (even with comments) can't be maintained by anyone who doesn't have (and know how to use) regex tools.
Should I encourage or discourage the use of regex tools, specifically with regard to producing new code? Are my concerns justified? Or am I being paranoid?
Poor programming is rarely the fault of the tool. It is the fault of the developer not understanding the tool. To me, this is like saying a carpenter should not own a screwdriver because he might use a screw where a nail would have been more appropriate.
Regular expressions are just one of the many tools available to you. I don't generally agree with the oft-cited Zawinski quote, as with any technology or technique, there are both good and bad ways to apply them.
Personally, I see things like RegexBuddy and the free Regex Coach primarily as learning tools. There are certainly times when they can be helpful to debug or understand existing regexes, but generally speaking, if you've written your regex using a tool, then it's going to be very hard to maintain it.
As a Perl programmer, I'm very familiar with both good and bad regular expressions, and have been using even complicated ones in production code successfully for many years. Here are a few of the guidelines I like to stick to that have been gathered from various places:
Don't use a regex when a string match will do. I often see code where people use regular expressions in order to match a string case-insensitively. Simply lower- or upper-case the string and perform a standard string comparison.
Don't use a regex to see if a string is one of several possible values. This is unnecessarily hard to maintain. Instead place the possible values in an array, hash (whatever your language provides) and test the string against those.
Write tests! Having a set of tests that specifically target your regular expression makes development significantly easier, particularly if it's a vaguely complicated one. Plus, a few tests can often answer many of the questions a maintenance programmer is likely to have about your regex.
Construct your regex out of smaller parts. If you really need a big complicated regex, build it out of smaller, testable sections. This not only makes development easier (as you can get each smaller section right individually), but it also makes the code more readable, flexible and allows for thorough commenting.
Build your regular expression into a dedicated subroutine/function/method. This makes it very easy to write tests for the regex (and only the regex). it also makes the code in which your regex is used easier to read (a nicely named function call is considerably less scary than a block of random punctuation!). Dropping huge regular expressions into the middle of a block of code (where they can't easily be tested in isolation) is extremely common, and usually very easy to avoid.
You should encourage the use of tools that make your developers more efficient. Having said that, it is important to make sure they're using the right tool for the job. You'll need to educate all of your team members on when it is appropriate to use a regular expression, and when (less|more) powerful methods are called for. Finally, any regular expression (IMHO) should be thoroughly commented to ensure that the next generation of developers can maintain it.
I'm not sure why there is so much diffidence against regex.
Yes, they can become messy and obscure, exactly as any other piece of code somebody may write but they have an advantage over code: they represent the set of strings one is interested to in a formally specified way (at least by your language if there are extensions). Understanding which set of strings is accepted by a piece of code will require "reverse engineering" the code.
Sure, you could discurage the use of regex as has already been done with recursion and goto's but this would be justifed to me only if there's a good alternative.
I would prefer maintain a single line regex code than a convoluted hand-made functions that tries to capture a set of strings.
On using a tool to understand a regex (or write a new one) I think it's perfectly fine! If somebody wrote it with the tool, somebody else could understand it with a tool! Actually, if you are worried about this, I would see tools like RegexBuddy your best insurance that the code will not be unmaintainable just because of the regex's
Regex testing tools are invaluable. I use them all the time. My job isn't even particularly regex heavy, so having a program to guide me through the nuances as I build my knowledge base is crucial.
Regular expressions are a great tool for a lot of text handling problems. If you have someone on your team who is writing regexes that the rest of the team don't understand, why not get them to teach the rest of you how they are working? Rather than a threat, you could be seeing this as an opportunity. That way you wouldn't have to feel threatened by the unknown and you'll have another very valuable tool in your arsenal.
Zawinski's comments, though entertainingly glib, are fundamentally a display of ignorance and writing Regular Expressions is not the whole of coding so I wouldn't worry about those quotes. Nobody ever got the whole of an argument into a one-liner anyways.
If you came across a Regular Expression that was too complicated to understand even with comments, then probably a regex wasn't a good solution for that particular problem, but that doesn't mean they have no use. I'd be willing to bet that if you've deliberately avoided them, there will be places in your codebase where you have many lines of code and a single, simple, Regex would have done the same job.
Regexbuddy is a useful shortcut, to make sure that the regular expressions you are writing do what you expect- it certainly makes life easier, but it's the matter of using them at all that is what seems important to me about your question.
Like others have said, I think using or not using such a tool is a neutral issue. More to the point: If a regular expression is so complicated that it needs inline comments, it is too complicated. I never comment my regexps. I approach large or complex matching problems by breaking it down into several steps of matching, either with multiple match statements (=~), or by building up a regexp with sub regexps.
Having said all that, I think any developer worth his salt should be reasonably proficient in regular expression writing and reading. I've been using regular expressions for years and have never encountered a time where I needed to write or read one that was terrifically complex. But a moderately sized one may be the most elegant and concise way to do a validation or match, and regexps should not be shied away from only because an inexperienced developer may not be able to read it -- better to educate that developer.
What you should be doing is getting your other devs hooked up with RB.
Don't worry about that whole "2 probs" quote; it seems that may have been a blast on Perl (said back in 1997) not regex.
I prefer not to use regex tools. If I can't write it by hand, then it means the output of the tool is something I don't understand and thus can't maintain. I'd much rather spend the time reading up on some regex feature than learning the regex tool. I don't understand the attitude of many programmers that regexes are a black art to be avoided/insulated from. It's just another programming language to be learned.
It's entirely possible that a regex tool would save me some time implementing regex features that I do know, but I doubt it... I can type pretty fast, and if you understand the syntax well (using a text editor where regexes are idiomatic really helps -- I use gVim), most regexes really aren't that complex. I think you're nearly always better served by learning a technology better rather than learning a crutch, unless the tool is something where you can put in simple info and get out a lot of boilerplate code.
Well, it sounds like the cure for that is for some smart person to introduce a regex tool that annotates itself as it matches. That would suggest that using a tool is not as much the issue as whether there is a big gap between what the tool understands and what the programmer understands.
So, documentation can help.
This is a real trivial example is a table like the following (just a suggestion)
Expression Match Reason
^ Pos 0 Start of input
\s+ " " At least one space
(abs|floor|ceil) ceil One of "abs", "floor", or "ceil"
...
I see the issue, though. You probably want to discourage people from building more complex regular expression than they can parse. I think standards can address this, by always requiring expanded REs and check that the annotation is proper.
However, if they just want to debug an RE, to make sure it's acting as they think it's acting, then it's not really much different from writing code you have to debug.
It's relative.
A couple of regex tools (for Node/JS, PHP and Python) i made (for some other projects) are available online to play and experiment.
regex-analyzer and regex-composer
github repo
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
What guidelines do you follow to improve the general quality of your code? Many people have rules about how to write C++ code that (supposedly) make it harder to make mistakes. I've seen people insist that every if statement is followed by a brace block ({...}).
I'm interested in what guidelines other people follow, and the reasons behind them. I'm also interested in guidelines that you think are rubbish, but are commonly held. Can anyone suggest a few?
To get the ball rolling, I'll mention a few to start with:
Always use braces after every if / else statement (mentioned above). The rationale behind this is that it's not always easy to tell if a single statement is actually one statement, or a preprocessor macro that expands to more than one statement, so this code would break:
// top of file:
#define statement doSomething(); doSomethingElse
// in implementation:
if (somecondition)
doSomething();
but if you use braces then it will work as expected.
Use preprocessor macros for conditional compilation ONLY. preprocessor macros can cause all sorts of hell, since they don't allow C++ scoping rules. I've run aground many times due to preprocessor macros with common names in header files. If you're not careful you can cause all sorts of havoc!
Now over to you.
A few of my personal favorites:
Strive to write code that is const correct. You will enlist the compiler to help weed out easy to fix but sometimes painful bugs. Your code will also tell a story of what you had in mind at the time you wrote it -- valuable for newcomers or maintainers once you're gone.
Get out of the memory management business. Learn to use smart pointers: std::auto_ptr, std::tr1::shared_ptr (or boost::shared_ptr) and boost::scoped_ptr. Learn the differences between them and when to use one vs. another.
You're probably going to be using the Standard Template Library. Read the Josuttis book. Don't just stop after the first few chapters on containers thinking that you know the STL. Push through to the good stuff: algorithms and function objects.
Delete unnecessary code.
That is all.
Use and enforce a common coding style and guidelines. Rationale: Every developer on the team or in the firm is able to read the code without distractions that may occur due to different brace styles or similar.
Regularly do a full rebuild of your entire source base (i.e. do daily builds or builds after each checkin) and report any errors! Rationale: The source is almost always in a usable state, and problems are detected shortly after they are "implemented", where problem solving is cheap.
Turn on all the warnings you can stand in your compiler (gcc: -Wall is a good start but doesn't include everything so check the docs), and make them errors so you have to fix them (gcc: -Werror).
Google's style guide, mentioned in one of these answers, is pretty solid. There's some pointless stuff in it, but it's more good than bad.
Sutter and Alexandrescu wrote a decent book on this subject, called C++ Coding Standards.
Here's some general tips from lil' ole me:
Your indentation and bracketing style are both wrong. So are everyone else's. So follow the project's standards for this. Swallow your pride and setup your editor so that everything is as consistent as possible with the rest of the codebase. It's really really annoying having to read code that's indented inconsistently. That said, bracketing and indenting have nothing whatsoever to do with "improving your code." It's more about improving your ability to work with others.
Comment well. This is extremely subjective, but in general it's always good to write comments about why code works the way it does, rather than explaining what it does. Of course for complex code it's also good for programmers who may not be familiar with the algorithm or code to have an idea of what it's doing as well. Links to descriptions of the algorithms employed are very welcome.
Express logic in as straightforward a manner as possible. Ironically suggestions like "put constants on the left side of comparisons" have gone wrong here, I think. They're very popular, but for English speakers, they often break the logical flow of the program to those reading. If you can't trust yourself (or your compiler) to write equality compares correctly, then by all means use tricks like this. But you're sacrificing clarity when you do it. Also falling under this category are things like ... "Does my logic have 3 levels of indentation? Could it be simpler?" and rolling similar code into functions. Maybe even splitting up functions. It takes experience to write code that elegantly expresses the underlying logic, but it's worth working at it.
Those were pretty general. For specific tips I can't do a much better job than Sutter and Alexandrescu.
In if statements put the constant on the left i.e.
if( 12 == var )
not
if( var == 12 )
Beacause if you miss typing a '=' then it becomes assignment. In the top version the compiler says this isn't possible, in the latter it runs and the if is always true.
I use braces for if's whenever they are not on the same line.
if( a == b ) something();
if( b == d )
{
bigLongStringOfStuffThatWontFitOnASingleLineNeatly();
}
Open and close braces always get their own lines. But that is of course personal convention.
Only comment when it's only necessary to explain what the code is doing, where reading the code couldn't tell you the same.
Don't comment out code that you aren't using any more. If you want to recover old code, use your source control system. Commenting out code just makes things look messy, and makes your comments that actually are important fade into the background mess of commented code.
Use consistent formatting.
When working on legacy code employ the existing style of formatting, esp. brace style.
Get a copy of Scott Meyer's book Effective C++
Get a copy of Steve MConnell's book Code Complete.
There is also a nice C++ Style Guide used internally by Google, which includes most of the rules mentioned here.
Start to write a lot of comments -- but use that as an opportunity to refactor the code so that it's self explanatory.
ie:
for(int i=0; i<=arr.length; i++) {
arr[i].conf() //confirm that every username doesn't contain invalid characters
}
Should've been something more like
for(int i=0; i<=activeusers.length; i++) {
activeusers[i].UsernameStripInvalidChars()
}
Use tabs for indentations, but align data with spaces
This means people can decide how much to indent by changing the tab size, but also that things stay aligned (eg you might want all the '=' in a vertical line when assign values to a struct)
Allways use constants or inline functions instead of macros where posible
Never use 'using' in header files, because everything that includes that heafer will also be affected, even if the person includeing your header doesn't want all of std (for example) in their global namespace.
If something is longer than about 80 columes, break it up into multiple lines eg
if(SomeVeryLongVaribleName != LongFunction(AnotherVarible, AString) &&
BigVaribleIsValid(SomeVeryLongVaribleName))
{
DoSomething();
}
Only overload operators to make them do what the user expects, eg overloading the + and - operators for a 2dVector is fine
Always comment your code, even if its just to say what the next block is doing (eg "delete all textures that are not needed for this level"). Someone may need to work with it later, posibly after you have left and they don't want to find 1000's of lines of code with no comments to indicate whats doing what.
setup coding convention and make everyone involved follow the convention (you wouldn't want reading code that require you to figure out where is the next statement/expression because it is not indented properly)
constantly refactoring your code (get a copy of Refactoring, by Martin Fowler, pros and cons are detailed in the book)
write loosely coupled code (avoid writing comment by writing self-explanatory code, loosely coupled code tends to be easier to manage/adapt to change)
if possible, unit test your code (or if you are macho enough, TDD.)
release early, release often
avoid premature optimization (profiling helps in optimizing)
In a similar vein you might find some useful suggestions here: How do you make wrong code look wrong? What patterns do you use to avoid semantic errors?
Where you can, use pre-increment instead of post-increment.
I use PC-Lint on my C++ projects and especially like how it references existing publications such as the MISRA guidelines or Scott Meyers' "Effective C++" and "More Effective C++". Even if you are planning on writing very detailed justifications for each rule your static analysis tool checks, it is a good idea to point to established publications that your user trusts.
Here is the most important piece of advice I was given by a C++ guru, and it helped me in a few critical occasions to find bugs in my code:
Use const methods when a method is not supposed to modify the object.
Use const references and pointers in parameters when the object is not supposed to modify the object.
With these 2 rules, the compiler will tell you for free where in your code the logic is flawed!
Also, for some good techniques you might follow Google's blog "Testing on the Toilet".
Look at it six months later
make sure you indent properly
Hmm - I probably should have been a bit more specific.
I'm not so much looking for advice for myself - I'm writing a static code analysis tool (the current commercial offerings just aren't good enough for what I want), and I'm looking for ideas for plugins to highlight possible errors in the code.
Several people have mentioned things like const correctness and using smart pointers - that's the kind of think I can check for. Checking for indentation and commenting is a bit harder to do (from a programming point of view anyway).
Smart pointers have a nice way of indicating ownership very clearly. If you're a class or a function:
if you get a raw pointer, you don't own anything. You're allowed to use the pointee, courtesy of your caller, who guarantees that the pointee will stay alive longer than you.
if you get a weak_ptr, you don't own the pointee, and on top of that the pointee can disappear at any time.
if you get a shared_ptr, you own the object along with others, so you don't need to worry. Less stress, but also less control.
if you get an auto_ptr, you are the sole owner of the object. It's yours, you're the king. You have the power to destroy that object, or give it to someone else (thereby losing ownership).
I find the case for auto_ptr particularly strong: in a design, if I see an auto_ptr, I immediately know that that object is going to "wander" from one part of the system to the other.
This is at least the logic I use on my pet project. I'm not sure how many variations there can be on the topic, but until now this ruleset has served me well.