I would like to do a mass replacement in a PowerBuilder project. (For background: I'd like to replace most ".object." references with their SetItem/GetItem equivalents, for various reasons.)
While it won't be covering necessarily all cases, I would like to know if there is a way to apply a regex find/replace over an entire workspace, or at least over selected pbl files.
I have seen this other question, but I'm wondering if there is a simpler way than exporting everything (especially since reimporting doesn't seem like the most fun thing to do).
The best option is to export the source, write a small application that applies your replacement rules to the exported files, and then import the result.
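As a rough sketch of what such a rule-applying step could look like (Python is used here only for illustration): the single rule below is hypothetical and covers just the simplest case, a line that assigns to `dw.object.column[row]`. It assumes the exported source is already loaded as a string; reading and writing the export files is left out, since their encoding and layout depend on your PowerBuilder version.

```python
import re

# Hypothetical rule set: each entry is (compiled pattern, replacement template).
# This one rewrites a whole-line assignment such as
#     dw_1.object.name[ll_row] = ls_value
# into
#     dw_1.SetItem(ll_row, "name", ls_value)
# It is anchored at the start of a line so that comparisons inside
# "if ... then" statements are left alone. It does not handle trailing
# comments, nested brackets in the row expression, or reads (GetItem*).
RULES = [
    (
        re.compile(
            r"^(?P<indent>[ \t]*)(?P<dw>\w+)\.object\.(?P<col>\w+)"
            r"\[(?P<row>[^\]\r\n]+)\][ \t]*=[ \t]*(?P<val>[^\r\n]+)",
            re.IGNORECASE | re.MULTILINE,
        ),
        r'\g<indent>\g<dw>.SetItem(\g<row>, "\g<col>", \g<val>)',
    ),
]


def apply_rules(source, rules=RULES):
    """Apply every rule to the text; return (new_text, number_of_replacements)."""
    total = 0
    for pattern, replacement in rules:
        source, count = pattern.subn(replacement, source)
        total += count
    return source, total
```

Returning the replacement count makes it easy to log how many lines each pass touched, which helps when reviewing the result before importing it again.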
Nope. Especially since the getitem methods vary by datatype. There are a variety of global replace type utilities (like PibblePeeper) which could help but you would have to do many passes through the code.
Is there a tool to run code convention tests in Clojure? For example, to make sure function names don't have any capital letters, or that keywords don't have any underscores in them.
Two useful Leiningen plugins I learned about recently:
lein-bikeshed
lein-kibit
Late to the party here. Seconding noahlz, the three main static analysis tools that I use on a regular basis are lein-bikeshed, lein-kibit, and Eastwood, though I also use yagni. Each of these has different strengths.
Bikeshed is good for general code cleanup but is mostly focused on style (e.g. making sure lines aren't too long, there's no trailing whitespace, functions have docstrings, etc.).
Kibit is good for showing you the most idiomatic function to use (for instance, when using an if form that returns nil if false, you could just use when instead).
Eastwood is probably the most comprehensive lint tool that exists for Clojure, and checks for a pretty impressive number of code smell issues.
Finally, Yagni is great for finding unused code paths in your libraries and applications.
I have to perform refactoring of a medium size code block (< 200K LOC). The scope is pretty moderate: rename some classes, move a few nested definitions up and down the class hierarchy, remove unused stuff.
It would be pretty straightforward to do it by hand, but we will have to pick up bug fixes from the older code base for one or two years, and the project will change at least half of the lines in the existing code.
So, I am planning to express the changes as a sequence of indent (supposedly astyle), sed script, and another indent.
My plan is: do the conversion by hand, then develop a sed script that will yield the same result. The former part is pretty clear, but developing a big sed script by hand does not seem particularly appealing, and I do not have any better idea.
Please, help.
Have a look at the large scale static analysis and refactoring tools that mozilla devs were working on
https://wiki.mozilla.org/Static_Analysis
I'm not sure what has happened since the release of gcc 4.5 - possibly pork and oink are easier to set up now.
sed can probably be cozened into doing it, but for multiline blocks you're better off with something easier to work with. Even awk would be an improvement, but I'd be looking at Perl/Python/scripting language of choice. Preferably with a parser, which would also save you the initial indent run.
In fact, I'd look for a parser that generated an annotated syntax tree, which makes refactoring largely a matter of moving tree branches around.
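To illustrate the syntax-tree idea on a small scale, here is a sketch that uses Python's standard `ast` module on Python source. This is not the C/C++ tooling the question needs; it only shows why a tree-based rename is safer than a textual one. The `RenameClass` transformer and `rename_class` helper are names made up for this example, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


class RenameClass(ast.NodeTransformer):
    """Rename a class definition and every plain-name reference to it."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_ClassDef(self, node):
        # Visit the class body first so nested references are renamed too.
        self.generic_visit(node)
        if node.name == self.old:
            node.name = self.new
        return node

    def visit_Name(self, node):
        # Only identifier nodes are touched; string literals are Constant nodes.
        if node.id == self.old:
            node.id = self.new
        return node


def rename_class(source, old, new):
    """Parse the source, rename the class in the tree, and emit new source."""
    tree = RenameClass(old, new).visit(ast.parse(source))
    return ast.unparse(tree)
```

Because the rename works on identifier nodes, a string literal that happens to contain the old name is left alone, which a plain sed substitution would not guarantee. Note that `ast.unparse` discards comments and original formatting, so a real refactoring tool would need a tree that preserves them.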
One of my developers has started using RegexBuddy for help in interpreting legacy code, which is a usage I fully understand and support. What concerns me is using a regex tool for writing new code. I have actually discouraged its use for new code in my team. Two quotes come to mind:
Some people, when confronted with a
problem, think "I know, I’ll use
regular expressions." Now they have
two problems. - Jamie Zawinski
And:
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as
cleverly as possible, you are, by
definition, not smart enough to debug
it. - Brian Kernighan
My concerns are (respectively):
That the tool may make it possible to solve a problem using a complicated regular expression that really doesn't need it. (See also this question).
That my one developer, using regex tools, will start writing regular expressions which (even with comments) can't be maintained by anyone who doesn't have (and know how to use) regex tools.
Should I encourage or discourage the use of regex tools, specifically with regard to producing new code? Are my concerns justified? Or am I being paranoid?
Poor programming is rarely the fault of the tool. It is the fault of the developer not understanding the tool. To me, this is like saying a carpenter should not own a screwdriver because he might use a screw where a nail would have been more appropriate.
Regular expressions are just one of the many tools available to you. I don't generally agree with the oft-cited Zawinski quote; as with any technology or technique, there are both good and bad ways to apply them.
Personally, I see things like RegexBuddy and the free Regex Coach primarily as learning tools. There are certainly times when they can be helpful to debug or understand existing regexes, but generally speaking, if you've written your regex using a tool, then it's going to be very hard to maintain it.
As a Perl programmer, I'm very familiar with both good and bad regular expressions, and have been using even complicated ones in production code successfully for many years. Here are a few of the guidelines I like to stick to that have been gathered from various places:
Don't use a regex when a string match will do. I often see code where people use regular expressions in order to match a string case-insensitively. Simply lower- or upper-case the string and perform a standard string comparison.
Don't use a regex to see if a string is one of several possible values. This is unnecessarily hard to maintain. Instead place the possible values in an array, hash (whatever your language provides) and test the string against those.
Write tests! Having a set of tests that specifically target your regular expression makes development significantly easier, particularly if it's a vaguely complicated one. Plus, a few tests can often answer many of the questions a maintenance programmer is likely to have about your regex.
Construct your regex out of smaller parts. If you really need a big complicated regex, build it out of smaller, testable sections. This not only makes development easier (as you can get each smaller section right individually), but it also makes the code more readable, flexible and allows for thorough commenting.
Build your regular expression into a dedicated subroutine/function/method. This makes it very easy to write tests for the regex (and only the regex). It also makes the code in which your regex is used easier to read (a nicely named function call is considerably less scary than a block of random punctuation!). Dropping huge regular expressions into the middle of a block of code (where they can't easily be tested in isolation) is extremely common, and usually very easy to avoid.
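A minimal sketch of how these guidelines might look in practice, written in Python rather than Perl purely for illustration. All names here (`same_word`, `ALLOWED`, `is_allowed`, `is_decimal`) and the choice of a decimal-number pattern are hypothetical examples, not anything from a real code base.

```python
import re


def same_word(a, b):
    """Case-insensitive equality without a regex."""
    return a.casefold() == b.casefold()


# Membership in a fixed set of values: use a set, not an alternation regex.
ALLOWED = frozenset({"abs", "floor", "ceil"})


def is_allowed(name):
    return name in ALLOWED


# A larger pattern built from small, individually readable parts.
SIGN = r"[+-]?"                  # optional leading sign
DIGITS = r"\d+"                  # one or more digits
FRACTION = rf"(?:\.{DIGITS})?"   # optional ".digits" suffix
NUMBER = re.compile(rf"{SIGN}{DIGITS}{FRACTION}")


def is_decimal(text):
    """The regex lives behind one named function, so it can be tested alone."""
    return NUMBER.fullmatch(text) is not None
```

With the pattern wrapped in `is_decimal`, the tests for it are a handful of one-line assertions (for example that "-12.5" is accepted and "12." is rejected), and callers never see the punctuation.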
You should encourage the use of tools that make your developers more efficient. Having said that, it is important to make sure they're using the right tool for the job. You'll need to educate all of your team members on when it is appropriate to use a regular expression, and when (less|more) powerful methods are called for. Finally, any regular expression (IMHO) should be thoroughly commented to ensure that the next generation of developers can maintain it.
I'm not sure why there is so much distrust of regex.
Yes, they can become messy and obscure, exactly like any other piece of code somebody may write, but they have an advantage over code: they represent the set of strings one is interested in, in a formally specified way (at least by your language, if there are extensions). Understanding which set of strings is accepted by a piece of hand-written code requires "reverse engineering" that code.
Sure, you could discourage the use of regex, as has already been done with recursion and gotos, but to me this would be justified only if there were a good alternative.
I would prefer to maintain a single-line regex rather than a convoluted hand-made function that tries to capture the same set of strings.
On using a tool to understand a regex (or write a new one), I think it's perfectly fine! If somebody wrote it with the tool, somebody else can understand it with a tool! Actually, if you are worried about this, I would see tools like RegexBuddy as your best insurance that the code will not become unmaintainable just because of the regexes.
Regex testing tools are invaluable. I use them all the time. My job isn't even particularly regex heavy, so having a program to guide me through the nuances as I build my knowledge base is crucial.
Regular expressions are a great tool for a lot of text handling problems. If you have someone on your team who is writing regexes that the rest of the team don't understand, why not get them to teach the rest of you how they are working? Rather than a threat, you could be seeing this as an opportunity. That way you wouldn't have to feel threatened by the unknown and you'll have another very valuable tool in your arsenal.
Zawinski's comments, though entertainingly glib, are fundamentally a display of ignorance, and writing Regular Expressions is not the whole of coding, so I wouldn't worry about those quotes. Nobody ever got the whole of an argument into a one-liner anyways.
If you came across a Regular Expression that was too complicated to understand even with comments, then probably a regex wasn't a good solution for that particular problem, but that doesn't mean they have no use. I'd be willing to bet that if you've deliberately avoided them, there will be places in your codebase where you have many lines of code and a single, simple, Regex would have done the same job.
RegexBuddy is a useful shortcut to make sure that the regular expressions you are writing do what you expect. It certainly makes life easier, but whether to use regular expressions at all is what seems important to me about your question.
Like others have said, I think using or not using such a tool is a neutral issue. More to the point: if a regular expression is so complicated that it needs inline comments, it is too complicated. I never comment my regexps. I approach large or complex matching problems by breaking them down into several steps of matching, either with multiple match statements (=~), or by building up a regexp from sub-regexps.
Having said all that, I think any developer worth his salt should be reasonably proficient in regular expression writing and reading. I've been using regular expressions for years and have never encountered a time where I needed to write or read one that was terrifically complex. But a moderately sized one may be the most elegant and concise way to do a validation or match, and regexps should not be shied away from only because an inexperienced developer may not be able to read it -- better to educate that developer.
What you should be doing is getting your other devs hooked up with RB.
Don't worry about that whole "2 probs" quote; it seems that may have been a blast on Perl (said back in 1997) not regex.
I prefer not to use regex tools. If I can't write it by hand, then it means the output of the tool is something I don't understand and thus can't maintain. I'd much rather spend the time reading up on some regex feature than learning the regex tool. I don't understand the attitude of many programmers that regexes are a black art to be avoided/insulated from. It's just another programming language to be learned.
It's entirely possible that a regex tool would save me some time implementing regex features that I do know, but I doubt it... I can type pretty fast, and if you understand the syntax well (using a text editor where regexes are idiomatic really helps -- I use gVim), most regexes really aren't that complex. I think you're nearly always better served by learning a technology better rather than learning a crutch, unless the tool is something where you can put in simple info and get out a lot of boilerplate code.
Well, it sounds like the cure for that is for some smart person to introduce a regex tool that annotates itself as it matches. That would suggest that using a tool is not as much the issue as whether there is a big gap between what the tool understands and what the programmer understands.
So, documentation can help.
A really trivial example would be a table like the following (just a suggestion):
Expression        Match   Reason
^                 Pos 0   Start of input
\s+               " "     At least one space
(abs|floor|ceil)  ceil    One of "abs", "floor", or "ceil"
...
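A table like this could be produced by a small helper that matches the sub-expressions one after another and records what each one consumed. The following is only a sketch of the idea, not an existing tool; `STEPS` and `annotate` are hypothetical names, and the sub-expressions are the ones from the table.

```python
import re

# Each step pairs a sub-expression with a human-readable reason.
STEPS = [
    (r"\s+", "At least one space"),
    (r"(?:abs|floor|ceil)", 'One of "abs", "floor", or "ceil"'),
]


def annotate(steps, text):
    """Match each sub-expression in sequence from the start of the text.

    Returns rows of (expression, matched_text, reason). If a step fails,
    its row has matched_text None and the remaining steps are skipped.
    """
    rows = [("^", "Pos 0", "Start of input")]
    pos = 0
    for pattern, reason in steps:
        match = re.compile(pattern).match(text, pos)
        if match is None:
            rows.append((pattern, None, "NO MATCH: " + reason))
            break
        rows.append((pattern, match.group(0), reason))
        pos = match.end()
    return rows
```

Printing the rows for an input such as two spaces followed by "ceil(x)" gives one line per sub-expression, and a failed step shows exactly where the expression and the input diverge.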
I see the issue, though. You probably want to discourage people from building more complex regular expression than they can parse. I think standards can address this, by always requiring expanded REs and check that the annotation is proper.
However, if they just want to debug an RE, to make sure it's acting as they think it's acting, then it's not really much different from writing code you have to debug.
It's relative.
A couple of regex tools (for Node/JS, PHP and Python) that I made for some other projects are available online to play and experiment with.
regex-analyzer and regex-composer
github repo
Let’s say that you decide to change the name of Stack Overflow to Frack Overflow.
Now, in your code you already have dozens of objects and variables and selectors with some variation of the name "Stack". You want them to now be replaced with "Frack".
So my question is, would you opt to run your entire codebase through a regular expression filter and change all of these names? Or would you let them be?
I would use the "rename" feature of a good IDE to do it for me.
It depends, really.
In a language like C++, you can get away with this because the compiler will let you know right away if something would break. However, other less-picky languages will allow you to refer to variables which don't exist, and the worst that happens is a slap on the wrist in the form of an exception being thrown for a null reference.
I was working on a Flex project once where the codebase was a real mess, and we decided to go through the code and beautify it a bit to meet the Adobe AS3 coding standards. Since I was new to the project, I didn't realize that the variable names in some classes actually referred to persistent objects which Hibernate (running in the Java webapp for the backend server) was using to create mappings. So renaming these variables caused the entire Flex frontend to misbehave, even when we did it with the "correct" refactoring tools in our IDE.
But really, I'd say to check your OCD at the door and make your changes a little at a time. Any time you change dozens of files in a large project, you risk destabilizing it, and in this case, the benefit derived from such a risk doesn't pay off.
I'd first ask myself the question why? It is a risk/reward judgement at the end of the day which only you can make.
I would be very reluctant to do it for stylistic reasons, but for class re-factoring it may be legitimate.
Well, not necessarily a disaster, but it certainly can cause some trouble on large code bases. That's why I hate Hungarian notation: it makes you rename a variable everywhere whenever you happen to change its type.
If there are objects, members and fields in your solution with names that reference a certain customer implementation, I would work hard to re-factor these to use more generic names instead, and I would let Resharper do the re-naming, not some generic text-search-and-replace tool.
Just use a refactoring tool like Resharper by JetBrains or CodeRush and Refactor! by DevExpress. They change all references of a variable in your entire codebase automatically and can do much more.
I believe Refactor! is even included in the VB version of Visual Studio. I use Resharper and I refuse to develop without it.
If I were using a source code version control system (like svn, git, Bazaar, Mercurial, etc.) I would not be afraid to refactor my code.
Use some kind of "find replace all" or refactoring of some IDE, compile (if it is not a dynamic language) and run your tests (if any).
If something goes horribly wrong, you can always revert your code using the source control system.
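For the plain "find and replace all" route, a case-preserving substitution is one way to catch the "Stack", "stack" and "STACK" variations in a single pass. The sketch below is only an illustration under that assumption; `rename_preserving_case` is a hypothetical helper, and it works on a string rather than on files so it can be tried safely before touching the codebase.

```python
import re


def rename_preserving_case(text, old="stack", new="frack"):
    """Replace every occurrence of `old`, copying the casing of each match."""

    def swap(match):
        word = match.group(0)
        if word.isupper():
            return new.upper()        # STACK -> FRACK
        if word[0].isupper():
            return new.capitalize()   # Stack -> Frack
        return new.lower()            # stack -> frack

    return re.sub(re.escape(old), swap, text, flags=re.IGNORECASE)
```

Note that this is a blind textual replacement: it also turns "haystack" into "hayfrack". That is exactly the kind of unintended hit that makes compiling, running the tests, and being able to revert through source control necessary after such a change.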
Renaming is perhaps the most common refactoring. It is rather encouraged to refactor your code as you go, as this gives you the flexibility of not having to make permanent decisions about names, code placement, etc. as you are first writing your application. If you are not familiar with the idea, I would suggest you start with the Wikipedia page and then dive into Martin Fowler's site.
However, if you need to write your own regex to rename things, then imho you could use some better tools. Don't waste your time reinventing the wheel -- and then fixing whatever your new wheel broke by accident. If you have the option, use an existing tool (IDE or whatever) to do the dirty work.
Even if you have "dozens" of things to rename, I think you're better off finding them one by one manually, and then using an automatic Rename to fix all instances throughout your code.
You need good justification for doing it, I think. When I make changes that have a large number of potential side effects across a large codebase, which happens from time to time, I usually look for a way to make the compiler fail on spots I've missed. And, if possible, I tend to do it in stages so as to minimize the break.
I wouldn't rename just for the sake of renaming, though.