get started with regular expression [closed] - regex

I’m always afraid whenever I see a regular expression. They seem very difficult to understand. But fear is not the solution. I’ve decided to start learning regex, so can someone advise me on how to get started? And is there an easy tutorial?

☝ Getting Started with /Regexes/
Regular expressions are a form of declarative programming. If you are used to imperative, functional, or object-oriented programming, then they are a very different way of thinking. It’s a rules-based approach with subtle backtracking issues. I daresay a background in Prolog might actually do you some good with these, which certainly isn’t something I commonly advise.
Normally I would just have people play around with the grep command from their shell, then advance to using regexes for searching and replacing in their editor.
But I’m guessing you aren’t coming from a Unix background, because if you were, you would have come across regexes all over, from the very most basic grep command to pattern-matching in the vi or emacs editors. You can look at the grep manpage by typing
% man grep
on your BSD, Linux, Apple, or Sun systems, just to name a few.
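If you don't have a shell with grep handy, the idea can be tried out in a few lines of Python. This is only a rough sketch of grep-style line filtering, not a replacement for grep; the function name `grep` and the sample lines are made up for illustration.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain a match for pattern (a bare-bones grep)."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Hypothetical sample input, standing in for the lines of a file.
sample = [
    "the quick brown fox",
    "error: disk full",
    "warning: low battery",
    "Error: file not found",
]

print(grep(r"^error", sample))      # lines starting with lowercase "error"
print(grep(r"(?i)^error", sample))  # case-insensitive, roughly like grep -i
print(grep(r"o[wx]\b", sample))     # "ow" or "ox" at the end of a word
```

Playing with small patterns like these against text you already know is probably the quickest way to build intuition.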
                                     ☹ ¡ʇɟoƨoɹɔᴉƜ ʇnoqɐ əɯ ʞƨɐ ʇ ̦uop əƨɐəld ʇƨnɾ  ☹
☟ (?: Book Learnin’? )
If you ran into regular expressions at school or university, it was probably in the context of automata theory. They come up when discussing regular languages. If you have suffered through such classes, you may remember that regular expressions are the user-friendly face to messy finite automata. What they probably did not teach you, however, is that outside of the ivory tower, the regular expressions people actually use in the real world are far, far beyond "regular" in the rarefied, theoretical, and highly irregular sense of that otherwise commonplace word. This means that modern regular expressions (call them patterns if you prefer) can do much more than the traditional regular expressions taught in computer science classes. There just isn’t any REGULAR left in modern regular expressions outside the classroom, but this is a good thing.
I say “modern”, but in fact regular expressions haven’t been regular since Ken Thompson first put back references into his backtracking NFA, back when he was famously proving NFA–DFA equivalence. So unless you actually are using a DFA engine, it might be best to just forget any book-learnin’ nonsense about REGULARness of regexes. It just doesn’t apply to the way we really use them every day in the real world.
Modern regular expressions allow for much more than just back references though, as you will find once you delve into them. They’re their own wonderful world, even if that world is a bit surreal at times. They can let you substitute just one line for pages and pages of code. They can also make you lose hair over their crazy behavior. Sometimes they make your computer seem like it’s hung, because it’s actually working very hard in a race between it and the heat-death of the universe in some awful O(2ⁿ) algorithm, or even worse. It can easily be much worse, actually. That’s what having this sort of power in your hands can do. There are no training wheels and no slow lane. Regexes are a power tool par excellence.
/☕✷⅋⋙$⚣™‹ª∞¶⌘̤℈⁑‽#♬˘$π❧/
⁠
⁠
⁠
Just one more thing before I give you a big list of helpful references. As I’ve already said today elsewhere, regexes do not have to be ugly, and they do not have to be hard. REMEMBER: If you create ugly regexes, it is only a reflection on you, not on them.
That’s absolutely no excuse for creating regexes that are hard to read. Oh, there’s plenty like that out there all right, but they shouldn’t be and they needn’t be. Even though regexes are (for the most part) a form of declarative programming, all the software engineering techniques that one uses in other forms of programming ̲s̲t̲i̲l̲l̲ ̲a̲p̲p̲l̲y̲ ̲h̲e̲r̲e̲!
A regex should never look like a dense row of punctuation that’s impossible to decipher. Any language would be a disaster if you removed all the alphabetical identifiers, removed all whitespace and indentation, removed all comments, and removed every last trace of top-down programming. So of course they look like cr#p if you do that. Don’t do that!
So use all of those basic tools, including aesthetically pleasing code layout, careful problem decomposition, named subroutines, decoupling the declaration from the execution (including ordering!), unit testing, plus all the rest, whenever you’re creating regexes. These are all critical steps in making your patterns maintainable.
It’s one thing to write /(.)\1/, but quite another to write something like mǁ☕⅋⚣⁑™∞¶⌘℈‽#♬❧ǁ. Those are regexes from the Dark Ages: don’t just reject them: burn them at the stake! It’s programming, after all, not line-noise or golf!
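To make the layout point concrete, here is a small sketch in Python (not Perl, so the syntax differs a little from what the books above use). It writes the same back-reference pattern twice: once tersely, and once using `re.VERBOSE` with a named group and comments. The group name `char` is just an illustrative choice.

```python
import re

# Terse form: any character immediately followed by itself.
terse = re.compile(r"(.)\1")

# The same pattern laid out with whitespace, a named group, and comments.
readable = re.compile(r"""
    (?P<char> . )     # capture any single character (except newline)
    (?P=char)         # followed by that same character again
""", re.VERBOSE)

for word in ["hello", "bookkeeper", "regex"]:
    a = terse.search(word)
    b = readable.search(word)
    # Both patterns should agree on every input.
    print(word, a.group(0) if a else None, b.group(0) if b else None)
```

For a two-token pattern the verbose form is overkill, but the same habit scales to patterns that would otherwise be an unreadable row of punctuation.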
☞ Regex References
The Wikipedia page on regular expressions is a decent enough overview.
IBM has a nice introduction to regexes in their Speaking Unix series.
Russ Cox has a very nice list of classic regular expressions references. You might want to check out the original Version 8 regular expressions, here found in a Perl manpage, but these were the original, most basic patterns that everybody grew up with back in olden days.
Mastering Regular Expressions from O’Reilly, by Jeffrey Friedl.
Jan Goyvaerts’s regular-expressions.info site and his Regular Expression Cookbook, also from O’Reilly.
I’m a native speaker of Perl, so let me say four words about it. Chapter 5 of the Perl Cookbook and Chapter 6 of Programming Perl, both somewhat embarrassingly by yours truly et alios, also from O’Reilly, are devoted to regular expressions in Perl. Perl was the language that originated most regex features found in modern regular expressions, and it continues to lead the pack. Perl’s Unicode support for regexes is especially rich and remarkably simple to use — in comparison with other languages’. You can download all the code examples from those two books from the O’Reilly site, or see the next item. The perldoc.org site has quite a bit on pattern matching, including the perlre and perluniprops manpages, just to take a couple of starting points.
Apropos the Perl Cookbook, the PLEAC project has reïmplemented the Perl Cookbook code in a dizzying number of diverse languages, including ada, common lisp, groovy, guile, haskell, java, merd, ocaml, php, pike, python, rexx, ruby, and tcl. If you look at what each language does for their equivalent of PCB’s regex chapter, you will learn a tremendously huge amount about how that language deals with regular expressions. It’s a marvellous resource and quite an eye-opener, even if some of the solutions are, um, suboptimal.
Java Regular Expressions by Mehran Habibi from Apress. It’s certainly better than trying to figure anything out by reading Sun’s documentation on the Pattern class. Java is probably the worst possible language for learning regexes in; it is very clumsy and often completely stupid. I speak from painful personal experience, not from ignorance, and I am hardly alone in this appraisal. If you have to use a JVM language, I recommend Groovy or perhaps Scala. Unfortunately, both are based on the standard Java pattern matching classes, so share their inadequacies.
If you need Unicode and you’re using Java or C⁺⁺ instead of Perl, then I recommend looking into the ICU library. They handle Unicode in Java much better than Sun does, but it still feels too much like assembler for my tastes. Perl and Java appear to have the best support for Unicode and multiple encodings. Java is still kinda warty, but other languages often have this even worse. Be warned that languages with regexes bolted on the side are always clumsier to use them in than those with regexes built in.
If you’re using C, then I would probably skip over the system-supplied regex library and jump right into PCRE by Philip Hazel. A bonus is that PCRE can be built to handle Unicode reasonably well. It is also the basic regex library used by several other languages and tools, including PHP.

regular-expressions.info is a gold-mine of information and tutorials about regular expressions. From beginner to expert, there's not much out there that is better than this site when it comes to the study of regular expressions.

regular-expressions.info has a good tutorial here

http://www.regular-expressions.info/tutorial.html

Regular expressions by themselves are of little use unless they are combined with text manipulation in a scripting tool (sed/awk) or a programming language such as Perl. Try installing Regex Buddy, a nice standalone tool that lets you run regular expressions against files you point it to.
So yes, you can learn the basics of their structure, syntax, and semantics, if I may call them that, but try reading the regular expression tutorials for Perl, Vim, and so on, and do some example string/text manipulation in those contexts, programmatically.
-AD.

While learning at: regular-expressions.info, the Regular Expressions Cheat Sheet (V2) is something you definitely want to have.

http://www.gskinner.com/RegExr/ exists both as an online version and as an AIR application.
The cool thing about this app (besides that it works like a charm) is that you can save your expressions or share them with the community right from the app.
Say you need an e-mail regex you can just search for e-mail and you will get back a rated list of expressions.
Another helpful feature is the interpretation of your expressions into human readable form. This makes it easier to learn and master.
For the tutorial part, this article is very easy to consume.

This book saved my ass when I was starting out with awk and sed.

Related

When a string is being matched against a regular expression, what's going on behind the scenes?

I'd be interested to know what kind of algorithms are used for matching it, and how they are optimised, because I imagine that some regexes could produce a vast number of possible matches that could cause serious problems on a poorly written regex parser.
Also, I recently discovered the concept of a ReDoS. Why do regexes such as (a|aa)+ or (a|a?)+ cause problems?
EDIT: I have used them most in C# and Python, so that's what was in my mind when I was considering the question. I assume Python's is written in C like the rest of the interpreter, but I have no idea about C#.
I find http://www.regular-expressions.info has really useful info about regular expressions.
The author specifically talks about catastrophic uses of regular expression.
Regex Buddy has this debug page which "offers you a unique view inside a regular expression engine".
http://www.regexbuddy.com/debug.html
There are two kinds of regular expression engine: NFA and DFA. I am quite rusty so I don't dare go into specifics from memory. Here is a page that goes through the algorithms, though. Some parsers will perform better with poorly-written expressions. A good book on the subject (that is sitting on my shelf) is Mastering Regular Expressions.
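As a rough illustration of why a pattern like (a|aa)+ is dangerous, the sketch below counts how many different ways a run of n "a" characters can be carved into "a" and "aa" pieces. It deliberately does not run the pattern itself. The assumption here is a naive backtracking engine matching something like ^(a|aa)+$ against a long run of "a" followed by a character that makes the match fail: in the worst case such an engine may try every one of these decompositions before giving up. Real engines differ in how much of this they optimise away, and the function name `count_splits` is made up for this example.

```python
def count_splits(n):
    """Number of ways to split a run of n 'a's into pieces 'a' and 'aa'."""
    ways_prev, ways_curr = 1, 1  # ways for runs of length 0 and 1
    for _ in range(2, n + 1):
        # A split of length k ends either in 'a' (k-1 left) or 'aa' (k-2 left).
        ways_prev, ways_curr = ways_curr, ways_prev + ways_curr
    return ways_curr

for n in (5, 10, 20, 40):
    print(n, count_splits(n))
```

The counts grow like the Fibonacci numbers, that is, exponentially in the input length, which is the heart of the ReDoS problem: a few dozen extra characters of input can multiply the work enormously.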

Regular Expression library in C/C++

I want to write a regular expression library in C/C++.
What is a good starting point? Any books or articles?
I know there are many libraries available, but I want to write my own version.
A good starting point is to use existing implementations and criticize them.
Pay attention to data structures and design decisions you don't like.
Avoid them when you write your version.
[Edit 16-Jan-2015] I recently encountered this beautiful book Beautiful Code. I recommend you go through Chapter 1, "A Regular Expression Matcher" by Brian Kernighan.
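To give a feel for how small such a matcher can be, here is a Python sketch in the spirit of that chapter. It is a loose port written for this answer, not the book's code, and it assumes only a tiny syntax: literal characters, `.` for any character, `^` and `$` anchors, and `*` after a single character. A real C/C++ library would of course need far more (character classes, grouping, alternation).

```python
def match(regexp, text):
    """Return True if regexp matches anywhere in text."""
    if regexp.startswith("^"):
        return match_here(regexp[1:], text)
    # Try every starting position, including the empty tail.
    for start in range(len(text) + 1):
        if match_here(regexp, text[start:]):
            return True
    return False

def match_here(regexp, text):
    """Return True if regexp matches at the beginning of text."""
    if not regexp:
        return True
    if len(regexp) >= 2 and regexp[1] == "*":
        return match_star(regexp[0], regexp[2:], text)
    if regexp == "$":
        return text == ""
    if text and regexp[0] in (".", text[0]):
        return match_here(regexp[1:], text[1:])
    return False

def match_star(c, regexp, text):
    """Match zero or more occurrences of c, then the rest of regexp."""
    i = 0
    while True:
        if match_here(regexp, text[i:]):
            return True
        if i < len(text) and (c == "." or text[i] == c):
            i += 1  # consume one more c and try again
        else:
            return False

print(match("ab*c", "xxabbbc"))
print(match("^ab", "cab"))
```

This is a backtracking design, so it is easy to follow but can be slow on nasty inputs; the automata-based approach in the Thompson paper trades simplicity for predictable running time.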
You can read the classic paper by Ken Thompson, "Regular expression search algorithm" ... http://portal.acm.org/citation.cfm?doid=363347.363387 ... this paper should give you a good understanding on how regular expressions are matched using finite automata.
This is another page giving some detailed information by Russ Cox ... http://swtch.com/~rsc/regexp/
Hope these help you get started.
I don't know a book that will help you with the implementation details -- and I'm sure there are tons of details to make it efficient. However, the book Languages and Machines, by Thomas A. Sudkamp, will be of help to understand the ideas behind an implementation.
I think what you'll need to do is compile a regular expression into a finite automaton. If you don't know much about grammars and automata, then Part II of that book, "Grammars, Automata, and Languages", will be of great help.
The book Compilers: Principles, Techniques, & Tools, by Alfred Aho, Monica Lam, Ravi Sethi and Jeffrey Ullman (also referred to as the dragon book), may also be of help. It's oriented towards making a compiler for a computer language, not for a regular expression language. However, you'll probably find it helpful, especially the part about parsing, as it is more practical in nature (as opposed to Languages and Machines, which is very theoretical).
Anyway, if I were to write a regular expression language, those would be my starting points. I recommend borrowing both from a library you have access to. Other than that, you should take a look at working implementations. I'm just guessing here, but I think there is probably good documentation on Perl's regular expression implementation, seeing that Perl regexes are so popular and work so well.
Good luck.

Why isn't there a regular expression standard?

I know there is the perl regex that is sort of a minor de facto standard, but why hasn't anyone come up with a universal set of standard symbols, syntax and behaviors?
There is a standard by IEEE associated with the POSIX effort. The real question is "why doesn't everyone follow it"? The answer is probably that it is not quite as complex as PCRE (Perl Compatible Regular Expression) with respect to greedy matching and what not.
Actually, there is a regular expression standard (POSIX), but it's crappy. So people extend their RE engine to fit the needs of their application. PCRE (Perl-compatible regular expressions) is a pseudo-standard for regular expressions that are compatible with Perl's RE engine. This is particularly relevant because you can embed Perl's engine into other applications.
Because making standards is hard. It's nearly impossible to get enough people to agree on anything to make it an official standard, let alone something as complex as regex. Defacto standards are much easier to come by.
Case in point: HTML 5 is not expected to become an official standard until the year 2022. But the draft specification is already available, and major features of the standard will begin appearing in browsers long before the standard is official.
I have researched this and could not find anything concrete. My guess is that it's because regex is so often a tool that works ON tools and therefore it's going to necessarily have platform- and tool- specific extensions.
For example, in Visual Studio, you can use regular expressions to find and replace strings in your source code. They've added stuff like :i to match an identifier. On other platforms in other tools, identifiers may not be an applicable concept. In fact, perhaps other platforms and tools reserve the colon character to escape the expression.
Differences like that make this one particularly hard to standardize.
Perl was first (or damn near close to first), and while it's Perl and we all love it, it's old, and some people felt it needed more polish (i.e. features). This is where new types came in.
They're starting to normalize: the regex used in .NET is very similar to the regex used in other languages. I think people are slowly starting to unify, but some are used to their Perl ways and don't want to change.
Just a guess: there was never a version popular enough to be considered the canonical standard, and there was no standard implementation. Everyone who came and reimplemented it had their own ideas on how to make it "better".
Because too many people are scared of regular expressions, so they haven't become fully widespread enough for enough sensible people to both think of the idea and be in a position to implement it.
Even if a standards body did form and try to unify the different flavours, too many people would argue stubbornly towards their own approach, whether better or not, because lots of programmers are annoying like that.

How important is knowing Regexs?

My personal experience is that regexs solve problems that can't be efficiently solved any other way, and are so frequently required in a world where strings are as important as they are that not having a firm grasp of the subject would be sufficient reason for me to consider not hiring you as a senior programmer (a junior is always allowed the leeway of training).
However.
A number of responses on the recurrent "What's the regex for this?" type-questions suggest that a great deal of coders find them somewhere between unintelligible and opaque.
This is not about whether a simple indexOf or substring is a better solution, that's a technical matter, and sometimes the simple way is correct, sometimes a regex is, and sometimes neither (looking at you html parser questions).
This is about how important it is to understand Regexs and whether the anti-Regex opinion (that trite "...now they have two problems" thing) is merited or FUD.
Should a programmer be expected to understand Regexs? Is this a required skill?
edit: just in case it isn't clear, I'm not asking whether I need to learn them (I'm a defender of the faith) but whether the anti-camp are an evolutionary dead end or whether it's an unnecessary niche skill like InstallShield.
REs let you solve relatively complex problems that would otherwise require you to code up full parsers with backtracking and all that messy sort of stuff. I liken the use of REs to using chainsaws to chop down a tree instead of trying to do it with a piece of celery.
Once you've learned how to use the chainsaw safely, you'll never go back. People who continue to spout anti-RE propaganda will never be as productive as those of us who have learned to love them.
So yes, you should know how to use REs, even if you understand only the basic constructs. They're a tool just like any other.
There are some tasks where regular expressions are the best tool to use.
There are some tasks where regular expressions are pointlessly obscure.
There are some tasks where they're reasonably appropriate, but a different approach may be more readable.
In general, I think of using a regular expression when an actual pattern is involved. If you're just looking for a specific string, I wouldn't generally use a regex. As an example of a grey area, someone once asked on a newsgroup the best way to check whether one string contained any of a number of other strings. The two ways which came up were:
Build a regex with alternatives and perform a single match.
Test each string in turn with string.Contains.
Personally I think the latter way is much simpler - it doesn't require any thought about escaping the strings you're looking for, or any other knowledge of regular expressions (and their different flavours across different platforms).
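For comparison, here is what the two approaches might look like in Python (the original discussion was about .NET's string.Contains, so treat this as an analogous sketch). The function names and the sample `needles` list are made up, and both functions assume the list is non-empty.

```python
import re

def contains_any_plain(text, needles):
    """Test each needle in turn with a plain substring check."""
    return any(needle in text for needle in needles)

def contains_any_regex(text, needles):
    """Build one alternation and perform a single match.

    Every needle must be escaped, or characters such as '.', '+'
    and '(' would be treated as regex syntax.
    """
    pattern = "|".join(re.escape(needle) for needle in needles)
    return re.search(pattern, text) is not None

# Hypothetical needles that happen to contain regex metacharacters.
needles = ["foo.bar", "a+b", "(draft)"]

print(contains_any_plain("see a+b here", needles))
print(contains_any_regex("see a+b here", needles))

# Forgetting to escape silently changes the meaning: '.' now matches 'X'.
print(bool(re.search("|".join(needles), "fooXbar")))
```

The escaping step is exactly the extra thought the plain version lets you skip.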
As an example of somewhere that regular expressions are quite clearly the wrong choice, someone seriously proposed using a regular expression to test whether or not a string was three characters long. Their regular expression didn't even work, despite them claiming that the reason they thought of regular expressions first is because they'd been using them for so long, and that they naturally sort of "thought" in regular expressions.
There are, however, plenty of examples where regular expressions really do make life easier - as I say, when you're actually matching patterns: "I want one letter, then three digits, then another letter" or whatever. I don't find myself using regular expressions very often, but when I do use them, they save a lot of work.
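As a small sketch of both ends of that spectrum in Python: the "letter, three digits, letter" case is a natural fit for a pattern, while the length check needs no regex at all. The helper names are hypothetical, and the pattern assumes ASCII letters and digits only.

```python
import re

# One letter, then three digits, then another letter (ASCII only).
LETTER_DIGITS_LETTER = re.compile(r"[A-Za-z][0-9]{3}[A-Za-z]")

def is_code(s):
    """True if the whole string is letter + three digits + letter."""
    return LETTER_DIGITS_LETTER.fullmatch(s) is not None

def is_three_chars(s):
    """No pattern involved, so no regex needed."""
    return len(s) == 3

print(is_code("A123b"), is_code("A12b"), is_three_chars("abc"))
```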
In short, I believe it's good to know regular expressions - but equally to be careful about when to use them. It's easy to end up with write-only code which could be made simpler to understand by rewriting with simple string operations, even if the resulting code is slightly longer.
EDIT: In response to the edit of the question...
I don't think it's a good idea to be evangelical about them - in my experience, that tends to lead to using them where an alternative would be simpler, and that just makes you look bad. On the other hand, if you come across someone writing complicated code to avoid using a regular expression, it's fine to point out that a regex would make the code simpler.
Personally I like to comment my regular expressions in quite a detailed way, splitting them up onto several lines with a comment between each line. That way they're easier to maintain, and it doesn't look like you're just trying to be "hard core" geeky (which can be the impression, even if it's not the actual intended aim).
I think the most important thing is to remember that short != readable. Never claim that using a regex is better because it requires less code - claim that it's better when it's genuinely simpler and easier to understand (or where there's a significant performance benefit, of course).
As a developer you should know the pros and cons of as many tools as possible that could provide pre-made solutions for your problems. Every developer should know how to work with regular expressions and have a feeling for when they should be used and when it is better to use simple string functions to achieve a goal.
Rejecting them outright because they are hard to read is no option in my opinion. A developer who thinks so strips himself of a valuable tool for searching and validating complex string patterns.
I have really mixed feelings. I have used them and know the bones of the syntax and something in me loves their conciseness. However they are not commonly understood and are a highly obfuscated form of code. I too would like to see performance comparisons against similar operations in plain code. There is no question that the exploded code will be more maintainable and more easily and widely understood, which is a serious consideration in any commercial software project.
Even if they turn out to be more performant, the argument for them taken to its logical conclusion would see us all embedding assembler into our code for important loops - perhaps we should. Neat and concise and very fast, but almost un-maintainable.
On balance I think that until the regex syntax becomes mainstream they probably cause more trouble than they solve and should be used only very carefully.
In the Steve Yegge's article, Five Essential Phone Screen Questions, you should read the section "Area Number Three: Scripting and Regular Expressions".
Steve Yegge has some interesting points. He gives real world problems he has encountered with clients having to parse 50,000 files for a particular pattern of a phone number. The applicants who know regular expressions tear through the problem in a few minutes while those who don't write monster multi-hundred line programs that are very unwieldy. This article convinced me I should learn regular expressions.
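For a sense of scale, the core of such a search can be a handful of lines. The sketch below is in Python and assumes a North American style format such as "(555) 123-4567" or "555-123-4567"; the article's exact format and tooling are not reproduced here, and the names `PHONE` and `find_phone_numbers` are made up.

```python
import re

PHONE = re.compile(r"""
    (?<!\d)                            # not preceded by another digit
    (?: \(\d{3}\)\s? | \d{3}[-.\s] )   # area code: "(555) " or "555-"
    \d{3} [-.\s] \d{4}                 # exchange and line number
    (?!\d)                             # not followed by another digit
""", re.VERBOSE)

def find_phone_numbers(text):
    """Return every substring of text that looks like a phone number."""
    return PHONE.findall(text)

sample = "Call (555) 123-4567 or 555-987-6543; order id 1234567890123."
print(find_phone_numbers(sample))
```

Running this over thousands of files is then just a loop that reads each file and calls the function, which is roughly the few-minute solution the article has in mind.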
Not a brilliant answer but everywhere I've worked the following holds true
0 < Number of people who (fully) understand regex < 1
If I knew how to do it I'd write that previous expression as a regex, but I can't. The best I could come up with on the fly is s/fully/a little/g - that's my limit (and that's probably not a regex).
A more serious answer is that the right regex will solve all kinds of problems, with one(ish) line of code. But you'll have real problems debugging it if it goes wrong. Therefore IMHO a complex regex however 'clean/clever' is a liability, if it takes ten lines of code to replicate it, why's that a problem, is memory/disk space suddenly expensive again?
BTW I'd love to know if regexs are fast compared to code equivalent.
It is not clear what kind of answer you are expecting.
I can imagine roughly three kinds of answer to this question:
Regexen are essential to the education of professional programmers. They enable the use of the powerful unix shell tools, and regex-based search-replace can dramatically cut down on the text-munging handiwork that is a part of a programmer's life. Programmers that do not know regexen are just intellectually lazy, which is a very bad trait for a programmer.
Regexps are kinda useful depending on the application domain. Surely, knowing how to write regexps is a valuable tool in a programmer's chest, but most of the time you can do fine without using them. Also, regexps tend to be very hard to read, so abuse must be strongly discouraged.
Some nutcases like to put regexs everywhere (I'm looking at you, the perl guy who implemented a regex-based tetris in perl). But really, they are just a bit of computer science trivia whose only practical use is in writing parsers. They are widely taught because they make a good teaching topic on which to evaluate students, and like most such topics it can be forgotten the second you step out of the exam room.
You will notice the careful use of the plural forms "regexen" (pro), "regexps" (careful neutral) and "regexs" (con).
Personally, I am of the first kind. Good programmers like to learn new languages, and they hate repetitive handiwork.
When you have to parse something (ranging from simple date strings to programming languages) you should know your tools and regular expressions are one of them.
But you should also know what you can do with regexes and what not. At this point it comes in handy if you know the Chomsky hierarchy. Otherwise you end up trying to use regular expressions to parse context-sensitive languages and wonder why you can't get your regex right.
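A classic example just one level up the hierarchy is balanced parentheses, which is a context-free language rather than a regular one. A classical regular expression cannot check arbitrary nesting depth (some engines, such as Perl's and PCRE, add recursion extensions that can, but Python's `re` module does not), whereas a few lines of ordinary code with a counter handle it easily. A minimal sketch, with a made-up function name:

```python
def is_balanced(text):
    """True if every '(' in text is closed by a later ')' and vice versa."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0

print(is_balanced("(a(b)c)"), is_balanced("(()"), is_balanced(")("))
```

The counter is exactly the unbounded memory that a finite automaton lacks, which is why the regex attempt never quite works.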
The fact that all languages support regexs should mean something !
I think knowing regexes is quite an important skill. While the usage of regex in a programming environment/language is a question of maintainable code, I find the knowledge of regex to be useful with some commands (say egrep) and editors (vim, emacs etc.). Using a regex to do a find and replace in vim is very handy when you have a text file and you want to do some formatting once in a while.
I find it very useful to know regular expressions. They are a very powerful tool, and in my opinion there are problems that you simply can't solve without these.
I would however not take regular expressions as a killing criterion for "hiring you as a senior programmer". They are like the wealth of other tools in the world. You should really know them in a problem domain where you need them, but you cannot presume that someone already knows all of these.
"a junior is always allowed the leeway of training"
If a senior isn't, then I would not hire him!
To the ones that argue how complex and unreadable a regular expression is: If the regexp solution to a problem is complex and unreadable, then probably the problem itself is! Good luck in solving it another way...
What does the following do?
"([A-Za-z][A-Za-z0-9+.-]{1,120}:A-Za-z0-9/{1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:#&~=%-]{0,1000}))?)"
How long did it take you to figure out? to debug?
Regexs are awesome for single-use throwaway programs, but long hairy regexps are not the best choice for programs that other people will need to maintain over the years.
I find that regex's can be very helpful depending on the type of programming that you do. However I probably write less than one regex a month, and because of this long interval between requiring regex's I forget a lot about how they work.
I should probably go through mastering regular expressions or something similar someday.
Knowing when to use a regexp and the basics of how they work and what their limitations are is important. But filling your head with a lot of syntax rules that you probably won't need very often is just a pointless academic exercise.
A regexp crib sheet can be written on one sheet of A4 paper or a couple of pages in a textbook - no need to know this stuff by heart. If you use it every day it will stick. If you don't use it very often then the brain cells are probably better used for something else.
A developer thought he had one problem and tried to solve it using regex. Now he has 2 problems.
I agree with pretty much everything said here, and just need to include the mandatory quip:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
(attributed to Jamie Zawinski)
Like most jokes, it contains a kernel of truth.

Are regex tools (like RegexBuddy) a good idea?

One of my developers has started using RegexBuddy for help in interpreting legacy code, which is a usage I fully understand and support. What concerns me is using a regex tool for writing new code. I have actually discouraged its use for new code in my team. Two quotes come to mind:
Some people, when confronted with a problem, think "I know, I’ll use regular expressions." Now they have two problems. - Jamie Zawinski
And:
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. - Brian Kernighan
My concerns are (respectively):
That the tool may make it possible to solve a problem using a complicated regular expression that really doesn't need it. (See also this question).
That my one developer, using regex tools, will start writing regular expressions which (even with comments) can't be maintained by anyone who doesn't have (and know how to use) regex tools.
Should I encourage or discourage the use of regex tools, specifically with regard to producing new code? Are my concerns justified? Or am I being paranoid?
Poor programming is rarely the fault of the tool. It is the fault of the developer not understanding the tool. To me, this is like saying a carpenter should not own a screwdriver because he might use a screw where a nail would have been more appropriate.
Regular expressions are just one of the many tools available to you. I don't generally agree with the oft-cited Zawinski quote, as with any technology or technique, there are both good and bad ways to apply them.
Personally, I see things like RegexBuddy and the free Regex Coach primarily as learning tools. There are certainly times when they can be helpful to debug or understand existing regexes, but generally speaking, if you've written your regex using a tool, then it's going to be very hard to maintain it.
As a Perl programmer, I'm very familiar with both good and bad regular expressions, and have been using even complicated ones in production code successfully for many years. Here are a few of the guidelines I like to stick to that have been gathered from various places:
Don't use a regex when a string match will do. I often see code where people use regular expressions in order to match a string case-insensitively. Simply lower- or upper-case the string and perform a standard string comparison.
Don't use a regex to see if a string is one of several possible values. This is unnecessarily hard to maintain. Instead place the possible values in an array, hash (whatever your language provides) and test the string against those.
Write tests! Having a set of tests that specifically target your regular expression makes development significantly easier, particularly if it's a vaguely complicated one. Plus, a few tests can often answer many of the questions a maintenance programmer is likely to have about your regex.
Construct your regex out of smaller parts. If you really need a big complicated regex, build it out of smaller, testable sections. This not only makes development easier (as you can get each smaller section right individually), but it also makes the code more readable, flexible and allows for thorough commenting.
Build your regular expression into a dedicated subroutine/function/method. This makes it very easy to write tests for the regex (and only the regex). It also makes the code in which your regex is used easier to read (a nicely named function call is considerably less scary than a block of random punctuation!). Dropping huge regular expressions into the middle of a block of code (where they can't easily be tested in isolation) is extremely common, and usually very easy to avoid.
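To make the guidelines above concrete, here is a minimal sketch in Python (the answer itself is written from a Perl point of view, so the language is my substitution). The names same_word, is_allowed_function and parse_iso_date, and the date format chosen, are hypothetical and only there for illustration.

```python
import re

# Guideline 1: a case-insensitive equality check needs no regex.
def same_word(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()

# Guideline 2: membership in a fixed set of values needs no regex either.
ALLOWED_FUNCTIONS = {"abs", "floor", "ceil"}

def is_allowed_function(name: str) -> bool:
    return name in ALLOWED_FUNCTIONS

# Guidelines 4 and 5: build the pattern from small named parts and
# hide it behind one function that can be tested on its own.
YEAR = r"(?P<year>\d{4})"
MONTH = r"(?P<month>0[1-9]|1[0-2])"
DAY = r"(?P<day>0[1-9]|[12]\d|3[01])"
ISO_DATE = re.compile(rf"{YEAR}-{MONTH}-{DAY}")

def parse_iso_date(text: str):
    """Return (year, month, day) as ints, or None if text is not YYYY-MM-DD."""
    match = ISO_DATE.fullmatch(text)
    if match is None:
        return None
    return int(match["year"]), int(match["month"]), int(match["day"])
```

Because the regex lives in one small function, the "write tests" guideline comes almost for free: a handful of asserts on parse_iso_date (a valid date, a month of 13, a one-digit month) already tells a maintenance programmer what the pattern is meant to accept.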
You should encourage the use of tools that make your developers more efficient. Having said that, it is important to make sure they're using the right tool for the job. You'll need to educate all of your team members on when it is appropriate to use a regular expression, and when (less|more) powerful methods are called for. Finally, any regular expression (IMHO) should be thoroughly commented to ensure that the next generation of developers can maintain it.
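As a rough illustration of what "thoroughly commented" can look like, here is a small sketch using Python's re.VERBOSE flag, which lets whitespace and comments sit inside the pattern itself. The pattern (a signed decimal number) and the name is_number are made up for this example.

```python
import re

# Hypothetical example: a signed decimal number such as "-12.5" or "+3".
NUMBER = re.compile(
    r"""
    [+-]?        # optional sign
    \d+          # integer part: one or more digits
    (?:          # optional fractional part:
        \.       #   a literal dot
        \d+      #   followed by one or more digits
    )?
    """,
    re.VERBOSE,
)

def is_number(text: str) -> bool:
    """True if the whole string is a signed decimal number."""
    return NUMBER.fullmatch(text) is not None
```

Other regex flavours have a similar extended mode (Perl's /x, for instance), so the same habit carries over.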
I'm not sure why there is so much distrust of regex.
Yes, they can become messy and obscure, exactly like any other piece of code somebody may write, but they have an advantage over code: they represent the set of strings one is interested in, in a formally specified way (at least by your language, if there are extensions). Understanding which set of strings is accepted by a piece of code requires "reverse engineering" the code.
Sure, you could discourage the use of regex, as has already been done with recursion and gotos, but to me this would be justified only if there were a good alternative.
I would prefer to maintain a single-line regex than a convoluted hand-made function that tries to capture a set of strings.
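A small, made-up comparison of the two styles, sketched in Python: both functions below are meant to accept the same set of strings (ASCII identifiers), but only the regex version states that set directly. The function names and the choice of "identifier" as the example are my own assumptions.

```python
import re

def is_identifier_by_hand(text: str) -> bool:
    """Hand-made check: the accepted set has to be reverse engineered."""
    if not text:
        return False
    first = text[0]
    if not (first == "_" or "a" <= first <= "z" or "A" <= first <= "Z"):
        return False
    for ch in text[1:]:
        if not (ch == "_" or "a" <= ch <= "z"
                or "A" <= ch <= "Z" or "0" <= ch <= "9"):
            return False
    return True

# Regex check: the accepted set is spelled out in one line.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier_by_regex(text: str) -> bool:
    return IDENTIFIER.fullmatch(text) is not None
```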
On using a tool to understand a regex (or write a new one), I think it's perfectly fine! If somebody wrote it with the tool, somebody else can understand it with the tool! Actually, if you are worried about this, I would see tools like RegexBuddy as your best insurance that the code will not become unmaintainable just because of the regexes.
Regex testing tools are invaluable. I use them all the time. My job isn't even particularly regex heavy, so having a program to guide me through the nuances as I build my knowledge base is crucial.
Regular expressions are a great tool for a lot of text handling problems. If you have someone on your team who is writing regexes that the rest of the team don't understand, why not get them to teach the rest of you how they are working? Rather than a threat, you could be seeing this as an opportunity. That way you wouldn't have to feel threatened by the unknown and you'll have another very valuable tool in your arsenal.
Zawinski's comments, though entertainingly glib, are fundamentally a display of ignorance and writing Regular Expressions is not the whole of coding so I wouldn't worry about those quotes. Nobody ever got the whole of an argument into a one-liner anyways.
If you came across a Regular Expression that was too complicated to understand even with comments, then probably a regex wasn't a good solution for that particular problem, but that doesn't mean they have no use. I'd be willing to bet that if you've deliberately avoided them, there will be places in your codebase where you have many lines of code and a single, simple, Regex would have done the same job.
RegexBuddy is a useful shortcut to make sure that the regular expressions you are writing do what you expect. It certainly makes life easier, but it's the matter of using them at all that seems important to me about your question.
Like others have said, I think using or not using such a tool is a neutral issue. More to the point: If a regular expression is so complicated that it needs inline comments, it is too complicated. I never comment my regexps. I approach large or complex matching problems by breaking them down into several steps of matching, either with multiple match statements (=~), or by building up a regexp with sub regexps.
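The "several steps of matching" approach might look something like the following sketch. It is written in Python rather than with Perl's =~ operator, and the log-line format and the name parse_log_line are invented for the example.

```python
import re

# Two small patterns instead of one large one (hypothetical log format:
# "<date> <level> <message>", with an optional "(code=N)" in the message).
LINE = re.compile(r"(\S+) (\w+) (.*)")
CODE = re.compile(r"\(code=(\d+)\)")

def parse_log_line(line: str):
    """Return (date, level, message, code) or None if the line has no such shape."""
    # Step 1: split the line into its three coarse fields.
    first = LINE.fullmatch(line)
    if first is None:
        return None
    date, level, message = first.groups()
    # Step 2: look for an optional error code inside the message only.
    second = CODE.search(message)
    code = int(second.group(1)) if second else None
    return date, level, message, code
```

Each pattern stays short enough to read at a glance, and each step can be checked separately.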
Having said all that, I think any developer worth his salt should be reasonably proficient in regular expression writing and reading. I've been using regular expressions for years and have never encountered a time where I needed to write or read one that was terrifically complex. But a moderately sized one may be the most elegant and concise way to do a validation or match, and regexps should not be shied away from only because an inexperienced developer may not be able to read it -- better to educate that developer.
What you should be doing is getting your other devs hooked up with RB.
Don't worry about that whole "2 probs" quote; it seems that may have been a blast on Perl (said back in 1997) not regex.
I prefer not to use regex tools. If I can't write it by hand, then it means the output of the tool is something I don't understand and thus can't maintain. I'd much rather spend the time reading up on some regex feature than learning the regex tool. I don't understand the attitude of many programmers that regexes are a black art to be avoided/insulated from. It's just another programming language to be learned.
It's entirely possible that a regex tool would save me some time implementing regex features that I do know, but I doubt it... I can type pretty fast, and if you understand the syntax well (using a text editor where regexes are idiomatic really helps -- I use gVim), most regexes really aren't that complex. I think you're nearly always better served by learning a technology better rather than learning a crutch, unless the tool is something where you can put in simple info and get out a lot of boilerplate code.
Well, it sounds like the cure for that is for some smart person to introduce a regex tool that annotates itself as it matches. That would suggest that using a tool is not as much the issue as whether there is a big gap between what the tool understands and what the programmer understands.
So, documentation can help.
A really trivial example is a table like the following (just a suggestion):
Expression        Match   Reason
^                 Pos 0   Start of input
\s+               " "     At least one space
(abs|floor|ceil)  ceil    One of "abs", "floor", or "ceil"
...
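A table like the one above could be produced by a very small helper. The following is only a sketch in Python, not an existing tool: STEPS and annotate are hypothetical names, and matching the pieces one after another is a simplification, since a real regex engine can backtrack across pieces and this helper cannot.

```python
import re

# Each step pairs a small sub-pattern with a human-readable reason.
STEPS = [
    (r"\s+", "At least one space"),
    (r"(abs|floor|ceil)", 'One of "abs", "floor", or "ceil"'),
]

def annotate(text: str):
    """Match STEPS one after another from the start of text.

    Returns rows of (expression, matched text, reason) and stops at the
    first step that fails to match.
    """
    rows = [("^", "Pos 0", "Start of input")]
    pos = 0
    for pattern, reason in STEPS:
        m = re.compile(pattern).match(text, pos)
        if m is None:
            rows.append((pattern, "<no match>", reason))
            break
        rows.append((pattern, m.group(0), reason))
        pos = m.end()
    return rows

if __name__ == "__main__":
    for expression, matched, reason in annotate("  ceil(x)"):
        print(f"{expression:<18}{matched!r:<10}{reason}")
```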
I see the issue, though. You probably want to discourage people from building more complex regular expressions than they can parse. I think standards can address this, by always requiring expanded REs and checking that the annotation is proper.
However, if they just want to debug an RE, to make sure it's acting as they think it's acting, then it's not really much different from writing code you have to debug.
It's relative.
A couple of regex tools (for Node/JS, PHP and Python) that I made (for some other projects) are available online to play and experiment with.
regex-analyzer and regex-composer
github repo