How to verify regexp patterns? - regex

What are the common ways to verify that a given regex pattern works for a given scenario, and to check the results?
I would like to know in general, not for a particular programming language. Also, what is the best way to learn to write regular expressions?

Books: Mastering Regular Expressions is the definitive guide to regular expressions. The Regular Expressions Cookbook is said to be lighter and more easily applicable.
Sites: Friedl's companion site is a good start. Regexlib is a source of idioms and patterns.
Software: RegexBuddy is a good, paid regex verifier.

I've used this resource when learning: http://www.regular-expressions.info/ and found myself going back there whenever there was something I needed to remember. It's very useful for learning and covers the basics very well. They also have various links to programs which can be used to verify regular expressions.

This is not a "real" verification, but RegexBuddy allows you to verify that your regex does what you expect it to do on any sample data you provide. It also translates the regex into an English description that can help to figure out mistakes. Plus, it knows all major regex flavors and can translate regexes between them.

For testing regular expressions you can use a regex test tool like the one below:
http://www.regextester.com/
To find out more about learning regular expressions, please check the following SO threads:
Learning Regular Expressions
How to master Regular Expressions?
https://stackoverflow.com/questions/465119/how-do-i-learn-regular-expressions-closed

Rad Software Regular Expression Designer is a great tool.

Set up an automated test using your tools of choice (because regex implementations vary from language to language and library to library) which applies the regex to a variety of both matching and non-matching inputs to verify that you get the correct results.
While RegexBuddy and the like may be helpful for initially creating the regex (or may not; I've never used them), you will still need to maintain it, just like any other code. When that time comes, it's vastly preferable to have a test script that will run through all your old test inputs (plus the new ones which created the need for the change) in a matter of seconds rather than having to sit on a website for tens of minutes, if not hours, trying to remember all your test inputs and manually re-run them to make sure you didn't break anything.
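A minimal sketch of such a test script, in Python. The pattern under test (a 555-555-5555 style phone number), the input lists, and the helper name failures are all made up for illustration; substitute your own regex and the engine you actually deploy with.

```python
import re

# Hypothetical pattern under test: a phone number shaped like 555-555-5555.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

# Inputs the pattern must accept, and inputs it must reject.
SHOULD_MATCH = ["555-555-5555", "123-456-7890"]
SHOULD_NOT_MATCH = ["5555555555", "555-555-555", "555-555-55555", "abc-def-ghij", ""]


def failures(pattern, should_match, should_not_match):
    """Return every input whose fullmatch result differs from the expectation."""
    wrong = [s for s in should_match if pattern.fullmatch(s) is None]
    wrong += [s for s in should_not_match if pattern.fullmatch(s) is not None]
    return wrong


bad = failures(PHONE, SHOULD_MATCH, SHOULD_NOT_MATCH)
print("all cases pass" if not bad else f"unexpected results for: {bad}")
```

When the regex changes, add the inputs that motivated the change to the two lists and re-run; the old cases guard against regressions.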

Related

Is there a function to create a regex pattern from a string input?

I'm lousy at regular expressions but occasionally they're the only thing that's the right solution for a problem.
Is there something in the .NET framework that allows you to input an unencoded string and get a pattern from it? Which you could then modify as required?
e.g. I want to remove a CDATA section that contains a file from some XML but I can't work out what the right pattern is for <![CDATA[hugepileofrandombinarydataherethatalsoneedstogo]]> and I don't want to ask for help each time I'm stuck on a regex pattern.
Such tools exist; search for "regex generator".
But, as suggested in the comments, it is better to learn regex. Simple patterns are easy. Something like <!\[.*?]]> would do in your case.
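As a rough illustration of that pattern at work, here is a Python sketch (the question is about .NET, where RegexOptions.Singleline plays the role of re.DOTALL). It uses a slightly tightened variant that spells out the CDATA[ literal; the sample document is invented.

```python
import re

# Lazy match from "<![CDATA[" up to the first "]]>".
# DOTALL lets "." cross line breaks, which binary payloads often contain.
CDATA = re.compile(r"<!\[CDATA\[.*?\]\]>", re.DOTALL)

# Hypothetical sample document, made up for this sketch.
xml = "<doc><name>report</name><file><![CDATA[huge pile\nof binary data]]></file></doc>"

stripped = CDATA.sub("", xml)
print(stripped)  # <doc><name>report</name><file></file></doc>
```

The lazy quantifier matters: with a greedy .* two CDATA sections in one document would be removed together with everything between them.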
There are regex design tools like Expresso:
http://www.ultrapico.com/expresso.htm
It's not perfect, but since there is no suitable .NET component, the text-to-regex page at txt2re.com is the best I've seen for people who occasionally need to build a regex to match a string but don't have the time to relearn regex each time they want to use one.

How to implement Regex

I'm working on a database server software product (see my profile), and we see the need to implement free-text searching in our software. The query language standard we are using only supports free-text search using a BT type Regex. The only way we can use our free-text database indexes together with Regex seems to be to implement our own. My questions to SO are:
Where can I find papers/examples/patterns on how to implement a BT style Regex?
Is it worth looking into taking one of the open source C/C++ Regex libraries and altering the code to fit our needs?
If I'm not wrong, SPARQL uses the XPath/XQuery regular expression syntax, which is based on Perl regular expressions (at least that is what the W3C docs say).
If this is indeed the case, then you can use PCRE from http://www.pcre.org/
It is BSD-licensed, so you will be able to use it in a commercial product.
If your syntax is slightly modified, you can probably write a small routine to normalize it to the Perl syntax used by PCRE.
There are two papers I have found online on the subject of regex indexing: one from Bell Labs and one from UCLA/IBM. I'm still not sure whether to use an existing regex library and modify it or to write one from scratch.

easy to use Regex creator tool? [closed]

I tried some regex tools like Regulator, Regulazy & RegexBuddy. They don't do what I want and they expect the user to know regular expressions.
I want a tool for dummies. You tell the tool I need a regex for something like "match anything that ends with the word 'yes' and it contains at least one occurrence of the phrase '/test/'" and it creates the regex for you.
So I either enter my request in plain English or semi plain English or the tool has all kinds of ready made selections and I choose between them to create what I want ad hoc.
Is there such a tool which is geared towards non developers? I am not looking for a regex tester.
I have been a fan of Ultrapico's Expresso application. There is a builder section that helps you (a little) in building fragments of the expression. More importantly it will explain an existing expression (either your own or from the built in expression library) section by section.
It also includes a testing and replacement section to see and test your expressions. Lastly it will generate the expression formatted for either C#, C++, or VB.NET so that you know exactly how to insert the expression into your project.
Best of all it's free. I have been using this tool to help learn how regular expressions actually work, especially the complex ones. Can't say it makes writing expressions idiot proof but it has sure made learning expressions easier for me...
This tool was featured in an MSDN Webcast by Zain Naboulsi, which might be worth a watch. Hope this helps, and good luck with your Regex journey!
txt2re seems really good. I entered "This string contains test in it and ends in yes" into the box, clicked on test and yes, and got a regular expression. It doesn't focus on teaching how to build regexes, but it shows you the Perl code to parse what you need.
The real power of this tool is its ability to recognize things like dates, URLs, and tags. Whitespace didn't seem to work too well, however, and it doesn't appear to handle any sort of repetition.
Personally I really like Expresso http://www.ultrapico.com/Expresso.htm
The interface is quite clean - lets you test out search and replace functionality, has good help, plus it generates the C# expressions for you if you like.
For me, the best tool is RegexBuilder. It's open source and written in C#, so you can customize it as much as you want ;)
Enjoy. ;)
For web-based (one line only): http://txt2re.com/
For offline (Full text file is possible): Expresso
I like my own RegEx Builder best: http://www.linuxintro.org/regex
I am not sure such a tool exists, since they usually do the opposite:
Analyze a regexp and translate it in plain English.
The closest solution to your need would be this C# library, allowing you to program regexp in a semi-readable way:
Instead of this:
const string findGamesPattern =
    @"<div\s*class=""game""\s*id=""(?<gameID>\d+)-game""(?<content>.*?)<!--gameStatus\s*=\s*(?<gameState>\d+)-->";
You would have, using ReadableRex:
Pattern findGamesPattern = Pattern.With.Literal(@"<div")
    .WhiteSpace.Repeat.ZeroOrMore
    .Literal(@"class=""game""").WhiteSpace.Repeat.ZeroOrMore.Literal(@"id=""")
    .NamedGroup("gameId", Pattern.With.Digit.Repeat.OneOrMore)
    .Literal(@"-game""")
    .NamedGroup("content", Pattern.With.Anything.Repeat.Lazy.ZeroOrMore)
    .Literal(@"<!--gameStatus")
    .WhiteSpace.Repeat.ZeroOrMore.Literal("=").WhiteSpace.Repeat.ZeroOrMore
    .NamedGroup("gameState", Pattern.With.Digit.Repeat.OneOrMore)
    .Literal("-->");
I was also using Expresso, and found it quite good. The most important thing for me in these tools is validation and not so much visual aids for building expressions. I only need a tool to kind of remind me things, not design them for me.
Anyway, here's another free one, which I quite like. It's called Rad Software Regular Expression Designer. Hope this helps.
Try RegExr, a Flash-based tool.
If you are a Mac person, try this widget. Although it's also a tester, I find it quite useful, as it's really easy to get to while learning regex online:
http://mac.softpedia.com/progDownload/Regex-Widget-Download-28467.html
I settled on RegexBuddy. It has a debugger, visualizer and translates the regex into plain English.
Well it's still very much a tool in development, but you could try Textpression at http://www.textpression.com Disclosure: I'm the author.
Textpression enables visual regex creation with a drag and drop editor; no regex syntax to learn! At time of writing Textpression is still in Alpha, but let me know what you think!
Intuitive Regular Expression Analyser and Composer Class (for PHP, Node/JS and Python); there is a live example of the regex composer and a RegexAnalyser GitHub repository.
PS: I am the author.
As per details in your question, I think Regex Builder Tool is the exact tool you need.
For the regex you asked about in the question, you would give the requirements in simple plain English, like _Match_anywhere_in_text_ _zero_or_more_of_ ( _any_character_) _then_ _one_or_more_of_ ( _exact_string_ ( test)) _then_ _zero_or_more_of_ ( _any_character_) _then_ _exact_string_ ( yes) _then_ _end_of_line_
This would automatically generate the regex .*test+.*yes$ You can also test the generated regex right there using input like "I did test, results yes", and it would show that the generated regex matches the text as per your expectations.
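The generated pattern can also be checked outside the tool. The Python sketch below runs it against a few invented inputs, and it shows one thing worth knowing: the + binds only to the final t, not to the whole word test. The second pattern is just one assumed reading of the original request (contains "/test/" and ends with "yes"), not output from any tool.

```python
import re

generated = re.compile(r".*test+.*yes$")

print(bool(generated.search("I did test, results yes")))  # True
print(bool(generated.search("I did test, results no")))   # False
# "+" applies only to the last "t", so "testtt" is accepted as well.
print(bool(generated.search("testtt then yes")))          # True

# Assumed reading of the request: contains "/test/" and ends with "yes".
requested = re.compile(r"/test/.*yes$")
print(bool(requested.search("path /test/ says yes")))     # True
print(bool(requested.search("I did test, results yes")))  # False
```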

Under what situations are regular expressions really the best way to solve the problem?

I'm not sure if Jeff coined it but it's the joke/saying that people who say "oh, I know I'll use regular expressions!" now have two problems. I've always taken this to mean that people use regular expressions in very inappropriate contexts.
However, under what circumstances are regular expressions really the best answer? For what problems are they the best, or maybe the only, solution?
Regexes are good for:
Text Format Validations (email, url, numbers)
Text searches/substitution.
Mappings (e.g. url pattern to function call)
Filtering some texts (related to substitution)
Lexical analysis during parsing.
They can be used to validate anything that have a pattern like :
Social Security Number
Telephone Number ( 555-555-5555 )
Email Address (something@example.com)
IP Address (but it's more complex to make sure it's valid)
All those have patterns and are easily verifiable by RegEx.
They are harder to use for input that follows a logic rather than a pattern, like a credit card number, but they can still be used for some client-side validation.
So the best ways?
To sanitize data entry on the client side before sanitizing it on the server.
To "Search and Replace" strings that contain a pattern.
I'm sure I am missing a lot of other cases.
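The IP address case illustrates the split between pattern and logic. In the Python sketch below, the regex checks only the dotted shape, and plain code checks the numeric range; the function name looks_like_ipv4 is made up, and this version deliberately ignores details such as leading zeros.

```python
import re

# Shape only: four groups of 1-3 digits separated by literal dots.
IPV4_SHAPE = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})")


def looks_like_ipv4(text):
    """True if text has dotted-quad shape and every octet is at most 255."""
    m = IPV4_SHAPE.fullmatch(text)
    return m is not None and all(int(octet) <= 255 for octet in m.groups())


print(looks_like_ipv4("192.168.0.1"))  # True
print(looks_like_ipv4("999.1.1.1"))    # False: right shape, octet out of range
print(looks_like_ipv4("1.2.3"))        # False: wrong shape
```

The range check can be folded into the regex itself, but the result is much longer and harder to read, which is the complexity the list above hints at.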
Regular expressions are a great way to parse text that doesn't already have a parser (unlike, for example, XML, which does). I have used them to create a parser for the mod_rewrite syntax in the .htaccess file, or in my URL Rewriter project http://www.codeplex.com/urlrewriter, for example.
they are really good when you want to be more specific than "*" or "?" like "3 letters then 2 numbers then a $ sign then a period"
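That last description translates almost word for word into a pattern. A small Python sketch, assuming "letters" means ASCII letters and "numbers" means single digits:

```python
import re

# 3 letters, 2 digits, a literal "$", then a literal ".".
# Both symbols are escaped because "$" and "." are regex metacharacters.
CODE = re.compile(r"[A-Za-z]{3}[0-9]{2}\$\.")

print(bool(CODE.fullmatch("abc12$.")))  # True
print(bool(CODE.fullmatch("ab123$.")))  # False: only two leading letters
print(bool(CODE.fullmatch("abc12$x")))  # False: last character is not a period
```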
The quote is from an anti-Perl rant from Jamie Zawinski. I think Perl used to do regex really badly but now it seems to be a standard engine for a lot of programs.
But the same sentiment still applies. If you don't know how to use regex, you had better not try something really fancy; otherwise you get one of these tags too (see bronze list) ;o)
https://stackoverflow.com/users/730/keng
They are good for matching or finding text that takes a very specific and simple format. By "simple" I mean not nested and smaller than the entire html spec, for example.
They are primarily of value for highly structured text parsing. If you use named groups (an option in most mature regex systems), you have a phenomenally powerful and crisp way to handle the strings.
Here's an example. Consider that netstat, in its various iterations across Linux distributions and versions, can return different results. Sometimes there is an extra column; sometimes there is a shift in the date/time format. Regexes give you a powerful way to handle that with a single expression. Couple that with named groups, and you can retrieve the data without hacks like:
1) split on spaces
2) ok, the netstat version is X so I need to add 1 to all array references past column 5.
3) ok, the netstat version is Y so I need to make sure that I use multiple array references for the date info.
YUCK. Simple to fix in a Regex :-)
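A sketch of the named-group approach in Python. The two sample lines are invented and only loosely resemble netstat output (real output varies by platform and version), and the group names are arbitrary; the point is that an optional trailing column is absorbed by the pattern instead of shifting array indexes.

```python
import re

# One expression for lines with or without a trailing "program" column.
LINE = re.compile(
    r"^(?P<proto>tcp|udp)\s+"
    r"(?P<recvq>\d+)\s+(?P<sendq>\d+)\s+"
    r"(?P<local>\S+)\s+(?P<remote>\S+)\s+"
    r"(?P<state>[A-Z_]+)"
    r"(?:\s+(?P<program>\S+))?\s*$"
)

# Hypothetical sample lines, made up for this sketch.
samples = [
    "tcp        0      0 192.168.1.10:22        10.0.0.5:51234         ESTABLISHED",
    "tcp        0      0 192.168.1.10:22        10.0.0.5:51234         ESTABLISHED 1234/sshd",
]

for line in samples:
    m = LINE.match(line)
    # Fields are read by name, so an extra column never shifts anything.
    print(m.group("local"), m.group("state"), m.group("program"))
```

For the first line the program group does not participate in the match, so m.group("program") is None.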

How can I test regular expressions using multiple RE engines? [closed]

How can I test the same regex against different regular expression engines?
The most powerful free online regexp testing tool is by far http://regex101.com/ - lets you select the RE engine (PCRE, JavaScript, Python), has a debugger, colorizes the matches, explains the regexp on the fly, can create permalinks to the regex playground.
Other online tools:
http://www.rexv.org/ - supports PHP and Perl PCRE, Posix, Python, JavaScript, and Node.js
http://refiddle.com/ - Inspired by jsfiddle, but for regular expressions. Supports JavaScript, Ruby and .NET expressions.
http://regexpal.com/ - powered by the XRegExp JavaScript library
http://www.rubular.com/ - Ruby-based
Perl Regex Tutor - uses PCRE
Windows desktop tools:
The Regex Coach - free Windows application
RegexBuddy recommended by most, costs US$ 39.95
Jeff Atwood wrote about regular expressions.
Other tools recommended by SO users include:
http://www.txt2re.com/ - free online tool to generate regular expressions for multiple languages (@palmsey, another thread)
The Added Bytes Regular Expressions Cheat Sheet (@GateKiller, another thread)
http://regexhero.net/ - The Online .NET Regular Expression Tester. Not free.
RegexBuddy
I use Expresso (www.ultrapico.com). It has a lot of nice features for the developer. The Regulator used to be my favorite, but it hasn't been updated in so long and I constantly ran into crashes with complicated RegExs.
Here are some for the Mac: (Note: don't judge the tools by their websites)
RegExhibit - My Favorite, powerful and easy
Reggy - Simple and Clean
RegexWidget - A Dashboard Widget for quick testing
If you are an Emacs user, the command re-builder lets you type an Emacs regex and shows on the fly the matching strings in the current buffer, with colors to mark groups. It's free as Emacs.
Rubular is free, easy to use and looks nice.
RegexBuddy is a weapon of choice
I use the excellent and free Rad Software Regular Expression Designer.
If you just want to write a regular expression, have a little help with the syntax and test the RE's matching and replacing then this fairly light-footprint tool is ideal.
A couple of Eclipse plugins for those using Eclipse:
http://www.brosinski.com/regex/
http://www.bastian-bergerhoff.com/eclipse/features/web/QuickREx/toc.html
Kodos of course. Cause it's Pythonic. ;)
RegexBuddy is great!!!
I agree on RegExBuddy, but if you want free or when I'm working somewhere and not on my own system RegExr is a great online (Flash) tool that has lots of pre-built regex segments to work with and does real-time pattern matching for your testing.
In the standard Python installation there is a "Tools/scripts" directory containing redemo.py.
This creates an interactive Tkinter window in which you can experiment with regexes.
In the past I preferred The Regex Coach for its simplistic layout, instantaneous highlighting and its price (free).
Every once in a while, though, I run into an issue with it when trying to test .NET regular expressions. For that, it turns out, it's better to use a tool that actually uses the .NET regular expression engine. That was my whole reason for building Regex Hero last year. It runs in Silverlight and, as such, runs off the .NET Regex class library directly.
Regexbuddy does all this. http://www.regexbuddy.com/
see the accepted answer to this question: Learning Regular Expressions
I'll add to the vote of Reggy for the Mac, gonna try out some of the other ones that Joseph suggested and upvote that post tomorrow when my limit gets reset.
for online: http://regexpal.com/
for desktop: The Regex Coach
+1 For Regex Coach here. Free and does the job really well.
http://www.weitz.de/regex-coach/
I am still a big The Regulator fan.
There are some stability problems, but these can be fixed by disabling the Intellisense. It gets mad with some expressions and with typos while building an expression.
Would love it if Roy Osherove updated, but looks like he is busy with other things.
I like to use this online one:
http://www.cuneytyilmaz.com/prog/jrx/
Of course, it uses JavaScript regexes, but I've never yet done anything clever enough to notice the difference.
How much is your time worth? Pay the $40 and get RegexBuddy. I did, and I even upgraded from 2.x version to 3.x. It has paid for itself many times over.
I personally like the Regular Expression Tester.
It's a free Firefox plugin, so it's always on!
Also, this regex plugin can be useful for Eclipse and IDEA users.
I like http://regexhero.net/tester/ a lot
Check out Regex Master, which is a free and open source regular expression tester.
This regex tester is able to test JavaScript, PHP and Python:
http://www.piliapp.com/regex-tester/
RegExBuddy so far I concur with and endorse.
RegExr for testing with the Actionscript 3 (whichever standard that may be)
http://rgx-extract-replace.appspot.com can list the captured regex groups formatted in columns and can optionally replace the matched patterns in the input text.