How to implement Regex - c++

I'm working on a database server software product (see my profile) and we see the need to implement free-text searching in our software. The query language standard we are using only supports free-text search using a BT type Regex. The only way we can use our free-text database indexes together with Regex seems to be to implement our own. My questions to SO are:
Where can I find papers/examples/patterns on how to implement a BT style Regex?
Is it worth looking into taking one of the open source C/C++ Regex libraries and altering the code to fit our needs?

If I'm not wrong, SPARQL uses the XPath/XQuery regular expression syntax, which is based on Perl regular expressions (at least that is what the W3C docs say).
If this is indeed the case, then you can use PCRE from http://www.pcre.org/
It is BSD-licensed, so you will be able to use it in a commercial product.
If your syntax is slightly different, you can probably write a small routine to normalize it to the Perl syntax used by PCRE.

I have found two papers online on the subject of regex indexing: one from Bell Labs and one from UCLA/IBM. I'm still not sure whether to use an existing Regex library and modify it or to write one from scratch.


What regular expression engine does Xcode's Find Navigator use?

Documentation from Apple, Google searches, and Stack Overflow turns up nothing about which regex engine Xcode's Find Navigator uses, which makes it very difficult to write expressions. For example, I tried to do a negative lookbehind, and it doesn't work. Does anyone know where the hidden documentation lies?
Quick Answer: The ICU Engine. (This is the same engine that is used for NSRegularExpression as well).
Longer version:
The main difference between PCRE and ICU, the two leading regex engines, is PCRE's support for recursion within its expressions. A simple example of this would be:
\w{3}\d{3}(?R)?
which matches aaa111bbb222 fully. However, entering this into an Xcode (7.1.1) regex find field yields this alert:
This leads us to believe that Xcode is using the ICU engine to power its search and replace features. This makes sense, since Xcode is likely built either on top of NSRegularExpression or the underlying libicucore.
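The same probing trick works for any engine: feed it a construct that only some flavors support and see whether the pattern is accepted. Here is a minimal sketch, assuming Python's built-in re module as a stand-in. It is not the ICU engine; it just also happens to lack (?R) recursion. The helper name supports_construct is made up for this example.

```python
import re

def supports_construct(pattern: str) -> bool:
    """Return True if Python's re module accepts (compiles) the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The recursive pattern from the answer above (PCRE-style recursion).
RECURSIVE = r"\w{3}\d{3}(?R)?"
# A non-recursive rewrite that matches the same strings in this simple case.
PLAIN = r"(?:\w{3}\d{3})+"

print(supports_construct(RECURSIVE))
print(supports_construct(PLAIN))
print(re.fullmatch(PLAIN, "aaa111bbb222") is not None)
```

If the recursive form is rejected while the plain form is accepted, the engine under test does not implement PCRE-style recursion.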

RegEx with Excel VBA on Mac

I need to use regex with Excel VBA. I'm using Mac OS 10.10 and Office 2011, so there is no DLL file I can use.
What can I do here?
I read that I have to bind an AppleScript. How is this done, and what does the script need to contain?
You can use VBA's Like operator. It is only a very limited pattern matcher, though.
Microsoft Word has its standard wildcards, and if you tick Use Wildcards it acts as a regex engine (it can also find words that sound the same and words with the same root). So use Word rather than VBScript's RegExp.
Just record a Find and Replace in Word and you'll get most of the program written for you; you'll just need to adapt it.
Natively, you can't really. AppleScript isn't actually that good for this kind of thing (where VBA is concerned).
There are other libraries that you can install to get support for things like regular expressions on Mac OS. The one I've seen used the most is Satimage, although I've not personally had to use it (yet), so I can't vouch for it myself:
http://www.satimage.fr/software/en/downloads/downloads_companion_osaxen.html
I'm working on this problem too and I think Advanced Filters may be your answer if you want to do it in Excel without adding an external library. You can access it through VBA and set up a hidden sheet somewhere to stash your filters.
https://searchengineland.com/advanced-filters-excels-amazing-alternative-to-regex-143680
And you can see what it looks like in VBA here:
https://www.contextures.com/exceladvancedfiltervba.html
However, Advanced Filters does have some notable shortcomings, like the inability to distinguish a digit from a letter. The Like operator mentioned earlier does have this ability, however, so you could combine them to overcome that limitation.
Hopefully you and I can both solve this problem using these tools...!

Can regular expressions be used in Protege OWL query?

I have an OWL class Verse which has a data property named hasContent. The property's range is string. Using a DL query, e.g. Verse and hasContent "complete text of a verse", I can find the verse that contains the specified text. I now want to find all instances of verses that contain some word.
Can regular expressions be used in a Protege OWL query? Is there any example? Or do I need to use the more complicated query language, SPARQL?
You can use XSD facets directly within the OWL Manchester syntax (the syntax you use in Protege). With a facet you can achieve some of the things you can do with a regex, via the pattern construct. The implementation is reasoner-specific, so it might work sometimes and sometimes not :-/
Some links to learn more about it:
The answers to restrict xsd:string to [A-Z] for rdfs:range contain examples of facet use.
Facets specs.
Or SPARQL as suggested by other answers.
SPARQL is the query language for RDF, and there is a reason for it. If you use plain regex (without SPARQL) you would not be able to define your target (instances, classes, properties, etc.) and you would not exploit the benefits of using an ontology. Regular expressions are fine for plain text, but an ontology is not plain text and you shouldn't handle it as such. I would strongly suggest using SPARQL, which already includes regular expressions for restricting string values.
Another solution (which I would not recommend by any means) is to export your target ontology as an RDF/XML document and apply a regular expression search to it as if it were a simple document.
Hope I helped!

How to verify regexp patterns?

What are the common ways to verify that a given regex pattern works well for a given scenario, and to check the results?
I would like to know in general, not for a particular programming language. Also, what is the best way to learn to write regular expressions?
Books: Mastering Regular Expressions is the definitive guide to regular expressions. The Regular Expressions Cookbook is said to be lighter and more easily applicable.
Sites: Friedl's companion site is a good start. Regexlib is a source of idioms and patterns.
Software: RegexBuddy is a good, paid regex verifier.
I've used this resource when learning: http://www.regular-expressions.info/ and found myself going back there whenever there was something I needed to remember. It's very useful for learning and covers the basics very well. They also have various links to programs which can be used to verify regular expressions.
This is not a "real" verification, but RegexBuddy allows you to verify that your regex does what you expect it to do on any sample data you provide. It also translates the regex into an English description that can help to figure out mistakes. Plus, it knows all major regex flavors and can translate regexes between them.
For testing regular expressions you can use regex test tools like the one below:
http://www.regextester.com/
To learn more about how to learn regular expressions, please check the following SO threads:
Learning Regular Expressions
How to master Regular Expressions?
https://stackoverflow.com/questions/465119/how-do-i-learn-regular-expressions-closed
RAD Regexp Designer is a great tool.
Set up an automated test using your tools of choice (because regex implementations vary from language to language and library to library) which applies the regex to a variety of both matching and non-matching inputs to verify that you get the correct results.
While RegexBuddy and the like may be helpful for initially creating the regex (or may not; I've never used them), you will still need to maintain it, just like any other code. When that time comes, it's vastly preferable to have a test script that will run through all your old test inputs (plus the new ones which created the need for the change) in a matter of seconds rather than having to sit on a website for tens of minutes, if not hours, trying to remember all your test inputs and manually re-run them to make sure you didn't break anything.
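A minimal sketch of such a test script, in Python. The pattern (an ISO-style date shape) and the sample inputs are made up for illustration; swap in your own regex and the inputs that matter to you.

```python
import re

# Hypothetical pattern under test: the shape YYYY-MM-DD (no calendar validation).
DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

SHOULD_MATCH = ["2024-01-31", "1999-12-01"]
SHOULD_NOT_MATCH = ["2024-1-31", "24-01-31", "2024/01/31", "2024-01-31T00:00", ""]

def run_regex_tests(pattern, should_match, should_not_match):
    """Return a list of failure descriptions; an empty list means every case passed."""
    failures = []
    for text in should_match:
        if pattern.fullmatch(text) is None:
            failures.append(f"expected match: {text!r}")
    for text in should_not_match:
        if pattern.fullmatch(text) is not None:
            failures.append(f"expected no match: {text!r}")
    return failures

failures = run_regex_tests(DATE_SHAPE, SHOULD_MATCH, SHOULD_NOT_MATCH)
print(failures or "all cases passed")
```

Whenever a bug report brings a new input, add it to one of the two lists before touching the regex, then re-run the script.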

Under what situations are regular expressions really the best way to solve the problem?

I'm not sure if Jeff coined it, but there's the joke/saying that people who say "oh, I know, I'll use regular expressions!" now have two problems. I've always taken this to mean that people use regular expressions in very inappropriate contexts.
However, under what circumstances are regular expressions really the best answer? For what problems are they really the best, or maybe the only, solution?
RegExps are good for:
Text Format Validations (email, url, numbers)
Text searches/substitution.
Mappings (e.g. url pattern to function call)
Filtering some texts (related to substitution)
Lexical analysis during parsing.
They can be used to validate anything that has a pattern, like:
Social Security Number
Telephone Number ( 555-555-5555 )
Email Address (something@example.com)
IP Address (but it's more complex to make sure it's valid)
All those have patterns and are easily verifiable by RegEx.
They are harder to use for entries that follow a logic instead of a pattern, like a credit card number, but they can still be used to do some client-side validation.
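To make the pattern-versus-logic distinction concrete, here is a hedged Python sketch: a regex checks only the shape of a card number, and a separate Luhn checksum handles the part a regex cannot express. The function names and the 13 to 19 digit length range are assumptions for this example, not a complete card validator.

```python
import re

# Shape only: 13 to 19 ASCII digits (an assumed range for illustration).
CARD_SHAPE = re.compile(r"[0-9]{13,19}")

def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def looks_like_card(text: str) -> bool:
    """Regex for the pattern part, plain code for the logic part."""
    digits = text.replace(" ", "").replace("-", "")
    return CARD_SHAPE.fullmatch(digits) is not None and luhn_ok(digits)

print(looks_like_card("4111 1111 1111 1111"))
print(looks_like_card("4111 1111 1111 1112"))
```

The second number has the right shape but fails the checksum, which is exactly the kind of rule that belongs in code rather than in the expression.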
So the best ways?
To sanitize data entry on the client side before sanitizing it on the server.
To "Search and Replace" strings that contain a pattern.
I'm sure I am missing a lot of other cases.
Regular expressions are a great way to parse text that doesn't already have a parser (unlike XML, for example). I have used them to create a parser for the mod_rewrite syntax in the .htaccess file, and in my URL Rewriter project http://www.codeplex.com/urlrewriter for example.
They are really good when you want to be more specific than "*" or "?", like "3 letters then 2 numbers then a $ sign then a period".
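As a quick illustration in Python, the "3 letters then 2 numbers then a $ sign then a period" rule can be written directly as a regex, while a "?" wildcard pattern of the same length accepts strings that break the rule. The variable names are made up for this sketch.

```python
import re
from fnmatch import fnmatchcase

# 3 letters, 2 digits, a literal "$", a literal "."
CODE = re.compile(r"[A-Za-z]{3}[0-9]{2}\$\.")

good = "abc12$."
bad = "ab123$."   # only 2 letters followed by 3 digits

print(CODE.fullmatch(good) is not None)
print(CODE.fullmatch(bad) is not None)

# A wildcard with five "?" placeholders cannot tell letters from digits,
# so it accepts both strings.
print(fnmatchcase(good, "?????$."), fnmatchcase(bad, "?????$."))
```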
The quote is from an anti-Perl rant from Jamie Zawinski. I think Perl used to do regex really badly but now it seems to be a standard engine for a lot of programs.
But the same sentiment still applies. If you don't know how to use regex, you had better not try anything really fancy, otherwise you get one of these tags too (see bronze list) ;o)
https://stackoverflow.com/users/730/keng
They are good for matching or finding text that takes a very specific and simple format. By "simple" I mean not nested and smaller than the entire HTML spec, for example.
They are primarily of value for highly structured text parsing. If you use named groups (an option in most mature regex systems), you have a phenomenally powerful and crisp way to handle the strings.
Here's an example. Consider that netstat, across its versions and different Linux distributions, can return different results. Sometimes there is an extra column, sometimes there is a shift in the date/time format. Regexes give you a powerful way to handle that with a single expression. Couple that with named groups, and you can retrieve the data without hacks like:
1) split on spaces
2) ok, the netstat version is X so I need to add 1 to all array references past column 5.
3) ok, the netstat version is Y so I need to make sure that I use multiple array references for the date info.
YUCK. Simple to fix in a Regex :-)
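A rough Python sketch of the named-group approach. The two sample lines are invented for illustration and only resemble netstat output, and the optional trailing "program" column is an assumption standing in for "sometimes there is an extra column".

```python
import re

# One expression covers both layouts; the last column is optional.
LINE = re.compile(
    r"^(?P<proto>tcp6?|udp6?)\s+"
    r"(?P<recv_q>[0-9]+)\s+"
    r"(?P<send_q>[0-9]+)\s+"
    r"(?P<local>\S+)\s+"
    r"(?P<remote>\S+)\s+"
    r"(?P<state>[A-Z_0-9]+)"
    r"(?:\s+(?P<program>\S+))?\s*$"
)

def parse_line(line: str):
    """Return a dict of named fields, or None if the line does not match."""
    m = LINE.match(line)
    return m.groupdict() if m else None

# Hypothetical sample lines: same data, with and without the extra column.
short_form = "tcp  0  0 192.168.1.10:22  10.0.0.5:51234  ESTABLISHED"
long_form = "tcp  0  0 192.168.1.10:22  10.0.0.5:51234  ESTABLISHED  1234/sshd"

for line in (short_form, long_form):
    fields = parse_line(line)
    print(fields["local"], fields["state"], fields["program"])
```

The calling code reads fields by name, so it stays the same whether or not the extra column is present; no index arithmetic per netstat version.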