Can regular expressions be used in Protege OWL query? - regex

I have an OWL class Verse which has a data property named hasContent. The property's range is string. Using a DL query, e.g. Verse and hasContent "complete text of a verse", I can find the verse that contains the specified text. I now want to find all instances of verses that contain some word.
Can regular expressions be used in a Protege OWL query? Is there an example? Or do I need to use the more complicated query language, SPARQL?

You can use XSD facets directly within the OWL Manchester syntax (the syntax you use in Protege). With a facet you can achieve some of the things you can do with a regex, via the pattern construct. The implementation is reasoner-specific, so it might work with some reasoners and not with others :-/
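For the question above, a facet-based DL query would look something like Verse and hasContent some xsd:string[pattern ".*light.*"] (the word "light" is just a placeholder, and whether it runs depends on the reasoner). One thing that is easy to miss: an XSD pattern has to match the whole value, not a part of it, which is why the .* on both sides is needed. Here is a rough Python sketch of that behaviour, using re.fullmatch as a stand-in for the facet. The helper name is made up, and Python's regex flavour only agrees with the XSD one for simple patterns like these.

```python
import re


def xsd_pattern_matches(pattern: str, value: str) -> bool:
    """Emulate an XSD pattern facet (hypothetical helper).

    XSD patterns are implicitly anchored at both ends, so the whole
    value must match. re.fullmatch gives the same behaviour for
    simple patterns.
    """
    return re.fullmatch(pattern, value) is not None


verse = "Let there be light"

# A bare word only matches a value that is exactly that word.
print(xsd_pattern_matches("light", verse))      # False
# Wrapping the word in .* turns it into a "contains" test.
print(xsd_pattern_matches(".*light.*", verse))  # True
```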
Some links to learn more about it:
The answers to restrict xsd:string to [A-Z] for rdfs:range contain examples of facet use.
Facets specs.
Or SPARQL as suggested by other answers.

SPARQL is the query language for RDF, and there is a reason for it. If you use plain regex (without SPARQL), you cannot define your target (instances, classes, properties, etc.) and you do not exploit the benefits of using an ontology. Regular expressions are fine for plain text, but an ontology is not plain text and you shouldn't handle it as such. I would strongly suggest using SPARQL, which already supports regular expressions for restricting string values.
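As a rough illustration of the SPARQL route, the Python sketch below only builds the query text; running it still needs a SPARQL engine or the SPARQL tab in Protege. The namespace IRI and the function name are assumptions, so substitute your ontology's own IRI. Note that this form only finds individuals explicitly typed as Verse.

```python
import re

# Hypothetical namespace; replace with the ontology's actual IRI.
ONTOLOGY_PREFIX = "http://example.org/verses#"


def build_verse_query(word: str) -> str:
    """Return a SPARQL query selecting verses whose content contains word."""
    # Accept only letters and digits so the word can be embedded in the
    # regex string literal without any escaping.
    if not re.fullmatch(r"[A-Za-z0-9]+", word):
        raise ValueError("word must be alphanumeric")
    return (
        f"PREFIX : <{ONTOLOGY_PREFIX}>\n"
        "SELECT ?verse ?content WHERE {\n"
        "  ?verse a :Verse ;\n"
        "         :hasContent ?content .\n"
        # The "i" flag makes the match case-insensitive.
        f'  FILTER regex(?content, "{word}", "i")\n'
        "}"
    )


print(build_verse_query("light"))
```

Unlike an XSD pattern facet, SPARQL's regex() matches anywhere in the string, so no surrounding .* is needed.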
Another solution (which I would by no means suggest) is to export your target ontology as an RDF/XML document and run a regular expression search on it as if it were a simple document.
Hope I helped!

Related

Regex force group order

I'm new to regex and I have a question.
Like in this example, https://regex101.com/r/Iak7cF/1/ how do I force
src="wow"
to be in group 1, and
title="toto"
to be in group 2?
I want to capture this kind of text in any order only if it contains:
class="formula"
Am I doing it right?
You'd better use an HTML parser
But if you really want to use regex, you have to use named groups to achieve what you want.
<img(?=[^>]*class="formula")(?=[^>]*(?<src>src="[^"]*"))(?=[^>]*(?<title>title="[^"]*"))[^>]*>
DEMO
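For reference, here is roughly how the same idea looks in Python, where named groups are written (?P<name>...). This is only a sketch under a few assumptions: attribute values are double-quoted, the tag contains no > inside an attribute, and the groups capture just the value rather than the whole src="..." text. The constant name is made up.

```python
import re

# Each lookahead scans the tag independently, so the attributes can
# appear in any order; [^>]* keeps every scan inside the current tag.
IMG_FORMULA = re.compile(
    r'<img'
    r'(?=[^>]*\sclass="formula")'
    r'(?=[^>]*\ssrc="(?P<src>[^"]*)")'
    r'(?=[^>]*\stitle="(?P<title>[^"]*)")'
    r'[^>]*>'
)

match = IMG_FORMULA.search('<img title="toto" class="formula" src="wow">')
if match:
    print(match.group("src"), match.group("title"))
```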
Regular expressions are very flexible and powerful, but in general, they are not the right tool for parsing XML, HTML, or XHTML. From WinBatch:
Regular Expressions are only good for parsing text that is tightly defined. Since Regular Expressions don't really understand the context of matches, they can be fooled in a big way if the structure of the text changes. In particular, Regular Expressions have difficulty with hierarchy.
PerlMonks has a detailed explanation of why regex is not a good solution for all but the simplest of cases. They summarize it like this:
So I hope it is clear: Please, don't try to parse arbitrary XML/HTML with regexes!
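To make the parser suggestion concrete, here is a minimal sketch using Python's standard html.parser module. The class and function names are made up for illustration. It collects the src and title of every img tag whose class attribute includes formula, regardless of attribute order.

```python
from html.parser import HTMLParser


class FormulaImageParser(HTMLParser):
    """Collect (src, title) pairs from <img> tags with class "formula"."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attributes.get("class") or "").split()
        if "formula" in classes:
            self.images.append((attributes.get("src"), attributes.get("title")))


def extract_formula_images(html: str):
    parser = FormulaImageParser()
    parser.feed(html)
    parser.close()
    return parser.images


print(extract_formula_images('<img title="toto" class="formula" src="wow">'))
```

Because the parser hands over the attributes as name/value pairs, the order problem from the question does not come up at all.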

Can xPath in LibXML be regex type

We usually write our Search Path in findnodes() function as below
//parentNode[subNode/text() = 'CPUUSAGE']/subNode
What if I want to match a part of the text here and find all the nodes?
something like
//parentNode[subNode/text() =~ '/CPUUSAGE'/]/subNode
Obviously this is Invalid xPath...
Any thoughts how to achieve this?
I know I can first find the nodes and then try to match the textContent. But can we do that in one shot, directly in findnodes()?
XPath 1.0 (which libxml implements) doesn’t include any built-in support for regular expressions. In the example you give, which uses a fairly simple regular expression, you could use the contains function to achieve a similar result:
//parentNode[subNode[contains(text(), 'CPUUSAGE')]]/subNode
(As an aside that’s an odd expression – you’d probably really want something like //parentNode/subNode[contains(text(), 'CPUUSAGE')] but I realise it’s just an example.)
There are some other string functions that could be useful in creating other simple queries.
You could create your own custom XPath function to filter nodes based on a regular expression. In fact, the docs for the Perl LibXML module include an example of doing just that.
XPath 2.0 does have support for using regular expressions with a group of string functions. Unless you have an XPath 2.0 processor, though, that will not be too useful.
XML::Twig has support for regular expressions in its xpaths.
The following is an xpath that I used in an answer to this SO question: Updating xml attribute value based on other with Perl
project[string(path) =~ /\bopensource\b/]/revision
I also created a second answer so that I could experiment with how XML::LibXML could be used to solve the same problem, and in that case I just iterated over all projects and did the regex filtering manually.
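The "find the nodes first, then filter with a regex" approach can be sketched like this. The sketch uses Python's standard xml.etree.ElementTree as a stand-in for XML::LibXML, so it is not the Perl code from either answer, and the function name is made up. It mirrors the shape of the XPath in the question: every subNode of a parentNode that has at least one subNode matching the pattern.

```python
import re
import xml.etree.ElementTree as ET


def find_subnodes(xml_text: str, pattern: str):
    """Return subNode elements of parentNodes having a matching subNode."""
    root = ET.fromstring(xml_text)
    regex = re.compile(pattern)
    matches = []
    for parent in root.iter("parentNode"):
        children = parent.findall("subNode")
        # Keep all children if any one of them matches the regex.
        if any(regex.search(child.text or "") for child in children):
            matches.extend(children)
    return matches


doc = """<root>
  <parentNode><subNode>CPUUSAGE_1</subNode><subNode>other</subNode></parentNode>
  <parentNode><subNode>MEMORY</subNode></parentNode>
</root>"""

print([node.text for node in find_subnodes(doc, r"CPUUSAGE")])
```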

How to implement Regex

I'm working on a database server software product (see my profile) and we see the need to implement free-text searching in our software. The query language standard we are using only supports free-text search using a BT type Regex. The only way we can use our free-text database indexes together with Regex seems to be to implement our own. My questions to SO are:
Where can I find papers/examples/patterns on how to implement a BT style Regex?
Is it worth looking into taking one of the open source C/C++ Regex libraries and altering the code to fit our needs?
If I'm not wrong, SPARQL uses the XPath/XQuery regular expression syntax, which is based on Perl regular expressions (at least that is what the W3C docs say).
If this is indeed the case, then you can use PCRE from http://www.pcre.org/
It is BSD-licensed, so you will be able to use it in a commercial product.
If your syntax is slightly different, you can probably write a small routine to normalize it to the Perl syntax used by PCRE.
There are two papers I have found online on the subject of regex indexing: one from Bell Labs and one from UCLA/IBM. I'm still not sure whether to use an existing Regex library and modify it or to write one from scratch.
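Assuming "BT" here means backtracking, the core idea fits in a few lines. The sketch below follows the well-known small matcher by Rob Pike from "The Practice of Programming", rewritten in Python. It supports only literal characters, ".", "*", "^" and "$", and it says nothing about how to combine matching with a free-text index.

```python
def match(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text."""
    if pattern.startswith("^"):
        return _match_here(pattern[1:], text)
    # Unanchored: try every start position, including the empty suffix.
    return any(_match_here(pattern, text[i:]) for i in range(len(text) + 1))


def _match_here(pattern: str, text: str) -> bool:
    """Return True if pattern matches at the beginning of text."""
    if not pattern:
        return True
    if len(pattern) >= 2 and pattern[1] == "*":
        return _match_star(pattern[0], pattern[2:], text)
    if pattern == "$":
        return text == ""
    if text and pattern[0] in (".", text[0]):
        return _match_here(pattern[1:], text[1:])
    return False


def _match_star(char: str, pattern: str, text: str) -> bool:
    """Match zero or more occurrences of char, then the rest of pattern."""
    i = 0
    while True:
        # Try the rest of the pattern first; on failure, consume one
        # more occurrence of char and try again.
        if _match_here(pattern, text[i:]):
            return True
        if i < len(text) and char in (".", text[i]):
            i += 1
        else:
            return False


print(match("a*b", "aaab"), match("^ab$", "abc"))
```

Each failed attempt returns to the caller, which then tries the next alternative; that retrying is the backtracking. It is also the reason patterns with many stars can become slow on long inputs.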

Regex and non-technical users

Given that:
You have some Key-Value data that can be modified
Modification is done by applying filters to the data. The filters that control what gets changed are created by non-technical people
The filters are setup using regular expressions. An example of a rule described as part of a filter may be "If a key matches some regex, replace value with some other value"
How would you:
If filters are to be produced by business people, who can't create regular expressions, in what form would you have them submit their input that would be easily translated to regex?
Agent Ransack contains a GUI editor for creating regular expressions from plain English, I would suggest taking a look at that and implementing your own variation of it.
If it works, I would go for "wildcard only" support, i.e. the asterisk * is the only special character allowed and you translate that to the regex .*? in code.
Most non-technical people can grasp * meaning "anything".
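A minimal sketch of that translation in Python, assuming the whole key has to match the wildcard pattern. The function names and the rule shape ("if the key matches, replace the value") are made up for illustration. Everything except * is escaped, so characters like . or + in the user's input are treated literally.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern where * means "anything" into a compiled regex."""
    # Escape every literal piece, then join the pieces with a lazy .*?
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*?".join(literal_parts))


def apply_rule(data: dict, key_pattern: str, new_value) -> dict:
    """Return a copy of data with values replaced where the key matches."""
    regex = wildcard_to_regex(key_pattern)
    return {
        key: (new_value if regex.fullmatch(key) else value)
        for key, value in data.items()
    }


settings = {"user_name": "bob", "user_mail": "b@x.org", "host": "srv1"}
print(apply_rule(settings, "user_*", "REDACTED"))
```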

regular expression to convert state names to abbreviations

I'm working on a project that requires using only regular expressions to convert state names (must be case insensitive) to their two-letter abbreviations.
I cannot use any sort of development environment or link to any databases or xml or ini files.
Please help!
Since state names have nothing regular about them, regular expressions are the WRONG tool. I would suggest getting a new project.
However, the only solution (apart from stupid illogical hacks) is to hardcode every state:
s/Alabama/AL/i
s/Alaska/AK/i
...
s/Wyoming/WY/i
A list of the states and their abbreviations can be found here.
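If a scripting language were allowed (the question rules that out, so take this purely as an illustration), the hardcoded list could be collapsed into a lookup table and a single case-insensitive substitution. The table below holds only the three states shown above; the remaining entries would have to be filled in from the full list. The names are made up.

```python
import re

# Partial table: only the states listed in the answer above.
STATE_ABBREVIATIONS = {"Alabama": "AL", "Alaska": "AK", "Wyoming": "WY"}

_LOOKUP = {name.lower(): abbr for name, abbr in STATE_ABBREVIATIONS.items()}

# Longest names first, so a longer state name is preferred over a
# shorter one it contains once the table is complete.
_STATE_PATTERN = re.compile(
    r"\b("
    + "|".join(
        re.escape(name)
        for name in sorted(STATE_ABBREVIATIONS, key=len, reverse=True)
    )
    + r")\b",
    re.IGNORECASE,
)


def abbreviate_states(text: str) -> str:
    """Replace each known state name with its two-letter abbreviation."""
    return _STATE_PATTERN.sub(lambda m: _LOOKUP[m.group(1).lower()], text)


print(abbreviate_states("From alabama to WYOMING via Alaska"))
```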