SPARQL 1.1: how to use the replace function?

How can one use the replace function in SPARQL 1.1, especially in update commands?
For example, if I have a number of triples ?s ?p ?o where ?o is a string and for all triples where ?o contains the string "gotit" I want to insert an additional triple where "gotit" is replaced by "haveit", how could I do this? I am trying to achieve this in Sesame 2.6.0.
I tried this naive approach:
INSERT { ?s ?p replace(?o,"gotit","haveit","i") . }
WHERE { ?s ?p ?o . FILTER(regex(?o,"gotit","i")) }
but this caused a syntax error.
I also failed to use replace in the result list of a query like so:
SELECT ?s ?p (replace(?o,"gotit","haveit","i") as ?r) WHERE { .... }
The SPARQL document unfortunately does not contain an example of how to use this function.
Is it possible at all to use functions to create new values and not just test existing values and if yes, how?

You can't use an expression directly in your INSERT clause like you have attempted to do. Also you are binding ?name with the first triple pattern but then filtering on ?o in the FILTER which is not going to give you any results (filtering on an unbound variable will give you no results for most filter expressions).
Instead you need to use a BIND in your WHERE clause to make the new version of the value available in the INSERT clause like so:
INSERT
{
?s ?p ?o2 .
}
WHERE
{
?s ?p ?o .
FILTER(REGEX(?o, "gotit", "i"))
BIND(REPLACE(?o, "gotit", "haveit", "i") AS ?o2)
}
BIND assigns the result of an expression to a new variable so you can use that value elsewhere in your query/update.
The relevant part of the SPARQL specification is the section on Assignment.
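To make the evaluation order concrete, here is a minimal Python sketch of what the update above computes. It is not Sesame's implementation; the function name insert_replaced and the sample triples are made up for illustration, and triples are modelled as plain tuples in a set. The WHERE part is evaluated against the existing data, the BIND computes the new value per match, and only then are the new triples added.

```python
import re

def insert_replaced(triples, pattern, replacement):
    """Mimic INSERT { ?s ?p ?o2 } WHERE { ?s ?p ?o . FILTER(REGEX(...)) BIND(REPLACE(...) AS ?o2) }."""
    regex = re.compile(pattern, re.IGNORECASE)  # corresponds to the "i" flag
    new_triples = set()
    for s, p, o in triples:
        # FILTER(REGEX(?o, pattern, "i")): keep only string objects that match
        if isinstance(o, str) and regex.search(o):
            # BIND(REPLACE(?o, pattern, replacement, "i") AS ?o2)
            o2 = regex.sub(replacement, o)
            new_triples.add((s, p, o2))
    # The original triples stay; the new ones are added alongside them.
    return triples | new_triples

data = {
    ("ex:a", "ex:label", "I GotIt now"),
    ("ex:b", "ex:label", "nothing to see"),
}
result = insert_replaced(data, "gotit", "haveit")
```

After the call, result still holds both original triples plus one additional triple for ex:a whose object is "I haveit now".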

The usage of replace looks correct afaict according to the spec. I believe REPLACE was just added to the last rev of the spec relatively recently - perhaps Sesame just doesn't support it yet?
If you just do SELECT ?s ?p ?o WHERE { ?s ?p ?name . FILTER(regex(?name,"gotit","i")) } does your query return rows?

Related

Matching double quotes in SPARQL query in Virtuoso

I need a SPARQL query that matches double quotes in a Virtuoso graph. I use this query:
SELECT distinct ?o
FROM <http://graph>
WHERE
{
?s ?p ?o.
}
It returns a column with values such as:
http://some.prefix/Symbol
"abcd"
I need to match only the second value ("abcd"). I tried adding this filter to the WHERE clause:
FILTER regex(str(?o), "\"")
But it returns no results. I also tried '"' as a second parameter to regex, and some other things. Is it possible at all?
"abcd" is a literal of four characters. It does not include the ""; these are the string delimiters and do not form part of the string.
FILTER isLiteral(?o)
should work.
Since your filter is not working, "abcd" does not have a double quote in it. It's a string literal, and the quotes are only delimiters. A literal cannot be the subject of a triple, so to find out what kind of literal it is, ask for its datatype --
SELECT DISTINCT (datatype(?o) AS ?type)
FROM <http://graph>
WHERE
{
?s ?p ?o .
FILTER isLiteral(?o)
}
-- and then use that datatype as a filter in your query:
SELECT distinct ?o
FROM <http://graph>
WHERE
{
?s ?p ?o .
FILTER(datatype(?o) = <whatever type you received in the previous query>)
}

Is there a database that can store regex as values?

I am looking for a database that can store regular expressions as values, e.g. something like this:
{:name => "Tim", :count => 3, :expression => /t+/},
{:name => "Rob", :count => 4, :expression => /a\d+/},
{:name => "Fil", :count => 1, :expression => /tt/},
{:name => "Marc", :count => 1, :expression => /bb/}
So I could return rows/documents based on whether the query string matches the stored expression (e.g. "FIND rows WHERE "tt" =~ :expression") and get the Tim and Fil rows as the result. Most databases can do the exact opposite (check whether a text field matches a regex given in the query), but neither mongo nor postgres seems able to do what I need, unfortunately.
P.S. Or perhaps I am wrong and there are some extensions for postgres or mongo that allow me to store regex?
MongoDB will allow you to store actual regular expressions (i.e. not a string representing a regular expression), as shown below:
> db.mycoll.insertOne({myregex: /aa/})
{
"acknowledged" : true,
"insertedId" : ObjectId("5826414249bf0898c1059b38")
}
> db.mycoll.insertOne({myregex: /a+/})
{
"acknowledged" : true,
"insertedId" : ObjectId("5826414949bf0898c1059b39")
}
> db.mycoll.find()
{ "_id" : ObjectId("5826414249bf0898c1059b38"), "myregex" : /aa/ }
{ "_id" : ObjectId("5826414949bf0898c1059b39"), "myregex" : /a+/ }
You can use this to then query for rows with a regex that matches a query, as follows:
> db.mycoll.find(function() { return this.myregex.test('a'); } )
{ "_id" : ObjectId("5826414949bf0898c1059b39"), "myregex" : /a+/ }
Here we search for rows where the string 'a' is matched by the myregex field, resulting in the second document, with regex /a+/, being returned.
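The same reverse lookup can be sketched in plain Python, outside any database, using the rows from the question. This is only an illustration of the idea (the helper name rows_matching is made up): every stored pattern is tested against the query string, which is a full scan, much like the function-based find above.

```python
import re

# Rows from the question, with the expression stored as a compiled regex.
rows = [
    {"name": "Tim", "count": 3, "expression": re.compile(r"t+")},
    {"name": "Rob", "count": 4, "expression": re.compile(r"a\d+")},
    {"name": "Fil", "count": 1, "expression": re.compile(r"tt")},
    {"name": "Marc", "count": 1, "expression": re.compile(r"bb")},
]

def rows_matching(text, rows):
    """Return the rows whose stored expression matches somewhere in text."""
    # search() is unanchored, like JavaScript's RegExp.test()
    return [row for row in rows if row["expression"].search(text)]

names = [row["name"] for row in rows_matching("tt", rows)]
```

For the query string "tt" this yields the Tim and Fil rows, as asked for in the question.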
Oracle database can do that.
Example query: WHERE REGEXP_LIKE(first_name, '^Ste(v|ph)en$')
You want to read the regexp from a column rather than pass it as a literal; see the SQL Fiddle example below.
SQL Fiddle
Choose Oracle database.
In schema window execute the following:
CREATE TABLE regexp (name VARCHAR2(20), count NUMBER, regexp VARCHAR2(50));
INSERT INTO regexp VALUES ('Tim', 3, 't+');
INSERT INTO regexp VALUES ('Rob', 4, 'a\d+');
INSERT INTO regexp VALUES ('Fil', 1, 'tt');
INSERT INTO regexp VALUES ('Marc', 1, 'bb');
COMMIT;
Execute an SQL statement, e.g. (as you mentioned in your question):
SELECT * FROM regexp WHERE REGEXP_LIKE('tt', regexp);
Yields:
NAME COUNT REGEXP
Tim 3 t+
Fil 1 tt
Reference here.
Excerpt:
Oracle Database implements regular expression support with a set of
Oracle Database SQL functions and conditions that enable you to search
and manipulate string data. You can use these functions in any
environment that supports Oracle Database SQL. You can use these
functions on a text literal, bind variable, or any column that holds
character data such as CHAR, NCHAR, CLOB, NCLOB, NVARCHAR2, and
VARCHAR2 (but not LONG).
And some more info to consider:
A string literal in a REGEXP function or condition conforms to the
rules of SQL text literals. By default, regular expressions must be
enclosed in single quotes. If your regular expression includes the
single quote character, then enter two single quotation marks to
represent one single quotation mark within the expression. This
technique ensures that the entire expression is interpreted by the SQL
function and improves the readability of your code. You can also use
the q-quote syntax to define your own character to terminate a text
literal. For example, you could delimit your regular expression with
the pound sign (#) and then use a single quote within the expression.
Note: If your expression comes from a column or a bind variable, then
the same rules for quoting do not apply.
Note there is no column type named RegEx, you would need to save the string as is, in a textual column.
Also you can use RegEx in constraint checking and when you project columns.
SQL Server (and probably some other SQL databases) supports this out of the box for LIKE patterns (wildcards such as % and _, not full regular expressions), though as has been noted before, this can only be executed by the database as a table scan -- something to keep in mind if you have large numbers of patterns. You just reverse the usual order of the LIKE operator:
create table demo.query
(
id int identity not null,
regex nvarchar(max),
primary key(id)
);
insert into demo.query (regex) values ('aa%');
select * from demo.query where 'aaaa' like regex;
Looks a little funny, but it's perfectly valid.
Adding to Ely's answer, thought of letting you all know that MySQL also supports this.
In http://sqlfiddle.com/, I tested with MySQL 5.6
Build schema:
CREATE TABLE rule (name VARCHAR(20), tot INT, exp VARCHAR(50));
INSERT INTO rule VALUES ('Tim', 3, 't+');
INSERT INTO rule VALUES ('Rob', 4, 'a\d+');
INSERT INTO rule VALUES ('Fil', 1, 'tt');
INSERT INTO rule VALUES ('Jack', 1, '^tt$');
INSERT INTO rule VALUES ('Marc', 1, 'bb');
COMMIT;
Test:
select * from rule where 'ttt' RLIKE exp ;
Expected: rows for Tim and Fil
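The reversed-operator trick can also be tried locally with Python's sqlite3 module. This is a sketch under assumptions, not something from the answers above: Python's sqlite3 does not ship a regexp function by default, so the code registers one, and the column is called pattern here instead of exp. SQLite evaluates X REGEXP Y as regexp(Y, X), which is why the pattern is the first argument of the helper.

```python
import re
import sqlite3

def regexp(pattern, text):
    # SQLite rewrites "X REGEXP Y" to regexp(Y, X), so the pattern arrives first.
    return 1 if re.search(pattern, text) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE rule (name TEXT, tot INTEGER, pattern TEXT)")
conn.executemany(
    "INSERT INTO rule VALUES (?, ?, ?)",
    [
        ("Tim", 3, "t+"),
        ("Rob", 4, r"a\d+"),
        ("Fil", 1, "tt"),
        ("Jack", 1, "^tt$"),
        ("Marc", 1, "bb"),
    ],
)

def matching_names(text):
    """Names of the rules whose stored pattern matches the given text."""
    cur = conn.execute(
        "SELECT name FROM rule WHERE ? REGEXP pattern ORDER BY rowid", (text,)
    )
    return [name for (name,) in cur]

hits = matching_names("ttt")
```

With the text 'ttt' the anchored rule ^tt$ does not match, so only Tim and Fil come back, the same outcome as the MySQL test above.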

Extract triples containing particular substring using SPARQL

I want to extract a triple which contains a word, say "alice", in its subject. The query I used was:
SELECT ?s ?p ?o WHERE { ?s ?p ?o .FILTER regex(?s, \"alice\") .}
This doesn't give me any results in spite of having a triple which satisfies this constraint.
On the other hand, when I use the same query to extract a triple which contains the word brillant in its object, it returns only one of the 2 possible matches.
The query used is:
SELECT ?s ?p ?o WHERE { ?s ?p ?o .FILTER regex(?o, \"brillant\") .}
Please let me know where I am going wrong and what the reason for this behaviour is.
I'll assume that the escapes around the quotation marks are just a remnant from copying and pasting. The first argument to regex must be a literal, but literals cannot be the subjects of triples in RDF, so it's not true that you have data that should match this pattern. What you might have, though, is subjects whose URI contains the string "alice", and you can get the string representation of the URI using the str function. E.g.,
SELECT ?s ?p ?o WHERE { ?s ?p ?o .FILTER regex(str(?s), "alice") .}
To illustrate, let's use the two values <http://example.org> and "string containing example" and filter as you did in your original query:
select ?x where {
values ?x { <http://example.org> "string containing example" }
filter( regex(?x, "exam" ))
}
-------------------------------
| x |
===============================
| "string containing example" |
-------------------------------
We only got "string containing example" because the other value wasn't a string, and so wasn't a suitable argument to regex. However, if we add the call to str, then it's the string representation of the URI that regex will consider:
select ?x where {
values ?x { <http://example.org> "string containing example" }
filter( regex(str(?x), "exam" ))
}
-------------------------------
| x |
===============================
| <http://example.org> |
| "string containing example" |
-------------------------------
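The behaviour of the two queries can be imitated with a small Python model. This is a rough sketch, not how any SPARQL engine is implemented; the classes IRI and Literal and the helper names are invented for the example. The point it shows is that regex raises a type error for a non-literal, and an error inside a FILTER simply drops that solution.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    lexical: str

def sparql_str(term):
    """Rough analogue of str(): the IRI text or the literal's lexical form."""
    return term.value if isinstance(term, IRI) else term.lexical

def sparql_regex(term, pattern):
    """Rough analogue of regex(): only defined for string literals."""
    if not isinstance(term, Literal):
        raise TypeError("regex() expects a string literal")
    return re.search(pattern, term.lexical) is not None

def filter_terms(terms, predicate):
    """Keep terms for which the predicate is true; errors drop the term."""
    kept = []
    for term in terms:
        try:
            if predicate(term):
                kept.append(term)
        except TypeError:
            pass  # an error in a FILTER removes the solution
    return kept

values = [IRI("http://example.org"), Literal("string containing example")]
without_str = filter_terms(values, lambda t: sparql_regex(t, "exam"))
with_str = filter_terms(values, lambda t: sparql_regex(Literal(sparql_str(t)), "exam"))
```

Here without_str contains only the literal, while with_str contains both values, mirroring the two result tables above.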

regex regarding alternating parts

I have a log entry and I just want to extract the SQL Statement with a regular expression.
The SQL Statement may be any DDL or DML statement and may span several lines.
The Params section may be missing and the "Got xx Results in xx Ticks" may also be missing. But the ":SQLEnd:" line is always there.
Here are some examples
SELECT col1, col2 FROM table WHERE col1 = :id and col2= :num ORDER BY ORDERID ASC
Params:
:id -> 60081
:num-> 1
Got 2 Results in 0 Ticks
:SQLEnd:
or:
SELECT col1, col2 FROM table WHERE col1 = :id and col2= :num ORDER BY ORDERID ASC
Got 2 Results in 0 Ticks
:SQLEnd:
or:
SELECT col1, col2 FROM table WHERE col1 = :id and col2= :num ORDER BY ORDERID ASC
Params:
:id -> 60081
:num-> 1
:SQLEnd:
or:
SELECT col1, col2 FROM table WHERE col1 = :id and col2= :num ORDER BY ORDERID ASC
:SQLEnd:
As Mark Thalman mentioned in the comments, you will probably want to look for a good parser for this. There are a variety available, which should cover whatever language you use.
http://code.google.com/p/python-sqlparse/ (for instance) is a good example of a Python SQL Parser
To answer your question I would use a regular expression:
'(?s)(?m)(.*?^:SQLEnd:)'
This will match ANY DDL/DML statement but it will do so crudely (explained below).
The flags at the beginning denote DOTALL (the dot matches every character, including newlines) and MULTILINE ($ and ^ match at the end and start of each line). Most languages have built-in flags that you can activate through whatever regex class they implement (e.g. re.DOTALL and re.MULTILINE from Python's re module).
Please note that this regular expression will only get expressions between instances of ":SQLEnd:" - so
SELECT col1, col2 FROM table WHERE col1 = :id and col2= :num ORDER BY ORDERID ASC
Params:
:id -> 60081
:num-> 1
Got 2 Results in 0 Ticks
:SQLEnd:
will be one group and
or:
SELECT col1, col2 FROM table WHERE col1 = :id and col2= :num ORDER BY ORDERID ASC
Got 2 Results in 0 Ticks
:SQLEnd:
will be another. This is crude (but the only way I can think of to easily account for any DDL/DML statement) but hopefully will work for you. If this is not an option I highly recommend an SQL parser.
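As a hedged illustration of this approach in Python: the sketch below splits the log into one block per ":SQLEnd:" terminator using DOTALL and MULTILINE, then trims each block at the first trailer line. The trimming step, the TRAILER pattern and the function name extract_statements are additions for the example, and it assumes no line of the SQL statement itself starts with "Params:" or "Got ... Results in ... Ticks".

```python
import re

LOG = """\
SELECT col1, col2 FROM table WHERE col1 = :id and col2= :num ORDER BY ORDERID ASC
Params:
:id -> 60081
:num-> 1
Got 2 Results in 0 Ticks
:SQLEnd:
SELECT col1,
  col2 FROM table
:SQLEnd:
"""

# One block per ":SQLEnd:" line; DOTALL lets "." cross newlines,
# MULTILINE lets "^" match at the start of every line.
BLOCK = re.compile(r".*?^:SQLEnd:", re.DOTALL | re.MULTILINE)

# First line that is no longer part of the SQL statement.
TRAILER = re.compile(
    r"^(?:Params:|Got \d+ Results in \d+ Ticks|:SQLEnd:)", re.MULTILINE
)

def extract_statements(log):
    """Return the SQL text of every entry terminated by ':SQLEnd:'."""
    statements = []
    for block in BLOCK.finditer(log):
        text = block.group(0)
        cut = TRAILER.search(text)  # always found: the block ends with :SQLEnd:
        statements.append(text[:cut.start()].strip())
    return statements

statements = extract_statements(LOG)
```

The first entry yields the single-line SELECT without its Params and Got lines; the second yields the two-line statement with its line break preserved.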
A basic regex to match your examples would be:
SELECT .+?:SQLEnd:
You need to ensure that . will match newline characters. In PHP, this would be:
/SELECT .+?:SQLEnd:/s
However, this regex is not very robust, as it could break when used with certain SQL queries (eg: queries which contain one or more SELECT subqueries). And you want to match "any DDL or DML statement", which would be very complex with a regex. As Mark says, it would be better to use a parser rather than a regex.
Edit
In C#.net, you would use:
new Regex("SELECT .+?:SQLEnd:", RegexOptions.Singleline);
The documentation for RegexOptions.Singleline is here:
Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).
You can also use this inline option to enable single-line mode:
new Regex("(?s)SELECT .+?:SQLEnd:");
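For comparison, the same single-line behaviour can be checked in Python, where the option is called re.DOTALL and the inline form is also (?s). This is just a small sketch with a made-up log entry:

```python
import re

entry = "SELECT col1, col2 FROM table\nGot 2 Results in 0 Ticks\n:SQLEnd:"

# Without the flag, "." stops at the first newline, so nothing matches.
without_flag = re.search(r"SELECT .+?:SQLEnd:", entry)

# With DOTALL (single-line mode), "." also matches "\n".
with_flag = re.search(r"SELECT .+?:SQLEnd:", entry, re.DOTALL)

# The inline option has the same effect as passing the flag.
inline = re.search(r"(?s)SELECT .+?:SQLEnd:", entry)
```

Here without_flag is None, while with_flag and inline both match the whole entry.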

SPARQL regex filter

I'm trying to match one word in SPARQL by using regex filter, but without success... :/
I'm sending the query to the endpoint located at "http://dbtune.org/musicbrainz/sparql".
Well, the following query works:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?artist ?name
WHERE {
?artist a mo:MusicArtist
. ?artist foaf:name "Switchfoot"
. ?artist foaf:name ?name
. FILTER(regex(str(?name), "switchfoot", "i"))
}
But if I remove line 7 (. ?artist foaf:name "Switchfoot"), the following query does not match:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?artist ?name
WHERE {
?artist a mo:MusicArtist
. ?artist foaf:name ?name
. FILTER(regex(str(?name), "switchfoot", "i"))
}
I don't know if I am doing something wrong or if it's a bug in the endpoint...
Can somebody help me?
In your second query, there's no graph pattern to index against. The only way the query processor can satisfy that query is to retrieve the name of every single artist in the triple store, and then apply a regular expression match to each one. It's no wonder you're hitting some sort of resource limit, whether that's CPU time or elapsed time.
If you want to do free text searches like that, I would suggest downloading the dataset to a local endpoint, and using a free-text index such as LARQ. Your queries will be faster and your users will thank you for it!
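To see why an index helps, here is a toy Python comparison. It has nothing to do with LARQ's actual API; the artist identifiers, the names and all function names are invented for the illustration. The scan has to apply the regex to every name, whereas the index answers a whole-word lookup from a prebuilt dictionary.

```python
import re
from collections import defaultdict

# Hypothetical artist -> name data standing in for the triple store.
names = {
    "artist:1": "Switchfoot",
    "artist:2": "Foo Fighters",
    "artist:3": "switchfoot tribute",
}

def scan(names, word):
    """What the regex FILTER forces: test every single name."""
    pattern = re.compile(word, re.IGNORECASE)
    return {artist for artist, name in names.items() if pattern.search(name)}

def build_index(names):
    """Build a simple inverted index: lowercase token -> set of artists."""
    index = defaultdict(set)
    for artist, name in names.items():
        for token in re.findall(r"\w+", name.lower()):
            index[token].add(artist)
    return index

def lookup(index, word):
    """Answer a whole-word query from the index without touching the names."""
    return set(index.get(word.lower(), set()))

index = build_index(names)
scanned = scan(names, "switchfoot")
looked_up = lookup(index, "switchfoot")
```

Both approaches find the same two artists here, but the lookup cost no longer grows with the number of names once the index is built. Unlike the regex, this toy index only answers whole-word queries.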