SPARQL regex filter

I'm trying to match one word in SPARQL using a regex filter, but without success... :/
I'm sending the query to the endpoint located at "http://dbtune.org/musicbrainz/sparql".
Well, the following query works:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?artist ?name
WHERE {
?artist a mo:MusicArtist
. ?artist foaf:name "Switchfoot"
. ?artist foaf:name ?name
. FILTER(regex(str(?name), "switchfoot", "i"))
}
But if I remove the line with the foaf:name "Switchfoot" triple pattern (. ?artist foaf:name "Switchfoot"), the following query returns no matches:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?artist ?name
WHERE {
?artist a mo:MusicArtist
. ?artist foaf:name ?name
. FILTER(regex(str(?name), "switchfoot", "i"))
}
I don't know if I'm doing something wrong or if it's a bug in the endpoint...
Can somebody help me?

In your second query, there's no graph pattern to index against. The only way the query processor can satisfy that query is to retrieve the name of every single artist in the triple store, and then apply a regular expression match to each one. It's no wonder you're hitting some sort of resource limit, whether that's CPU time or elapsed time.
If you want to do free text searches like that, I would suggest downloading the dataset to a local endpoint, and using a free-text index such as LARQ. Your queries will be faster and your users will thank you for it!
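For illustration, here is a rough sketch of what such a query could look like with LARQ's pf:textMatch property function; the pf: prefix and the index setup shown here are assumptions that depend on your ARQ/LARQ version, and the Lucene index would have to be built against your local copy of the data first:
PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?artist ?name
WHERE {
?name pf:textMatch "switchfoot"    # free-text index lookup instead of scanning every literal
. ?artist foaf:name ?name
. ?artist a mo:MusicArtist
}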

Related

Adding nulls to dataframe output with regexp replace in Spark 2.4

I am trying to use regexp_replace to add the string "null" to the output. The language is Scala on Spark 2.4, running in AWS Glue. What is the best approach for this problem?
I am creating a dataframe via a select, passing in the columns that I need to add "null" to:
var select_df = raw_df.select(
col("example_column_1"),
col("example_column_2"),
col("example_column_3")
)
Input of example_column_1
#;#;Runner#;#;bob
Desired Output of example_column_1
null#;null#;Runner#;null#;bob
Attempt:
select_df.withColumn("example_column_1", regexp_replace(col("example_column_1"), "", "null"))
The task can be split into two parts:
replace # at the beginning of the string
replace all occurrences of ;#
select_df
.withColumn("example_column_1", regexp_replace('example_column_1, "^#", "null#"))
.withColumn("example_column_1", regexp_replace('example_column_1, ";#", ";null#"))
.show(false)

Matching double quotes in SPARQL query in Virtuoso

I need to get a SPARQL query that matches double quotes in Virtuoso graph. I use such query:
SELECT distinct ?o
FROM <http://graph>
WHERE
{
?s ?p ?o.
}
It returns me a column with such values:
http://some.prefix/Symbol
"abcd"
I need to match only second value ("abcd"). I tried to add such filter to WHERE clause:
FILTER regex(str(?o), "\"")
But it returns no results. I also tried '"' as a second parameter to regex, and some other things. Is it possible at all?
"abcd" is a literal of four characters. It does not include the ""; these are the string delimiters and do not form part of the string.
FILTER isLiteral(?o)
should work.
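Plugged into your original query, that would look like this (a minimal sketch based on the query in the question):
SELECT distinct ?o
FROM <http://graph>
WHERE
{
?s ?p ?o .
FILTER isLiteral(?o)
}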
Since your filter is not matching, "abcd" does not actually have a double quote in it; it's a string literal. I'm not sure what datatype it is, so you can use --
select (datatype(?o) as ?type) from <http://graph> where { ?s ?p ?o . filter(str(?o) = "abcd") }
-- to get its type. You can then use that type as a filter in your query as:
SELECT distinct ?o
FROM <http://graph>
WHERE
{
?s ?p ?o .
FILTER(datatype(?o) = <whatever type you received in the previous query>)
}

Regular Expression in Redshift

I have data which is being fed in the below format -
2016-006-011 04:58:22.058
This is an incorrect date/timestamp format, and I need to convert it to the correct one, as below -
2016-06-11 04:58:22.058
I'm trying to achieve this using regex in Redshift. Is there a way to remove the additional zero (0) in the day and month portions using regex? I need something generic, not tailored to this example alone, as the date will vary.
The function regexp_replace() (see documentation) should do the trick:
select
regexp_replace(
'2016-006-011 04:58:22.058' -- use your date column here instead
, '\-0([0-9]{2}\-)0([0-9]{2})' -- matches "-006-011", captures "06-" in $1, "11" in $2
, '-$1$2' -- inserts $1 and $2 to give "-06-11"
)
;
And so the result is, as required:
regexp_replace
-------------------------
2016-06-11 04:58:22.058
(1 row)

regex regarding alternating parts

I have a log entry and I just want to extract the SQL Statement with a regular expression.
The SQL statement may be any DDL or DML statement and may span several lines.
The Params section may be missing, and the "Got xx Results in xx Ticks" line may also be missing, but the ":SQLEnd:" line is always there.
Here are some examples
SELECT col1, col2 FROM table WHERE col1 = :id and col2= :num ORDER BY ORDERID ASC
Params:
:id -> 60081
:num-> 1
Got 2 Results in 0 Ticks
:SQLEnd:
or:
SELECT col1, col2 FROM table WHERE col1 = :id and col2= :num ORDER BY ORDERID ASC
Got 2 Results in 0 Ticks
:SQLEnd:
or:
SELECT col1, col2 FROM table WHERE col1 = :id and col2= :num ORDER BY ORDERID ASC
Params:
:id -> 60081
:num-> 1
:SQLEnd:
or:
SELECT col1, col2 FROM table WHERE col1 = :id and col2= :num ORDER BY ORDERID ASC
:SQLEnd:
As Mark Thalman mentioned in the comments, you will probably want to look into a proper parser for this; there are a variety available that should cover whatever language you are using.
http://code.google.com/p/python-sqlparse/ (for instance) is a good example of a Python SQL Parser
To answer your question I would use a regular expression:
'(?s)(?m)(.*?^:SQLEnd:)'
This will match ANY DDL/DML statement but it will do so crudely (explained below).
The inline flags at the beginning denote DOTALL (the dot matches every character, including newlines) and MULTILINE (^ and $ match the start and end of lines). Most languages have built-in flags you can pass to whatever regex class they implement (e.g. Python's re.DOTALL and re.MULTILINE from the re module).
Please note that this regular expression will only get expressions between instances of ":SQLEnd:" - so
SELECT col1, col2 FROM table WHERE col1 = :id and col2= :num ORDER BY ORDERID ASC
Params:
:id -> 60081
:num-> 1
Got 2 Results in 0 Ticks
:SQLEnd:
will be one group and
SELECT col1, col2 FROM table WHERE col1 = :id and col2= :num ORDER BY ORDERID ASC
Got 2 Results in 0 Ticks
:SQLEnd:
will be another. This is crude (but it's the only way I can think of to easily account for any DDL/DML statement), but hopefully it will work for you. If this is not an option, I highly recommend an SQL parser.
A basic regex to match your examples would be:
SELECT .+?:SQLEnd:
You need to ensure that . will match newline characters. In PHP, this would be:
/SELECT .+?:SQLEnd:/s
However, this regex is not very robust, as it could break when used with certain SQL queries (e.g. queries which contain one or more SELECT subqueries). And you want to match "any DDL or DML statement", which would be very complex with a regex. As Mark says, it would be better to use a parser rather than a regex.
Edit
In C#.net, you would use:
new Regex("SELECT .+?:SQLEnd:", RegexOptions.Singleline);
The documentation for RegexOptions.Singleline says:
Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).
You can also use this inline option to enable single-line mode:
new Regex("(?s)SELECT .+?:SQLEnd:");

SPARQL 1.1: how to use the replace function?

How can one use the replace function in SPARQL 1.1, especially in update commands?
For example, suppose I have a number of triples ?s ?p ?o where ?o is a string; for every triple where ?o contains the string "gotit", I want to insert an additional triple in which "gotit" is replaced by "haveit". How could I do this? I am trying to achieve this in Sesame 2.6.0.
I tried this naive approach:
INSERT { ?s ?p replace(?o,"gotit","haveit","i") . }
WHERE { ?s ?p ?o . FILTER(regex(?o,"gotit","i")) }
but this caused a syntax error.
I also failed to use replace in the result list of a query like so:
SELECT ?s ?p (replace(?o,"gotit","haveit","i") as ?r) WHERE { .... }
The SPARQL specification unfortunately does not contain an example of how to use this function.
Is it possible at all to use functions to create new values, and not just to test existing values, and if so, how?
You can't use an expression directly in your INSERT clause like you have attempted to do. (Also, be aware that filtering on a variable that is not bound anywhere in your graph pattern will give you no results for most filter expressions.)
Instead you need to use a BIND in your WHERE clause to make the new version of the value available in the INSERT clause like so:
INSERT
{
?s ?p ?o2 .
}
WHERE
{
?s ?p ?o .
FILTER(REGEX(?o, "gotit", "i"))
BIND(REPLACE(?o, "gotit", "haveit", "i") AS ?o2)
}
BIND assigns the result of an expression to a new variable so you can use that value elsewhere in your query/update.
The relevant part of the SPARQL specification you are interested in is the section on Assignment
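As an aside, the projection-expression form you tried in your SELECT query is also valid SPARQL 1.1, so if it fails in Sesame 2.6.0 the likely cause is missing REPLACE support (see the answer below) rather than the syntax. A minimal sketch reusing the pattern above:
SELECT ?s ?p (REPLACE(?o, "gotit", "haveit", "i") AS ?o2)
WHERE
{
?s ?p ?o .
FILTER(REGEX(?o, "gotit", "i"))
}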
The usage of replace looks correct afaict according to the spec. I believe REPLACE was just added to the last rev of the spec relatively recently - perhaps Sesame just doesn't support it yet?
If you just do SELECT ?s ?p ?o WHERE { ?s ?p ?o . FILTER(regex(?o,"gotit","i")) }, does your query return rows?