Matching double quotes in SPARQL query in Virtuoso - regex

I need a SPARQL query that matches values containing double quotes in a Virtuoso graph. I am using this query:
SELECT distinct ?o
FROM <http://graph>
WHERE
{
?s ?p ?o.
}
It returns a column with values such as:
http://some.prefix/Symbol
"abcd"
I need to match only the second value ("abcd"). I tried adding this filter to the WHERE clause:
FILTER regex(str(?o), "\"")
But it returns no results. I also tried '"' as the second argument to regex, along with a few other variations. Is this possible at all?

"abcd" is a literal of four characters. It does not include the ""; these are the string delimiters and do not form part of the string.
FILTER isLiteral(?o)
should work.
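To make the point about delimiters concrete, here is a minimal Python sketch. It is not Virtuoso code: the `terms` list and the `display` helper are hypothetical stand-ins for the two values in the result column, assuming the quotes are added only when a literal is rendered.

```python
import re

# Hypothetical in-memory stand-ins for the two values in the result column.
# Each term is (kind, lexical_value); the quotes shown in a result table are
# presentation only and are not stored in the lexical value.
terms = [
    ("iri", "http://some.prefix/Symbol"),
    ("literal", "abcd"),
]

def display(term):
    """Render a term roughly the way a result table would show it."""
    kind, value = term
    return f'"{value}"' if kind == "literal" else value

# Searching the lexical values for a double quote finds nothing,
# which mirrors FILTER regex(str(?o), "\"") returning no rows.
quote_hits = [t for t in terms if re.search('"', t[1])]

# Selecting by term kind is the analogue of FILTER isLiteral(?o).
literals = [t for t in terms if t[0] == "literal"]

print(quote_hits)
print([display(t) for t in literals])
```

The quote search comes back empty while the kind-based selection keeps the literal, which is why isLiteral is the filter to reach for here.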

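Since your filter is not working, "abcd" does not have a double quote in it. It's a string literal. I'm not sure what datatype it has, so you can use
SELECT DISTINCT (datatype(?o) AS ?type)
FROM <http://graph>
WHERE
{
?s ?p ?o .
FILTER isLiteral(?o)
}
to get its datatype (a literal cannot be the subject of a triple, so a pattern such as "abcd" a ?type will not match anything). You can then use that datatype as a filter in your query:
SELECT distinct ?o
FROM <http://graph>
WHERE
{
?s ?p ?o .
FILTER(datatype(?o) = <whatever type you received in the previous query>)
}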

Related

How to define regexp for text in Postgres

Please help me define a Postgres regexp for this case:
I have a string field:
union all select 'AbC-345776-2345' /*comment*/ union all select 'Fgr-sdf344-111a' /*BN34*/ some text union all select 'sss-sdf34-123' /*some text*/ some text
Here is the same text in a select statement for convenience:
select 'union all select ''AbC-345776-2345'' /*comment*/ union all select ''Fgr-sdf344-111a'' /*BN34*/ some text union all select ''sss-sdf34-123'' /*some text*/ some text' as str
From this messy text I need to extract only the values in '...' and return them as separate rows, like this:
AbC-345776-2345
Fgr-sdf344-111a
sss-sdf34-123
Pattern: 'first 2-3 letters - several letters and numbers - several letters and numbers'
I wrote this select, but its output still includes the comments and the "some text" fragments:
select regexp_split_to_table(trim(replace(replace(replace(replace(t1.str,'union all select',''),'from DUAL',''),chr(10),''),'''','') ), E'\\s+')
from (select 'union all select ''AbC-345776-2345'' /*comment*/ union all select ''Fgr-sdf344-111a'' /*BN34*/ some text union all select ''sss-sdf34-123'' /*some text*/ some text' as str) t1;
The following should do it:
select (regexp_matches(str, $$'([a-zA-Z]{2,3}-[a-zA-Z0-9]+-[a-zA-Z0-9]+)'$$, 'g'))[1]
from the_table;
Given your sample data it returns:
regexp_matches
---------------
AbC-345776-2345
Fgr-sdf344-111a
sss-sdf34-123
The regex checks for the pattern you specified inside single quotes. By using a group (...) I excluded the single quotes from the result.
regexp_matches() returns one row for each match, containing an array of matches. But as the regex only contains a single group, the first element of the array is what we are interested in.
I used dollar quoting to avoid escaping the single quotes in the regex.
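For comparison, the same capture-group idea can be sketched outside the database. This is only an illustration in Python, not part of the Postgres solution; `text` is the sample string from the question and `codes` is a name chosen here.

```python
import re

# The sample string from the question.
text = (
    "union all select 'AbC-345776-2345' /*comment*/ "
    "union all select 'Fgr-sdf344-111a' /*BN34*/ some text "
    "union all select 'sss-sdf34-123' /*some text*/ some text"
)

# The single quotes sit outside the group, so they must be present for a
# match but are not part of the captured value.
pattern = re.compile(r"'([a-zA-Z]{2,3}-[a-zA-Z0-9]+-[a-zA-Z0-9]+)'")

# With exactly one group, findall returns the group contents, one per match.
codes = pattern.findall(text)

for code in codes:
    print(code)
```

As with regexp_matches and the 'g' flag, every match in the string is returned, and the group keeps the surrounding quotes out of the result.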

CASE WHEN - LIKE - REGEXP in Hadoop Hive

I want to write a query on a Hive table using CASE WHEN, LIKE and a regular expression. I have used regexp and rlike, but I do not get the desired results. My attempts so far are the following:
select distinct ending from
(select date, ending, name, count(distinct id)
from (select CONCAT_WS("/",year,month,day,hour) as date, id, name,
case when type = 'TRAN' then 'tran'
when events regexp '%[:]no_reply[:]%[^o][^n][:]incomplete[:]%' and type rlike '%HUP' then 'con'
when events not regexp '%[:]no_reply[:]%[^o][^n][:]incomplete[:]%' and type rlike '%HUP' then 'aban'
else 'other'
end as ending
from data_struct1) tmp
group by date, ending, name) tmp2;
and also
select distinct ending from
(select date, ending, name, count(distinct id)
from (select CONCAT_WS("/",year,month,day,hour) as date, id, name,
case when type = 'TRAN' then 'tran'
when events rlike '%[:]no_reply[:]%[^o][^n][:]incomplete[:]%' and type rlike '%HUP' then 'con'
when events not rlike '%[:]no_reply[:]%[^o][^n][:]incomplete[:]%' and type rlike '%HUP' then 'aban'
else 'other'
end as ending
from data_struct1) tmp
group by date, ending, name) tmp2;
Both queries return incorrect results (the syntax is accepted, but the results are wrong).
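The % wildcard belongs to LIKE. regexp and rlike take Java regular expressions, where % is an ordinary character and "any sequence of characters" is written .* instead. There are a lot of docs on regex quantifiers, for example this one: https://learn.microsoft.com/en-us/dotnet/standard/base-types/quantifiers-in-regular-expressions
A pattern like the following matches your sample string: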
select 'opencase_2,initial_state:inquiry,inquiry:no_reply:initial_state:incomplete::,inquiry:reask:secondary_state:complete::' regexp 'no_reply:[^:]+:incomplete';
OK
true
Also, rlike '%HUP' is wrong. It should be '.*HUP$' (HUP at the end of the string), or simply 'HUP' if it does not matter whether HUP appears at the beginning, in the middle, or at the end of the string.
rlike and regexp behave the same in your query because the two are synonyms. It is better to pick one, regexp or rlike, and use it consistently.
Test: https://regex101.com/r/ksG67v/1
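The difference between the two pattern styles can be checked quickly outside Hive. Below is a rough Python sketch; Python's re module is not Hive's Java regex engine, but for these simple patterns the behaviour should be the same. The `type_value` string is a made-up example, since the question does not show real values of the type column.

```python
import re

# The sample events string used in the test query.
events = (
    "opencase_2,initial_state:inquiry,inquiry:no_reply:initial_state:incomplete::,"
    "inquiry:reask:secondary_state:complete::"
)

# LIKE-style pattern from the question: in a regex, % is a literal percent sign.
like_style = "%[:]no_reply[:]%[^o][^n][:]incomplete[:]%"
# Regex-style pattern: [^:]+ stands for one non-empty field between the colons.
regex_style = r"no_reply:[^:]+:incomplete"

# re.search looks for the pattern anywhere in the string, like rlike does.
like_hit = re.search(like_style, events) is not None
regex_hit = re.search(regex_style, events) is not None

# Hypothetical value for the type column (an assumption, not from the question).
type_value = "CALL_HUP"
percent_hup = re.search("%HUP", type_value) is not None
anchored_hup = re.search(r"HUP$", type_value) is not None

print(like_hit, regex_hit, percent_hup, anchored_hup)
```

The LIKE-style patterns fail only because the data contains no literal % characters, which would explain the queries running without errors yet putting rows into the wrong CASE branch.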

Selecting for a Jsonb array contains regex match

Given a data structure as follows:
{"single":"someText", "many":["text1", "text2"]}
I can query a regex on single with
WHERE JsonBColumn ->> 'single' ~ '^some.*'
And I can query a contains match on the Array with
WHERE JsonBColumn -> 'many' ? 'text2'
What I would like to do is a contains match with a regex on the JSON array:
WHERE JsonBColumn -> 'many' {Something} '.*2$'
I found that it is also possible to convert the entire JSONB array to a plain text string and simply run the regular expression against that. A side effect, though, is that a search for something like
xt1", "text
would also match.
This approach isn't as clean, since it doesn't test each element individually, but it gets the job done with a visually simpler statement:
WHERE JsonBColumn ->>'many' ~ 'text2'
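The side effect described above is easy to reproduce in a small Python sketch. It assumes the array is serialized with a comma and a space between elements, which is what json.dumps produces by default; `doc` and `cross_pattern` are names chosen for this illustration.

```python
import json
import re

# The sample document from the question.
doc = {"single": "someText", "many": ["text1", "text2"]}

# Serialize the whole array to one string, roughly what matching on the
# text form of the array does.
as_text = json.dumps(doc["many"])

# A pattern that straddles the boundary between two elements.
cross_pattern = r'xt1", "te'

# Matching the serialized text reports a hit that no single element contains.
whole_text_hit = re.search(cross_pattern, as_text) is not None

# Matching element by element does not.
per_element_hit = any(re.search(cross_pattern, elem) for elem in doc["many"])

print(as_text)
print(whole_text_hit, per_element_hit)
```

If patterns can contain quotes, commas or brackets, the per-element check is the safer choice.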
Use jsonb_array_elements_text() in a lateral join:
with the_data(id, jsonbcolumn) as (
values
(1, '{"single":"someText", "many": ["text1", "text2"]}'::jsonb)
)
select distinct on (id) d.*
from
the_data d,
jsonb_array_elements_text(jsonbcolumn->'many') many(elem)
where elem ~ '^text.*';
id | jsonbcolumn
----+----------------------------------------------------
1 | {"many": ["text1", "text2"], "single": "someText"}
(1 row)
If the feature is used frequently, you may want to write your own function:
create or replace function jsonb_array_regex_like(json_array jsonb, pattern text)
returns boolean language sql as $$
select bool_or(elem ~ pattern)
from jsonb_array_elements_text(json_array) arr(elem)
$$;
The function definitely simplifies the code:
with the_data(id, jsonbcolumn) as (
values
(1, '{"single":"someText", "many": ["text1", "text2"]}'::jsonb)
)
select *
from the_data
where jsonb_array_regex_like(jsonbcolumn->'many', '^text.*');
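The logic of the helper function can also be sketched in Python, which may help when the same check is needed in application code. `array_regex_like` is a hypothetical name mirroring the SQL function. Note two differences: Python's re syntax is not identical to the POSIX regex behind the ~ operator, and any() returns False for an empty list where bool_or would return NULL.

```python
import re

def array_regex_like(elements, pattern):
    """Return True if at least one element matches the pattern.

    Mirrors bool_or(elem ~ pattern): the match may occur anywhere in the
    element unless the pattern is anchored with ^ or $.
    """
    return any(re.search(pattern, elem) is not None for elem in elements)

many = ["text1", "text2"]

starts_with_text = array_regex_like(many, r"^text.*")  # pattern from the answer
ends_with_two = array_regex_like(many, r".*2$")        # pattern from the question
starts_with_some = array_regex_like(many, r"^some.*")  # matches no element

print(starts_with_text, ends_with_two, starts_with_some)
```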

Extract triples containing particular substring using SPARQL

I want to extract triples whose subject contains a word, say "alice". The query I used was:
SELECT ?s ?p ?o WHERE { ?s ?p ?o .FILTER regex(?s, \"alice\") .}
This doesn't give me any results, in spite of there being a triple which satisfies this constraint.
On the other hand, when I use the same query to extract triples whose object contains the word brillant, it returns only one of the 2 possible matches.
The query used is:
SELECT ?s ?p ?o WHERE { ?s ?p ?o .FILTER regex(?o, \"brillant\") .}
Please let me know where I am going wrong and what the reason for this behaviour is.
I'll assume that the escapes around the quotation marks are just a remnant from copying and pasting. The first argument to regex must be a literal, but literals cannot be the subjects of triples in RDF, so it's not true that you have data that should match this pattern. What you might have, though, is subjects whose URI contains the string "alice", and you can get the string representation of the URI using the str function. E.g.,
SELECT ?s ?p ?o WHERE { ?s ?p ?o .FILTER regex(str(?s), "alice") .}
To illustrate, let's use the two values <http://example.org> and "string containing example" and filter as you did in your original query:
select ?x where {
values ?x { <http://example.org> "string containing example" }
filter( regex(?x, "exam" ))
}
-------------------------------
| x |
===============================
| "string containing example" |
-------------------------------
We only got "string containing example" because the other value wasn't a string, and so wasn't a suitable argument to regex. However, if we add the call to str, then it's the string representation of the URI that regex will consider:
select ?x where {
values ?x { <http://example.org> "string containing example" }
filter( regex(str(?x), "exam" ))
}
-------------------------------
| x |
===============================
| <http://example.org> |
| "string containing example" |
-------------------------------
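The two result tables can be imitated with a toy model in Python. This is only a sketch of the behaviour, not how any SPARQL engine is implemented: the `IRI` and `Literal` classes and the `sparql_regex`, `sparql_str` and `keep` helpers are invented for illustration, and the model assumes that a type error inside a FILTER simply drops the row.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str

def sparql_regex(term, pattern):
    """Toy regex(): only string literals are acceptable arguments."""
    if not isinstance(term, Literal):
        raise TypeError("regex expects a string literal")
    return re.search(pattern, term.value) is not None

def sparql_str(term):
    """Toy str(): turn an IRI or a literal into a plain string literal."""
    return Literal(term.value)

def keep(test, term):
    """Toy FILTER: an error in the test drops the row instead of failing."""
    try:
        return test(term)
    except TypeError:
        return False

values = [IRI("http://example.org"), Literal("string containing example")]

without_str = [v for v in values if keep(lambda t: sparql_regex(t, "exam"), v)]
with_str = [v for v in values if keep(lambda t: sparql_regex(sparql_str(t), "exam"), v)]

print(without_str)
print(with_str)
```

Without the str call only the literal survives the filter; with it, both values do, matching the two tables above.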

SPARQL 1.1: how to use the replace function?

How can one use the replace function in SPARQL 1.1, especially in update commands?
For example, if I have a number of triples ?s ?p ?o where ?o is a string, and for every triple where ?o contains the string "gotit" I want to insert an additional triple where "gotit" is replaced by "haveit", how could I do this? I am trying to achieve this in Sesame 2.6.0.
I tried this naive approach:
INSERT { ?s ?p replace(?o,"gotit","haveit","i") . }
WHERE { ?s ?p ?o . FILTER(regex(?o,"gotit","i")) }
but this caused a syntax error.
I also failed to use replace in the result list of a query like so:
SELECT ?s ?p (replace(?o,"gotit","haveit","i") as ?r) WHERE { .... }
The SPARQL document unfortunately does not contain an example of how to use this function.
Is it possible at all to use functions to create new values and not just test existing values and if yes, how?
You can't use an expression directly in your INSERT clause like you have attempted to do. Also you are binding ?name with the first triple pattern but then filtering on ?o in the FILTER which is not going to give you any results (filtering on an unbound variable will give you no results for most filter expressions).
Instead you need to use a BIND in your WHERE clause to make the new version of the value available in the INSERT clause like so:
INSERT
{
?s ?p ?o2 .
}
WHERE
{
?s ?p ?o .
FILTER(REGEX(?o, "gotit", "i"))
BIND(REPLACE(?o, "gotit", "haveit", "i") AS ?o2)
}
BIND assigns the result of an expression to a new variable so you can use that value elsewhere in your query/update.
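The filter, bind, insert sequence can be pictured with a short Python sketch over an in-memory list of triples. The `triples` data and the ex: names are made up for illustration, and this only models the effect of the update, not how Sesame executes it.

```python
import re

# Hypothetical data: (subject, predicate, object) with string objects.
triples = [
    ("ex:a", "ex:note", "I GotIt today"),
    ("ex:b", "ex:note", "nothing here"),
]

# The "i" flag in REGEX/REPLACE corresponds to case-insensitive matching.
pattern = re.compile("gotit", re.IGNORECASE)

# WHERE part: keep rows whose object matches (FILTER), then compute the
# replaced value for each of them (BIND ... AS ?o2).
new_triples = [
    (s, p, pattern.sub("haveit", o))
    for s, p, o in triples
    if pattern.search(o)
]

# INSERT part: the new triples are added; the originals stay in place.
store = triples + new_triples

for t in store:
    print(t)
```

The new value is computed in the WHERE part and only then used in the inserted triple, which is the role BIND plays in the update above.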
The relevant part of the SPARQL specification is the section on Assignment.
The usage of replace looks correct afaict according to the spec. I believe REPLACE was just added to the last rev of the spec relatively recently - perhaps Sesame just doesn't support it yet?
If you just do SELECT ?s ?p ?o WHERE { ?s ?p ?name . FILTER(regex(?name,"gotit","i")) } does your query return rows?