I'm writing Python code to match a list of actors between DBpedia and Wikidata. First, I retrieve the list of actors from DBpedia with some additional information, such as birth date and birth place, using SPARQL. Then, using the same list of actors retrieved from DBpedia, I try to retrieve further information, such as awards received. My Python code is throwing an error.
I have a hunch that the DBpedia portion of the query is timing out when it is called from Wikidata. If I skip the Wikidata triple and add a LIMIT, the query runs to completion but takes several seconds. Uncomment the triple about the awards, and it times out.
Since there are problems with the SPARQL, I'm going to ignore the Python processing for now.
Independent of that, I found two glitches:
# missing prefixes
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
SERVICE <http://dbpedia.org/sparql> {
?c rdf:type <http://umbel.org/umbel/rc/Actor> ;
rdfs:label ?Actor
FILTER ( lang(?Actor) = "en" )
?c dbo:deathDate ?Death_date ;
dbo:birthPlace ?b
# date filtering not working... add cast
FILTER ( xsd:date(?Death_date) >= "1990-01-01"^^xsd:date )
?b rdfs:label ?birth_Place
FILTER ( lang(?birth_Place) = "en" )
?Starring rdf:type dbo:Film ;
dbo:starring ?c .
?c dbo:deathCause ?d .
?d dbp:name ?Cause_Of_Death .
?c owl:sameAs ?wikidata_actor
FILTER strstarts(str(?wikidata_actor), "http://www.wikidata.org")
}
# ?wikidata_actor wdt:P166 ?award_received.
}
LIMIT 9
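If the federated call really is what times out, one possible workaround is to avoid federation: run the DBpedia part on its own from Python, collect the ?wikidata_actor IRIs, and then ask Wikidata about those IRIs in small batches through a VALUES clause. The sketch below only builds the text of the second query. The helper names build_awards_query and chunked are hypothetical, the batch size of 50 is a guess, and actually sending the queries to the endpoints is left out.

```python
def build_awards_query(wikidata_iris):
    """Build a Wikidata query asking for awards (P166) of the given entity IRIs."""
    values = " ".join(f"<{iri}>" for iri in wikidata_iris)
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?wikidata_actor ?award_received WHERE {\n"
        f"  VALUES ?wikidata_actor {{ {values} }}\n"
        "  ?wikidata_actor wdt:P166 ?award_received .\n"
        "}"
    )


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Assumption: these IRIs came back from the DBpedia query as ?wikidata_actor.
    iris = [
        "http://www.wikidata.org/entity/Q1",
        "http://www.wikidata.org/entity/Q2",
    ]
    for batch in chunked(iris, 50):
        print(build_awards_query(batch))
```

Each batch is a small, fast query on the Wikidata side, and the join with the DBpedia rows then happens in Python on the ?wikidata_actor value.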
Every SPARQL endpoint has its own unique personality. So in my opinion, federated queries (which use the service keyword and hit two or more endpoints) can be especially tricky. In case you're new to federation, here's an unrelated query that works.
There is some entity that tweets under the name 'darwilliamstour'. What is the name of that entity?
select *
where
{
?twitterer wdt:P2002 'darwilliamstour' .
service <http://dbpedia.org/sparql>
{
?twitterer rdfs:label ?name
}
}
I have this insert query:
INSERT
{
<http://www.google.com/go/guest> <http://www.google.com/go/hasRelatives> ?state}
WHERE {?state a <http://www.google.com/go#State>.
filter ($state=<http://www.google.com/go/State-USA>)
}
If the state is equal to <http://www.google.com/go/State-USA>, I need to insert all the states of type <http://www.google.com/go#State>. That is exactly what the SPARQL insert query is doing at the moment.
If not, I need to insert only the specified state, for example: <http://www.google.com/go/State-Alabama>
Like with the below query:
INSERT { <http://www.google.com/go/guest> <http://www.google.com/go/hasRelatives> $state }
WHERE {?state a <http://www.google.com/go#State>.
filter ($state!=<http://www.google.com/go/State-USA>)
}
How could I write an if-else statement inside the insert that checks the value of ?state and then runs the appropriate insert query?
How could I combine the two queries into a single one with the proper conditions?
The triples:
@prefix : <http://www.google.com/go#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://www.google.com/go/State-USA> a :State ;
rdfs:label "USA" .
<http://www.google.com/go/State-Michigan> a :State ;
rdfs:label "Michigan" .
<http://www.google.com/go/State-NewYork> a :State ;
rdfs:label "New York" .
<http://www.google.com/go/State-Alabama> a :State ;
rdfs:label "Alabama" .
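One way the two inserts might be merged, sketched here as an assumption rather than a tested answer: bind the requested state to a separate variable with VALUES and let a single FILTER express the if-else. If the requested state is State-USA, every ?state passes. Otherwise only the requested state passes. The Python wrapper and the name build_insert are hypothetical and only serve to parameterise the query text.

```python
USA = "http://www.google.com/go/State-USA"


def build_insert(chosen_state_iri):
    """Return one INSERT that covers both the 'USA' case and the single-state case."""
    return f"""INSERT {{
  <http://www.google.com/go/guest> <http://www.google.com/go/hasRelatives> ?state
}}
WHERE {{
  VALUES ?chosen {{ <{chosen_state_iri}> }}
  ?state a <http://www.google.com/go#State> .
  FILTER ( ?chosen = <{USA}> || ?state = ?chosen )
}}"""


if __name__ == "__main__":
    print(build_insert("http://www.google.com/go/State-Alabama"))
```

Note that with State-USA as the requested value, State-USA itself is also inserted, because it is typed as a State in the sample triples. Adding `&& ?state != ?chosen` to the first branch would exclude it.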
I want to pass the output of the first query (SophosBearerToken) to the second one, but I get the following error:
Formula.Firewall: Query 'Query2' (step 'PartnerID') references other
queries or steps, so it may not directly access a data source. Please
rebuild this data combination.
First query: SophosBearerToken (works)
let
SophosBearerToken = "Bearer " & (Json.Document(Web.Contents("https://id.sophos.com/api/v2/oauth2/token",
[
Headers = [#"Content-Type"="application/x-www-form-urlencoded"],
Content = Text.ToBinary("grant_type=client_credentials&client_id=" & #"SophosClientID" & "&client_secret=" & #"SophosClientSecret" & "&scope=token")
]
)) [access_token])
in
SophosBearerToken
Second query: Query2 (fails)
let
PartnerIDQuery = Json.Document(Web.Contents("https://api.central.sophos.com/whoami/v1", [Headers = [#"Authorization"=#"SophosBearerToken"]])),
PartnerID = PartnerIDQuery[id]
in
PartnerID
However, when I paste the output of the first query into the second one manually, it works.
What mistake am I making here?
I ran a Cypher query to delete all duplicate relationships with the same name from my graph. A relationship has the properties name, confidence and time. I kept the relationship with the highest confidence value and collected all time values, using the following query:
MATCH (e0:Entity)-[r:REL]-(e1:Entity)
WITH e0, r.name AS relation, COLLECT(r) AS rels, COLLECT(r.confidence)AS relConf, MAX(r.confidence) AS maxConfidence, COLLECT(r.time) AS relTime, e1 WHERE SIZE(rels) > 1
SET (rels[0]).confidence = maxConfidence, (rels[0]).time = relTime
FOREACH (rel in tail(rels) | DELETE rel)
RETURN rels, relation, relConf, maxConfidence, relTime
Old Data:
name,confidence,time
likes, 0.87, 20111201010900
likes, 0.97, 20111201010600
New data:
name,confidence,time
likes, 0.97, [20111201010900,20111201010600]
Could anyone please suggest a MATCH query to find relationships whose new "time" property contains a value from the year 2011? (I converted time using toInt while loading from a CSV.)
Your new data structure does not make such searches easy, but it is possible on medium-sized graphs:
MATCH (n:Entity)-[r:REL]->(x)
WHERE ANY(
t IN [ts IN r.time | toString(ts)]
WHERE t STARTS WITH "2011"
)
RETURN r
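The predicate inside ANY(...) is simply "some timestamp, written as a string, starts with 2011". As a rough illustration of that check outside the database, here is the same logic in Python. The function name has_year is hypothetical, and the timestamps are assumed to be integers of the form YYYYMMDDhhmmss, as in the sample data.

```python
def has_year(times, year):
    """Return True if any integer timestamp (YYYYMMDDhhmmss) falls in `year`."""
    prefix = str(year)
    return any(str(t).startswith(prefix) for t in times)


if __name__ == "__main__":
    rel_time = [20111201010900, 20111201010600]
    print(has_year(rel_time, 2011))
```

Because every element has to be converted and compared, the check cannot use an index, which is why it only scales to medium-sized graphs.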
I have a lot of undocumented and uncommented SQL queries. I would like to extract some information from the SQL statements. In particular, I'm interested in DB names, table names and, if possible, column names. The queries usually have the following syntax.
SELECT *
FROM mydb.table1 m
LEFT JOIN mydb.sometable o ON m.id = o.id
LEFT JOIN mydb.sometable t ON p.id=t.id
LEFT JOIN otherdb.sometable s ON s.column='test'
Usually, the statements involve several DBs and tables. I would like to extract only the DBs and tables, without any other information. I wondered whether it is possible to first extract whatever follows FROM, JOIN and LEFT JOIN. There it is usually db.table, and letters such as o, t and s are aliases for tables that are already referenced. I suppose those are difficult to capture. What I tried, without any success, is something like:
gsub(".*FROM \\s*|WHERE|ORDER|GROUP.*", "", vec)
This assumes that each statement ends with WHERE/where, ORDER/order or GROUP..., but it doesn't work as expected.
You haven't indicated which database system you are using, but virtually all such systems have introspection facilities that let you get this information far more easily and reliably than by parsing SQL statements. The following code assumes SQLite. It can likely be adapted to your situation by getting a list of your databases, looping over them, using dbConnect to connect to each one in turn, and running code such as this:
library(gsubfn)
library(RSQLite)
con <- dbConnect(SQLite()) # use in memory database for testing
# create two tables for purposes of this test
dbWriteTable(con, "BOD", BOD, row.names = FALSE)
dbWriteTable(con, "iris", iris, row.names = FALSE)
# get all table names and columns
tabinfo <- Map(function(tab) names(fn$dbGetQuery(con, "select * from $tab limit 0")),
dbListTables(con))
dbDisconnect(con)
giving an R list whose names are the table names and whose entries are the column names:
> tabinfo
$BOD
[1] "Time" "demand"
$iris
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
or perhaps long form output is preferred:
setNames(stack(tabinfo), c("column", "table"))
giving:
column table
1 Time BOD
2 demand BOD
3 Sepal.Length iris
4 Sepal.Width iris
5 Petal.Length iris
6 Petal.Width iris
7 Species iris
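For comparison, the same introspection idea can be sketched in Python with the standard sqlite3 module. This is only an illustration under the same SQLite assumption: the two tables and their columns are made up for the test, and table_columns is a hypothetical helper name.

```python
import sqlite3


def table_columns(con):
    """Map each table name in the database to the list of its column names."""
    tables = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    info = {}
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        rows = con.execute(f'PRAGMA table_info("{table}")').fetchall()
        info[table] = [row[1] for row in rows]
    return info


# In-memory database with two small tables, for testing only.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE BOD ("Time" REAL, demand REAL)')
con.execute('CREATE TABLE iris ("Sepal.Length" REAL, "Species" TEXT)')
tabinfo = table_columns(con)
con.close()
```

Other database systems expose the same information differently (for example through information_schema views), so the two SQL statements are the part that would need adapting.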
You could use the stringi package for this.
library(stringi)
# Your string vector
myString <- "SELECT *
FROM mydb.table1 m
LEFT JOIN mydb.sometable o ON m.id = o.id
LEFT JOIN mydb.sometable t ON p.id=t.id
LEFT JOIN otherdb.sometable s ON s.column='test'"
# Three stringi functions used
# stri_extract_all_regex extracts the strings that have FROM or JOIN followed by some text up to the next whitespace
# stri_replace_all_regex replaces every FROM or JOIN followed by a space with an empty string
# stri_unique keeps only the unique strings
t <- stri_unique(stri_replace_all_regex(stri_extract_all_regex(myString, "((FROM|JOIN) [^\\s]+)", simplify = TRUE),
"(FROM|JOIN) ", ""))
> t
[1] "mydb.table1" "mydb.sometable" "otherdb.sometable"
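The same regex idea carries over to other languages. Here is a rough Python equivalent, assuming the statements only reference tables directly after FROM or JOIN (a subquery such as FROM (SELECT ...) would be picked up as "(SELECT"). The function name referenced_tables is hypothetical.

```python
import re


def referenced_tables(sql):
    """Return the unique names that follow FROM or JOIN, in order of appearance."""
    names = re.findall(r"\b(?:FROM|JOIN)\s+([^\s]+)", sql, flags=re.IGNORECASE)
    # dict.fromkeys drops duplicates while keeping the first-seen order.
    return list(dict.fromkeys(names))


sql = """SELECT *
FROM mydb.table1 m
LEFT JOIN mydb.sometable o ON m.id = o.id
LEFT JOIN mydb.sometable t ON p.id=t.id
LEFT JOIN otherdb.sometable s ON s.column='test'"""

tables = referenced_tables(sql)
# Split "db.table" into (db, table); a name without a dot yields a 1-tuple.
pairs = [tuple(name.split(".", 1)) for name in tables]
```

Unlike the stringi call above, this version ignores case, so lower-case from and join are matched as well.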
I am an rdflib beginner. I have an ontology with classes and subclasses, and I need to look for a specific word in a subclass name and, if it is found, return the name of its class.
I have the following code:
import rdflib
from rdflib import plugin
from rdflib.graph import Graph
g = Graph()
g.parse("test.owl")
from rdflib.namespace import Namespace
plugin.register(
'sparql', rdflib.query.Processor,
'rdfextras.sparql.processor', 'Processor')
plugin.register(
'sparql', rdflib.query.Result,
'rdfextras.sparql.query', 'SPARQLQueryResult')
qres = g.query("""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subject ?object
WHERE { ?subject rdfs:subClassOf ?object }
""")
# n is a subclass name; its class name is good-behaviour, which I want as the result
n="pity"
for (subj,obj) in qres:
    if n in subj:
        print(obj)
    else:
        print("not found")
When I print the result of qres, it returns complete URLs, and I need only the names of the subclass and the class.
Can anyone help with this?
You can get your answer with RDFLib and plain Python string manipulation, without SPARQL. If you prefer to use SPARQL, the Joshua Taylor answer to this question would be the way to go. You also don't need the SPARQL processor plugin with recent versions (4+) of RDFLib; see the "Querying with SPARQL" documentation.
To get the answer you are looking for, you can use the RDFLib Graph method subject_objects to get a generator of subjects and objects for the predicate you are interested in, rdfs:subClassOf. Each subject and object will be an RDFLib URIRef, which is also a Python unicode object and can be manipulated using standard Python string methods. To get the suffix of the IRI, call the rsplit method of the object with '#' as the separator and take the last item in the returned list.
Here is your code reworked to do as described. Without the data, I can't fully test it but this did work for me when using a different ontology.
from rdflib import Graph
from rdflib.namespace import RDFS
g = Graph()
g.parse("test.owl")
# n is a subclass name and its class name is good-behaviour
# which i want to be the result
n = "pity"
for subj, obj in g.subject_objects(predicate=RDFS.subClassOf):
    if n in subj:
        print(obj.rsplit('#')[-1])
    else:
        print('not found')
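Splitting on '#' assumes a hash namespace. If your ontology might also contain slash-style IRIs, a small helper along these lines could cover both cases. It works on plain strings (a URIRef is a string subclass), and the name local_name is hypothetical.

```python
def local_name(iri):
    """Return the part of an IRI after the last '#', or else after the last '/'."""
    text = str(iri)
    for separator in ("#", "/"):
        if separator in text:
            return text.rsplit(separator, 1)[-1]
    return text


if __name__ == "__main__":
    print(local_name("http://www.semanticweb.org/raya/ontologies/test6#Good-behaviour"))
```

Inside the loop above, local_name(obj) would replace obj.rsplit('#')[-1].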
You haven't shown your data, so I can't use your exact query or data, but based on your comments, it sounds like you're getting IRIs (e.g., http://www.semanticweb.org/raya/ontologies/test6#Good-behaviour) as results, and you want just the string Good-behaviour. You can use strafter to do that. For instance, if you had data like this:
@prefix : <http://stackoverflow.com/questions/20830056/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:retrieving-the-class-name-of-a-specific-subclass-in-owl
rdfs:label "retrieving the class name of a specific subclass in owl"@en .
Then a query like this will return results that have full IRIs:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?question where {
?question rdfs:label ?label .
}
---------------------------------------------------------------------------------------------------------
| question |
=========================================================================================================
| <http://stackoverflow.com/questions/20830056/retrieving-the-class-name-of-a-specific-subclass-in-owl> |
---------------------------------------------------------------------------------------------------------
You can use strafter to get the part of a string after some other string. E.g.,
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?q where {
?question rdfs:label ?label .
bind(strafter(str(?question),"http://stackoverflow.com/questions/20830056/") as ?q)
}
-------------------------------------------------------------
| q |
=============================================================
| "retrieving-the-class-name-of-a-specific-subclass-in-owl" |
-------------------------------------------------------------
If you define the prefix in the query, e.g., as so:, then you can also use str(so:) instead of the string form. If you prefer, you can also do the string manipulation in the variable list rather than in the graph pattern. That would look like this:
prefix so: <http://stackoverflow.com/questions/20830056/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select (strafter(str(?question),str(so:)) as ?q) where {
?question rdfs:label ?label .
}
-------------------------------------------------------------
| q |
=============================================================
| "retrieving-the-class-name-of-a-specific-subclass-in-owl" |
-------------------------------------------------------------
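If you end up post-processing the results in Python instead, the behaviour of strafter is easy to mirror. The sketch below follows the SPARQL 1.1 definition as I understand it: the text after the first occurrence of the marker, an empty string when the marker is absent, and the whole text when the marker is empty. It is an illustration only, not part of RDFLib.

```python
def strafter(text, marker):
    """Mimic SPARQL STRAFTER on plain Python strings."""
    if marker == "":
        return text
    _, found, rest = text.partition(marker)
    # partition returns an empty separator when the marker does not occur.
    return rest if found else ""


if __name__ == "__main__":
    question = ("http://stackoverflow.com/questions/20830056/"
                "retrieving-the-class-name-of-a-specific-subclass-in-owl")
    print(strafter(question, "http://stackoverflow.com/questions/20830056/"))
```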