retrieving the class name of a specific subclass in owl - python-2.7

I am an rdflib beginner. I have an ontology with classes and subclasses, and I need to look for a specific word in a subclass and, if it is found, return its class name.
I have the following code:
import rdflib
from rdflib import plugin
from rdflib.graph import Graph
g = Graph()
g.parse("test.owl")
from rdflib.namespace import Namespace
plugin.register(
'sparql', rdflib.query.Processor,
'rdfextras.sparql.processor', 'Processor')
plugin.register(
'sparql', rdflib.query.Result,
'rdfextras.sparql.query', 'SPARQLQueryResult')
qres = g.query("""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subject ?object
WHERE { ?subject rdfs:subClassOf ?object }
""")
# n is a subclass name; its class name is good-behaviour, which I want to be the result
n="pity"
for (subj,pred,obj) in qres:
    if n in subj:
        print obj
    else:
        print "not found"
When I print the result of qres it returns complete URLs, but I need only the names of the subclass and the class.
Can anyone help with this?

You can get your answer with RDFLib alone, using Python string manipulation instead of SPARQL. If you prefer to use SPARQL, Joshua Taylor's answer to this question would be the way to go. You also don't need the SPARQL processor plugin with recent versions (4+) of RDFLib - see the "Querying with SPARQL" documentation.
To get the answer you are looking for, you can use the RDFLib Graph method subject_objects to get a generator of subjects and objects for the predicate you are interested in, rdfs:subClassOf. Each subject and object will be an RDFLib URIRef, which is also a Python unicode object that can be manipulated using standard Python methods. To get the suffix of the IRI, call the split (or rsplit) method of the object and take the last item in the returned list.
Here is your code reworked to do as described. Without the data, I can't fully test it but this did work for me when using a different ontology.
from rdflib import Graph
from rdflib.namespace import RDFS
g = Graph()
g.parse("test.owl")
# n is a subclass name and its class name is good-behaviour
# which i want to be the result
n = "pity"
for subj, obj in g.subject_objects(predicate=RDFS.subClassOf):
    if n in subj:
        print obj.rsplit('#')[-1]
    else:
        print 'not found'
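Note that rsplit('#') only helps when the ontology uses hash IRIs. As a minimal sketch, the suffix can instead be taken after the last '#' or '/', whichever comes later. The name local_name is a hypothetical helper for illustration, not part of RDFLib:

```python
def local_name(iri):
    """Return the part of an IRI after its last '#' or '/'.

    Hypothetical helper, not an RDFLib function. It accepts any string,
    including an RDFLib URIRef, because URIRef is a string subclass.
    """
    cut = max(iri.rfind('#'), iri.rfind('/'))
    # rfind returns -1 when the character is absent, so an input with
    # neither separator is returned unchanged.
    return iri[cut + 1:]
```

In the loop above, obj.rsplit('#')[-1] could then be replaced by local_name(obj), assuming the class names never contain '#' or '/' themselves.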

You haven't shown your data, so I can't use your exact query or data, but based on your comments, it sounds like you're getting IRIs (e.g., http://www.semanticweb.org/raya/ontologies/test6#Good-behaviour) as results, and you want just the string Good-behaviour. You can use strafter to do that. For instance, if you had data like this:
@prefix : <http://stackoverflow.com/questions/20830056/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:retrieving-the-class-name-of-a-specific-subclass-in-owl
    rdfs:label "retrieving the class name of a specific subclass in owl"@en .
Then a query like this will return results that have full IRIs:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?question where {
?question rdfs:label ?label .
}
---------------------------------------------------------------------------------------------------------
| question |
=========================================================================================================
| <http://stackoverflow.com/questions/20830056/retrieving-the-class-name-of-a-specific-subclass-in-owl> |
---------------------------------------------------------------------------------------------------------
You can use strafter to get the part of a string after some other string. E.g.,
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?q where {
?question rdfs:label ?label .
bind(strafter(str(?question),"http://stackoverflow.com/questions/20830056/") as ?q)
}
-------------------------------------------------------------
| q |
=============================================================
| "retrieving-the-class-name-of-a-specific-subclass-in-owl" |
-------------------------------------------------------------
If you define the prefix in the query, e.g., as so:, then you can also use str(so:) instead of the string form. If you prefer, you can also do the string manipulation in the variable list rather than in the graph pattern. That would look like this:
prefix so: <http://stackoverflow.com/questions/20830056/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select (strafter(str(?question),str(so:)) as ?q) where {
?question rdfs:label ?label .
}
-------------------------------------------------------------
| q |
=============================================================
| "retrieving-the-class-name-of-a-specific-subclass-in-owl" |
-------------------------------------------------------------
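If you end up post-processing the results in Python rather than in the query, the behaviour of strafter is easy to reproduce. The function below is a sketch written for illustration (it is not an RDFLib API); it follows the SPARQL 1.1 rules that an absent marker gives an empty string and an empty marker gives the whole input:

```python
def strafter(text, marker):
    """Mimic SPARQL STRAFTER: the text after the first occurrence of marker."""
    if marker == "":
        # SPARQL returns the first argument unchanged for an empty marker;
        # str.partition would raise ValueError here.
        return text
    before, found, after = text.partition(marker)
    # partition leaves `after` empty when the marker is not found
    return after


iri = "http://stackoverflow.com/questions/20830056/retrieving-the-class-name-of-a-specific-subclass-in-owl"
name = strafter(iri, "http://stackoverflow.com/questions/20830056/")
```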

Queryset for search by username and display_name In Django Rest Framework

I'm trying to search users by their usernames and display names.
I have used that for the starting match.
search_obj = User.objects.exclude(user_uuid=token_user_uuid).filter(
    Q(display_name__istartswith=search_name)
    | Q(username__istartswith=search_name)
).order_by('-created_on')[:10]
This gives the results I want, with one exception: if the display name is William Welch and I search for Welch, it should return that user too, but it does not.
Cases:
username: 123William
display name: William Welch
if search_name 12 then match
if search_name 23w then not match
if search_name Wil then match
if search_name Welc then match
if search_name elch then not match
You can search with __icontains [Django-doc] to look for a substring instead:
from django.db.models import Q
search_obj = (
    User.objects.exclude(user_uuid=token_user_uuid)
    .filter(
        display_name__icontains=search_name,
        username__icontains=search_name,
        _connector=Q.OR,
    )
    .order_by('-created_on')[:10]
)
If you want to search for starting words, you can work with a regex with the __iregex lookup [Django-doc]:
import re
from django.db.models import Q
# \y is PostgreSQL's word-boundary escape
rgx = fr'\y{re.escape(search_name)}'
search_obj = (
    User.objects.exclude(user_uuid=token_user_uuid)
    .filter(
        display_name__iregex=rgx,
        username__iregex=rgx,
        _connector=Q.OR,
    )
    .order_by('-created_on')[:10]
)
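To check that a word-start pattern covers the cases listed in the question, the same idea can be tried in plain Python. This is only a sketch: Python's \b stands in for the \y used above (the two regex dialects are not identical), and matches_word_start is a hypothetical helper, not a Django API:

```python
import re


def matches_word_start(search_name, *fields):
    """Return True if any field has a word that starts with search_name."""
    # \b is Python's word boundary; the query above uses \y for the database
    pattern = re.compile(r'\b' + re.escape(search_name), re.IGNORECASE)
    return any(pattern.search(field) for field in fields)


username = "123William"
display_name = "William Welch"
results = {
    term: matches_word_start(term, username, display_name)
    for term in ("12", "23w", "Wil", "Welc", "elch")
}
```

Here results maps each search term from the question to whether it matches, so it can be compared against the expected match / not match list.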

BigQuery - JSONpath recursive operator (2/2)

Is there any way to realize a recursive search on a JSON string object in BigQuery in absence of the operator "..", which is apparently not supported ?
Motivation: access "name" only when located within "students" in the below.
Query
SELECT JSON_EXTRACT(json_text, '$..students.name') AS first_student
FROM UNNEST([
'{"class" : {"students" : {"name" : "Jane"}}}'
]) AS json_text;
Desired output
+-----------------+
| first_student |
+-----------------+
| "Jane" |
+-----------------+
Current output
Unsupported operator in JSONPath: ..
Consider below approach
CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
RETURNS STRING
LANGUAGE js AS """
return jsonPath(JSON.parse(json), json_path);
"""
OPTIONS (
library="gs://some_bucket/jsonpath-0.8.0.js"
);
SELECT CUSTOM_JSON_EXTRACT(json_text, '$..students.name') AS first_student
FROM UNNEST([
'{"class" : {"students" : {"name" : "Jane"}}}'
]) AS json_text;
with output
Note: to work around BigQuery's current JSONPath limitation, the solution above uses a UDF plus an external library, jsonpath-0.8.0.js, which can be downloaded from https://code.google.com/archive/p/jsonpath/downloads and uploaded to Google Cloud Storage (here, gs://some_bucket/jsonpath-0.8.0.js).
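For readers unsure what the ".." operator is expected to do, here is a small Python sketch of the recursive descent for the specific path $..students.name. It is an illustration of the search itself, not BigQuery code, and find_under is a hypothetical name:

```python
import json


def find_under(node, parent_key, child_key):
    """Collect node[parent_key][child_key] at any depth, like $..parent.child."""
    found = []
    if isinstance(node, dict):
        parent = node.get(parent_key)
        if isinstance(parent, dict) and child_key in parent:
            found.append(parent[child_key])
        # keep descending: the pair may also occur deeper in the tree
        for value in node.values():
            found.extend(find_under(value, parent_key, child_key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_under(item, parent_key, child_key))
    return found


json_text = '{"class" : {"students" : {"name" : "Jane"}}}'
names = find_under(json.loads(json_text), "students", "name")
```

A "name" key that is not directly inside "students" is ignored, which is the behaviour the question asks for.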

Spacy to Conll format without using Spacy's sentence splitter

This post shows how to get dependencies of a block of text in Conll format with Spacy's taggers. This is the solution posted:
import spacy
nlp_en = spacy.load('en')
doc = nlp_en(u'Bob bought the pizza to Alice')
for sent in doc.sents:
    for i, word in enumerate(sent):
        if word.head == word:
            head_idx = 0
        else:
            head_idx = word.head.i - sent[0].i + 1
        print("%d\t%s\t%s\t%s\t%s\t%s\t%s"%(
            i+1, # There's a word.i attr that's position in *doc*
            word,
            word.lemma_,
            word.tag_, # Fine-grained tag
            word.ent_type_,
            str(head_idx),
            word.dep_ # Relation
        ))
It outputs this block:
1 Bob bob NNP PERSON 2 nsubj
2 bought buy VBD 0 ROOT
3 the the DT 4 det
4 pizza pizza NN 2 dobj
5 to to IN 2 dative
6 Alice alice NNP PERSON 5 pobj
I would like to get the same output WITHOUT using doc.sents.
Indeed, I have my own sentence-splitter. I would like to use it, and then give Spacy one sentence at a time to get POS, NER, and dependencies.
How can I get POS, NER, and dependencies of one sentence in Conll format with Spacy without having to use Spacy's sentence splitter ?
A Doc in spaCy is iterable, and the documentation states that it iterates over Token objects:
| __iter__(...)
| Iterate over `Token` objects, from which the annotations can be
| easily accessed. This is the main way of accessing `Token` objects,
| which are the main way annotations are accessed from Python. If faster-
| than-Python speeds are required, you can instead access the annotations
| as a numpy array, or access the underlying C data directly from Cython.
|
| EXAMPLE:
| >>> for token in doc
Therefore I believe you would just have to make a Document for each of your sentences that are split, then do something like the following:
def printConll(split_sentence_text):
    # nlp_en is the pipeline loaded in the question: nlp_en = spacy.load('en')
    doc = nlp_en(split_sentence_text)
    for i, word in enumerate(doc):
        if word.head == word:
            head_idx = 0
        else:
            # the Doc holds a single sentence, so the sentence starts at token 0
            head_idx = word.head.i + 1
        print("%d\t%s\t%s\t%s\t%s\t%s\t%s"%(
            i+1, # There's a word.i attr that's position in *doc*
            word,
            word.lemma_,
            word.tag_, # Fine-grained tag
            word.ent_type_,
            str(head_idx),
            word.dep_ # Relation
        ))
Of course, following the CoNLL format you would have to print a newline after each sentence.
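The only part of this that depends on where a sentence starts is the head index. As a sketch that needs no spaCy at all, the arithmetic can be checked on plain integers; conll_head_indices is a hypothetical helper, and heads is assumed to hold each token's head position in the document, with a root token pointing at itself:

```python
def conll_head_indices(heads, sent_start=0):
    """Convert document-level head positions to 1-based CoNLL head indices.

    heads[k] is the document index of the head of the k-th token of the
    sentence; sent_start is the document index of the sentence's first token.
    A token that is its own head is the root and gets index 0.
    """
    indices = []
    for offset, head in enumerate(heads):
        token_index = sent_start + offset
        if head == token_index:
            indices.append(0)
        else:
            indices.append(head - sent_start + 1)
    return indices


# "Bob bought the pizza to Alice" as one document: sentence starts at token 0
single = conll_head_indices([1, 1, 3, 1, 1, 4])
# the same sentence starting at token 10 of a longer document
shifted = conll_head_indices([11, 11, 13, 11, 11, 14], sent_start=10)
```

Both calls give the head column of the output shown in the question, which is why one Doc per sentence lets the sent[0].i offset be dropped.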
This post is about a user facing unexpected sentence breaks when using spaCy's sentence boundary detection. One of the solutions proposed by the spaCy developers (in that post) is to add the flexibility to supply one's own sentence boundary detection rules. Sentence boundary detection is performed in conjunction with dependency parsing in spaCy, not before it. Therefore, I don't think what you're looking for is supported by spaCy at the moment, though it might be in the near future.
@ashu's answer is partly right: dependency parsing and sentence boundary detection are tightly coupled by design in spaCy. There is, however, a simple sentencizer:
https://spacy.io/api/sentencizer
It seems the sentencizer just uses punctuation, which is not perfect. But since such a sentencizer exists, you can create a custom one with your own rules, and it will determine the sentence boundaries.

Python code with SPARQL not working

I'm writing Python code to match the list of actors between DBpedia and Wikidata. First I retrieve the list of actors, with additional information such as birth date and birth place, from DBpedia using SPARQL. Then, for that same list of actors, I try to retrieve further information, such as awards received. My Python code is throwing an error.
I have a hunch that the dbpedia portion of the query is timing out within wikidata. Skipping the federated binding and adding a limit, the query goes to completion, but takes several seconds. Un-comment the triple about the awards, and it times out.
Since there are problems with the SPARQL, I'm going to ignore the Python processing for now.
Independent of that, I found two glitches:
# missing prefixes
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    ?c rdf:type <http://umbel.org/umbel/rc/Actor> ;
       rdfs:label ?Actor
    FILTER ( lang(?Actor) = "en" )
    ?c dbo:deathDate ?Death_date ;
       dbo:birthPlace ?b
    # date filtering not working... add cast
    FILTER ( xsd:date(?Death_date) >= "1990-01-01"^^xsd:date )
    ?b rdfs:label ?birth_Place
    FILTER ( lang(?birth_Place) = "en" )
    ?Starring rdf:type dbo:Film ;
              dbo:starring ?c .
    ?c dbo:deathCause ?d .
    ?d dbp:name ?Cause_Of_Death .
    ?c owl:sameAs ?wikidata_actor
    FILTER strstarts(str(?wikidata_actor), "http://www.wikidata.org")
  }
  # ?wikidata_actor wdt:P166 ?award_received.
}
LIMIT 9
Every SPARQL endpoint has its own unique personality. So in my opinion, federated queries (which use the service keyword and hit two or more endpoints) can be especially tricky. In case you're new to federation, here's an unrelated query that works.
There is some entity that tweets under the name 'darwilliamstour'. What is the name of that entity?
select *
where
{
?twitterer wdt:P2002 'darwilliamstour' .
service <http://dbpedia.org/sparql>
{
?twitterer rdfs:label ?name
}
}

dc:Creator string literal vs. regex FILTER in SPARQL

I am using Europeana's Virtuoso SPARQL Endpoint.
I have been trying to search in SPARQL for content about a specific contributor. To my understanding, this could be carried out this way:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title
WHERE {
?objectInfo dc:title ?title .
?objectInfo dc:creator 'Picasso' .
}
Nevertheless, I get nothing in return.
Alternatively, I used FILTER regex to search for the literal.
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title ?creator
WHERE {
?objectInfo dc:title ?title .
?objectInfo dc:creator ?creator .
FILTER regex(?creator, 'Picasso')
}
This actually worked very well and returned correctly the results.
My question is: Is it possible to produce the SPARQL query without using FILTER to search the work of a particular artist?
Many thanks.
I don't think there are any objects with 'Picasso' literally as the creator. So a regex filter is a good choice, but slow.
Here's a way to find the strings your regex is matching:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?creator (count(?creator) as ?ccount)
WHERE {
?objectInfo dc:title ?title .
?objectInfo dc:creator ?creator .
FILTER regex(?creator, 'Picasso')
}
group by ?creator
order by ?ccount
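To see why the exact-literal pattern finds nothing while the regex does, the grouping above can be mimicked in plain Python. This is a sketch on made-up sample strings (the two Picasso spellings are the variants named in this answer; the list itself is not Europeana data), and count_matching is a hypothetical helper:

```python
import re
from collections import Counter


def count_matching(creators, pattern):
    """Count each distinct creator string that the regex matches."""
    regex = re.compile(pattern)
    return Counter(c for c in creators if regex.search(c))


# Illustrative values only, not taken from the endpoint
sample = ["Picasso, Pablo", "Pablo Picasso", "Picasso, Pablo", "Braque, Georges"]
counts = count_matching(sample, "Picasso")
exact_hits = [c for c in sample if c == "Picasso"]
```

Every matching string contains "Picasso" but none is equal to it, so exact_hits stays empty, just as the first query in the question returns no rows.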
It might have been easier for you to see that if you had displayed all variables in the select statement:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT *
WHERE {
?objectInfo dc:title ?title .
?objectInfo dc:creator ?creator .
FILTER regex(?creator, 'Picasso')
}
If you don't want to use a regex filter, you could enumerate all of the Picasso variants you are looking for:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT *
WHERE {
values ?creator { "Picasso, Pablo" "Pablo Picasso" } .
?objectInfo dc:title ?title .
?objectInfo dc:creator ?creator
}
bif:contains works on this endpoint and is pretty fast:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT *
WHERE {
?objectInfo dc:title ?title .
?objectInfo dc:creator ?creator .
?creator bif:contains 'Picasso'
#FILTER regex(?creator, 'Picasso')
}
1) Your first query has unconnected triple patterns.
2) Judging by the vocabulary description, I would guess that dc:creator expects a resource, i.e. a URI. Does using the URI of the entity Picasso not work?
+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Term Name: creator | |
| URI: | http://purl.org/dc/elements/1.1/creator |
| Label: | Creator |
| Definition: | An entity primarily responsible for making the resource. |
| Comment: | Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity. |
+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
It would be good to see your data in order to decide whether a FILTER on literals is necessary or not.