SPARQL query to get subjects linked to object 2 that are not linked to object 1, using IF or NOT EXISTS or any other way? - if-statement

I hope you are doing well.
Here is the basic structure of my graph database. Components have estimation methods, estimation methods have parameters and parameters have data sources.
c -> em -> p -> ds
Where,
c stands for components
em stands for estimation methods
p stands for parameters
ds stands for data sources
I am able to query individuals in the structured format like this:
SELECT ?c ?em ?p ?ds WHERE {
?c wb:hasEstimationMethod ?em.
OPTIONAL {
?em wb:hasParameter ?p.
OPTIONAL{
?p wb:hasDataSource ?ds.
}
}
}
I use OPTIONAL clause because there is a possibility that estimation method might not have any parameters and similarly parameters might not have any data sources.
However, there are few cases where, for example, an estimation method is unknown but we know the parameter. So for example in that case, components will directly have parameters and I would prefer to have blank for estimation methods. So here is the output I would like to have,
c
em
p
ds
component-1
estimation method-1
parameter-1
data source-1
component-2
parameter-2
data source-2
component-3
parameter-3
If you notice the last two rows have have missing info which is what I want to have in my output if that is the case. In other words, I want to skip a step in the hierarchical structure.
So my question is, how can I first query ?c wb:hasEstimationMethod ?em but if it does not have any value, I want to tell SPARQL to use query ?c wb:hasParameter ?p and similarly if that has no value as well, do ?c wb:hasDataSource ?ds ?
Any help will be greatly appreciated! Please let me know if I am not using the right terminology. Have a wonderful day :)

Related

SPLUNK subsearch 2 CSV Files join together

I have 2 Files with order data saved in two different sourcetypes in splunk.
One file contains an orderid, plnum(praefix + orderid (one ordernumer contains 3 plnum)), model (type of the order). The second file contains the same plnum's and Materialnumbers to those plnum's.
I want to search for the top Materials used for one or more Models.
So I searched for how to setup a subsearch:
sourcetype=file1 [search sourcetype=file2 MODEL="someting"| fields MODEL] |stats values(MATNR) by MODEL
I dont know why the subsearch dont work.
Run the subsearch by itself to verify it works and produces the expected results. I suspect it is working and is returning a list of PLNUMs in the form foo bar baz.... Splunk puts an implicit AND between search terms so your main search is looking for events containing all PLNUMs, which is unlikely.
Try using format in your subsearch. It returns the results in foo OR bar OR baz... format, which should work better in the main search.
sourcetype=file1 [search sourcetype=file2 MODEL="someting"| fields PLNUM | format] |stats values(MATNR) by PLNUM

Get Taxonomy Term ID by Node in Drupal 8

I'm trying to get Taxonomy data by particular node.
How can I get Taxonomy Term Id by using Node object ?
Drupal ver. 8.3.6
You could do something like that:
$termId = $node->get('field_yourfield')->target_id;
Then you can load the term with
Term::load($termId);
Hope this helps.
If you want to get Taxonomy Term data you can use this code:
$node->get('field_yourfield')->referencedEntities();
Hope it will be useful for you.
PS: If you need just Term's id you can use this:
$node->get('field_yourfield')->getValue();
You will get something like this:
[0 => ['target_id' => 23], 1 => ['target_id'] => 25]
In example my field has 2 referenced taxonomy terms.
Thanks!
#Kevin Wenger's comment helped me. I'm totally basing this answer on his comment.
In your code, when you have access to a fully loaded \Drupal\node\Entity\Node you can access all the (deeply) nested properties.
In this example, I've got a node which has a taxonomy term field "field_site". The "field_site" term itself has a plain text field "field_site_url_base". In order to get the value of the "field_site_url_base", I can use the following:
$site_base_url = $node->get('field_site')->entity->field_site_url_base->value;
How to extract multiple term IDs easily if you know a little Laravel (specifically Collections):
Setup: composer require tightenco/collect to make Collections available in Drupal.
// see #Wau's answer for this first bit...
// remember: if you want the whole Term object, use ->referencedEntities()
$field_value = $node->get('field_yourfield')->getValue();
// then use collections to avoid loops etc.
$targets = collect($field_value)->pluck('target_id')->toArray();
// $targets = [1,2,3...]
or maybe you'd like the term IDs comma-separated? (I used this for programmatically passing contextual filter arguments to a view, which requires , (OR) or + (AND) to specify multiple values.)
$targets = collect($field_value)->implode('target_id', ',');
// $targets = "1,2,3"

SPARQL: combine and exclude regex filters

I want to filter my SPARQL query for specific keywords while at the same time excluding other keywords. I thought this may be easily accomplished with FILTER (regex(str(?var),"includedKeyword","i") && !regex(str(?var),"excludedKeyword","i")). It works without the "!" condition, but not with. I also separated the FILTER statements, but no use.
I used this query on http://europeana.ontotext.com/ :
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
SELECT DISTINCT ?CHO
WHERE {
?proxy dc:subject ?subject .
FILTER ( regex(str(?subject),"gemälde","i") && !regex(str(?subject),"Fotografie","i") )
?proxy edm:type "IMAGE" .
?proxy ore:proxyFor ?CHO.
?agg edm:aggregatedCHO ?CHO; edm:country "germany".
}
But I always get the result on the first row with the title "Gemäldegalerie", which has a dc:subject of "Fotografie" (the one I want excluded). I think the problem lies in the fact that one object from the Europeana database can have more than one dc:subject property, so maybe it looks only for one of these properties while ignoring the other ones.
Any ideas? Would be very thankful!
The problem is that your combined filter checks for the same binding of ?subject. So it succeeds if at least one value of ?subject matches both conditions (which is almost always true, because the string "Gemäldegalerie", for example, matches your first regex and does not match the second).
So for the negative condition, you need to formulate something that checks for all possible values, rather than just one particular value. You can do this using SPARQL's NOT EXISTS function, for example like this:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
SELECT DISTINCT ?CHO
WHERE {
?proxy edm:type "IMAGE" .
?proxy ore:proxyFor ?CHO.
?agg edm:aggregatedCHO ?CHO; edm:country "germany".
?proxy dc:subject ?subject .
FILTER(regex(str(?subject),"gemälde","i"))
FILTER NOT EXISTS {
?proxy dc:subject ?otherSubject.
FILTER(regex(str(?otherSubject),"Fotografie","i"))
}
}
As an aside: since you are doing regular expression checks, and now combining them with an NOT EXISTS operator, this is likely to become very expensive for the query processor quite quickly. You may want to think about alternative ways to formulate your query (for example, using the exact subject string to include or exclude to eliminate the regex), or even having a look at some non-standard extensions that the SPARQL endpoint might provide (OWLIM, for example, the store on which the Europeana endpoint runs, supports various full-text-search extensions, though I am not sure they are enabled in the Europeana endpoint).

sparql regex compare two string variables (one is composed by the other)

I am trying to compare two string variables to discover if one is contained in the other, specifically if one is composed of the other (so, I would like to avoid retrieving that "information" contains "format". I am interested only in results similar to "information_management" includes "information".
I have tried both FILTER CONTAINS() and FILTER regex() with the same results. How can I modify the query so it includes the fact that there needs to be a space either before or after the term?
SELECT DISTINCT ?l1 ?l2
WHERE
{
?term1 skos:prefLabel ?l1.
?term2 skos:prefLabel ?l2.
FILTER(contains(?l1,?l2))
}
So if I understand you correctly you want to find pairs of terms where one term is contained in the other but is not equal to the other?
If so you can add a !SAMETERM() call into the the FILTER clause like so:
SELECT DISTINCT ?l1 ?l2
WHERE
{
?term1 skos:prefLabel ?l1.
?term2 skos:prefLabel ?l2.
FILTER(!SAMETERM(?l1, ?l2) && contains(?l1,?l2))
}
Edit
Re-reading the question I don't think I addressed the whole question, for the problem where you have the terms "format" and "information" and don't want them to be matched you can do something like the following:
SELECT DISTINCT ?l1 ?l2
WHERE
{
?term1 skos:prefLabel ?l1.
?term2 skos:prefLabel ?l2.
FILTER(!SAMETERM(?l1, ?l2)
&& contains(?l1,?l2)
&& ( STRENDS(STRBEFORE(?l1, ?l2)," ")
|| STRSTARTS(STRAFTER(?l1, ?l2), " ")
))
}
This requires that the string before/after the containing term must end/start with whitespace. You may have to play around with this to get something that more closely models your constraints.
Another solution would be by constructing a regex pattern on the fly, like:
FILTER(regex(concat("\\b", ?l1, "\\b"), ?l2))
I'm not entirely sure that SPARQL/XML Schema requires \b, but I think most implementations will have it.

Is there a way to filter a django queryset based on string similarity (a la python difflib)?

I have a need to match cold leads against a database of our clients.
The leads come from a third party provider in bulk (thousands of records) and sales is asking us to (in their words) "filter out our clients" so they don't try to sell our service to a established client.
Obviously, there are misspellings in the leads. Charles becomes Charlie, Joseph becomes Joe, etc. So I can't really just do a filter comparing lead_first_name to client_first_name, etc.
I need to use some sort of string similarity mechanism.
Right now I'm using the lovely difflib to compare the leads' first and last names to a list generated with Client.objects.all(). It works, but because of the number of clients it tends to be slow.
I know that most sql databases have soundex and difference functions. See my test of it in the update below - it doesn't work as well as difflib.
Is there another solution? Is there a better solution?
Edit:
Soundex, at least in my db, doesn't behave as well as difflib.
Here is a simple test - look for "Joe Lopes" in a table containing "Joseph Lopes":
with temp (first_name, last_name) as (
select 'Joseph', 'Lopes'
union
select 'Joe', 'Satriani'
union
select 'CZ', 'Lopes'
union
select 'Blah', 'Lopes'
union
select 'Antonio', 'Lopes'
union
select 'Carlos', 'Lopes'
)
select first_name, last_name
from temp
where difference(first_name+' '+last_name, 'Joe Lopes') >= 3
order by difference(first_name+' '+last_name, 'Joe Lopes')
The above returns "Joe Satriani" as the only match. Even reducing the similarity threshold to 2 doesn't return "Joseph Lopes" as a potential match.
But difflib does a much better job:
difflib.get_close_matches('Joe Lopes', ['Joseph Lopes', 'Joe Satriani', 'CZ Lopes', 'Blah Lopes', 'Antonio Lopes', 'Carlos Lopes'])
['Joseph Lopes', 'CZ Lopes', 'Carlos Lopes']
Edit after gruszczy's response:
Before writing my own, I looked for and found a T-SQL implementation of Levenshtein Distance in the repository of all knowledge.
In testing it, it still won't do a better matching job than difflib.
Which led me to research what algorithm is behind difflib. It seems to be a modified version of the Ratcliff-Obershelp algorithm.
Unhappily I can't seem to find some other kind soul who has already created a T-SQL implementation based on difflib's... I'll try my hand at it when I can.
If nobody else comes up with a better answer in the next few days, I'll grant it to gruszczy. Thanks, kind sir.
soundex won't help you, because it's a phonetic algorithm. Joe and Joseph aren't similar phonetically, so soundex won't mark them as similar.
You can try Levenshtein distance, which is implemented in PostgreSQL. Maybe in your database too and if not, you should be able to write a stored procedure, which will calculate the distance between two strings and use it in your computation.
It's possible with trigram_similar lookups since Django 1.10, see docs for PostgreSQL specific lookups and Full text search
As per the answer of andilabs you can use the Levenshtein function to create your custom function. Postgres doc indicates that the Levenshtein function is as follows:
levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int
levenshtein(text source, text target) returns int
andilabs answer can use the only second function. If you want a more advanced search with insertion/deletion/substitution costs, you can rewrite function like this:
from django.db.models import Func
class Levenshtein(Func):
template = "%(function)s(%(expressions)s, '%(search_term)s', %(ins_cost)d, %(del_cost)d, %(sub_cost)d)"
function = 'levenshtein'
def __init__(self, expression, search_term, ins_cost=1, del_cost=1, sub_cost=1, **extras):
super(Levenshtein, self).__init__(
expression,
search_term=search_term,
ins_cost=ins_cost,
del_cost=del_cost,
sub_cost=sub_cost,
**extras
)
And call the function:
from django.db.models import F
Spot.objects.annotate(
lev_dist=Levenshtein(F('name'), 'Kfaka', 3, 3, 1) # ins = 3, del = 3, sub = 1
).filter(
lev_dist__lte=2
)
If you need getting there with django and postgres and don't want to use introduced in 1.10 trigram-similarity https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/lookups/#trigram-similarity you can implement using Levensthein like these:
Extension needed fuzzystrmatch
you need adding postgres extension to your db in psql:
CREATE EXTENSION fuzzystrmatch;
Lets define custom function with wich we can annotate queryset. It just take one argument the search_term and uses postgres levenshtein function (see docs):
from django.db.models import Func
class Levenshtein(Func):
template = "%(function)s(%(expressions)s, '%(search_term)s')"
function = "levenshtein"
def __init__(self, expression, search_term, **extras):
super(Levenshtein, self).__init__(
expression,
search_term=search_term,
**extras
)
then in any other place in project we just import defined Levenshtein and F to pass the django field.
from django.db.models import F
Spot.objects.annotate(
lev_dist=Levenshtein(F('name'), 'Kfaka')
).filter(
lev_dist__lte=2
)