sparql exact match regex

I am using the following SPARQL query to extract from DBpedia the pages that use a specific infobox:
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX res:<http://dbpedia.org/resource/>
SELECT DISTINCT *
WHERE {
?page dbpedia:wikiPageUsesTemplate ?template .
?page rdfs:label ?label .
FILTER (regex(?template, 'Infobox_artist')) .
FILTER (lang(?label) = 'en')
}
LIMIT 100
In this line of the query:
FILTER (regex(?template, 'Infobox_artist')) .
I get all the infoboxes whose name starts with artist, such as artist_discography, and others which I don't need. My question is: how can I use a regex to get only the infoboxes that match exactly "Infobox_artist"?

As it is a regex you should be able to restrict the search as follows:
FILTER (regex(?template, '^Infobox_artist$')) .
In a regex, ^ matches the beginning of the string and $ matches the end.
NB: I've not used SPARQL, so this may well not work.
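To see what the anchors do, here is a small sketch that uses Python's re module as a stand-in for the SPARQL regex() function. The two regex dialects differ in details, but ^ and $ behave the same way for this case. It assumes ?template holds the full URI, as the DBpedia values elsewhere in this thread do; the second URI in the list is a made-up neighbour used only for illustration.

```python
import re

# Example values; the discography URI is hypothetical and only serves as a near miss.
templates = [
    "http://dbpedia.org/resource/Template:Infobox_artist",
    "http://dbpedia.org/resource/Template:Infobox_artist_discography",
]


def matching(pattern):
    """Return the templates in which the pattern is found somewhere in the string."""
    return [t for t in templates if re.search(pattern, t)]


unanchored = matching(r"Infobox_artist")        # found in both URIs
end_anchored = matching(r"Infobox_artist$")     # only the URI that ends with the name
fully_anchored = matching(r"^Infobox_artist$")  # nothing: the strings start with "http://"

print(unanchored)
print(end_anchored)
print(fully_anchored)
```

So when the value being tested is a full URI, only the $ anchor helps; adding ^ as well requires the whole string to be exactly Infobox_artist.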

While the approach suggested by @beny23 works, it is really very inefficient. Using a regex for what is essentially an exact-value match puts a (potentially) unnecessary burden on the endpoint being queried. This is bad practice.
The value of ?template is a URI, so you really should use a value comparison (or even inline the URI, as @cygri demonstrated):
SELECT DISTINCT * {
?page dbpedia:wikiPageUsesTemplate ?template .
?page rdfs:label ?label .
FILTER (lang(?label) = 'en')
FILTER (?template = <http://dbpedia.org/resource/Template:Infobox_artist> )
}
LIMIT 100
You can still easily adapt this query string in code to work with different types of infoboxes. Also: depending on which toolkit you use to create and execute SPARQL queries, you may have some programmatic alternatives to make query reuse even easier.
For instance, you can create a "prepared query" that you reuse, setting a binding to a particular value before executing it. In Sesame you could do something like this:
String q = "SELECT DISTINCT * { " +
" ?page dbpedia:wikiPageUsesTemplate ?template . " +
" ?page rdfs:label ?label . " +
" FILTER (lang(?label) = 'en') " +
" } LIMIT 100 ";
TupleQuery query = conn.prepareTupleQuery(SPARQL, q);
URI infoboxArtist = f.createURI(DBPedia.NAMESPACE, "Template:Infobox_artist");
query.setBinding("template", infoboxArtist);
TupleQueryResult result = query.evaluate();
(As an aside: I am showing an example using Sesame because I'm on the Sesame development team, but no doubt other SPARQL/RDF toolkits have similar functionality.)
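For toolkits without prepared queries, adapting the query string in code can be as simple as the following Python sketch. The function name build_infobox_query and the name check are assumptions made for this illustration, not part of any SPARQL library; the function only builds the query text and does not contact an endpoint.

```python
import re

# Namespace taken from the URI used in the query above.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

QUERY_TEMPLATE = """PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT * {{
  ?page dbpedia:wikiPageUsesTemplate <{template_uri}> .
  ?page rdfs:label ?label .
  FILTER (lang(?label) = 'en')
}}
LIMIT {limit}"""


def build_infobox_query(infobox_name, limit=100):
    """Return a SPARQL query string for pages that use the given infobox template."""
    # Accept only simple names so that nothing unexpected ends up inside <...>.
    if not re.fullmatch(r"[A-Za-z0-9_]+", infobox_name):
        raise ValueError("unexpected infobox name: %r" % infobox_name)
    template_uri = DBPEDIA_RESOURCE + "Template:" + infobox_name
    return QUERY_TEMPLATE.format(template_uri=template_uri, limit=int(limit))


query = build_infobox_query("Infobox_artist")
print(query)
```

The resulting string can then be sent to the endpoint with whatever client you already use.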

If all you want to do is a direct string comparison, then you don't need a regex! This is simpler and faster:
SELECT DISTINCT * {
?page dbpedia:wikiPageUsesTemplate
<http://dbpedia.org/resource/Template:Infobox_artist> .
?page rdfs:label ?label .
FILTER (lang(?label) = 'en')
}
LIMIT 100

Related

Regex Match Kusto

I have the two tables below: one with a complete list of URLs, and another with regex representations of all URLs (nearly 100 values) and the corresponding topic. I now want to create a third table which maps each URL to its topic based on the regex pattern.
I figured out that Kusto offers 'matches regex', but it cannot be used at a row level. Ideally I want to create a function that takes a URL and outputs the corresponding topic.
Table1:
| URL |
Table2:
|URL Regex| Topic|
Output:
|URL | Topic|
Let me know if the logic below needs any tuning for it to work.
Query:
.create-or-alter function with findTopic(Path:string) {
toscalar(Table2
| extend TopicName=case (Path matches regex URLRegex, Topic,"Not Found")
| project Topic)
}
Table1
| extend Topic=findTopic(Path)

Regular expressions can't be originated from a dynamic source, like another table. In Kusto, regular expressions must be string scalars.
In your case this isn't a problem, since there are about 100 different topics. You can maintain a stored function that does the URI categorization:
.create-or-alter function GetUrlTopic(Url:string)
{
case(
Url matches regex @"https://bing.com.*", "Search",
Url matches regex @"https://stackoverflow.com.*", "Q&A",
"N/A")
}
Example:
let Uris=datatable(Url:string)
[
"https://bing.com/foo/bar",
"https://bing.com/1/2",
"https://microsoft.com",
"https://stackoverflow.com/q/1",
"https://stackoverflow.com/q/2"
];
Uris
| extend Topic=GetUrlTopic(Url)
Result:
| Url | Topic |
| https://bing.com/foo/bar | Search |
| https://bing.com/1/2 | Search |
| https://microsoft.com | N/A |
| https://stackoverflow.com/q/1 | Q&A |
| https://stackoverflow.com/q/2 | Q&A |
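The first-match-wins logic of the case() call can be sketched outside Kusto as well. The Python below is only an illustration of that logic, not Kusto code; TOPIC_RULES and get_url_topic are names invented for this sketch, and the patterns are copied unchanged from the function above.

```python
import re

# Ordered (pattern, topic) pairs; the first pattern that matches wins,
# in the same way as the arguments of case() are tried in order.
TOPIC_RULES = [
    (re.compile(r"https://bing.com.*"), "Search"),
    (re.compile(r"https://stackoverflow.com.*"), "Q&A"),
]


def get_url_topic(url, default="N/A"):
    """Return the topic of the first rule whose pattern is found in the URL."""
    for pattern, topic in TOPIC_RULES:
        if pattern.search(url):
            return topic
    return default


urls = [
    "https://bing.com/foo/bar",
    "https://bing.com/1/2",
    "https://microsoft.com",
    "https://stackoverflow.com/q/1",
    "https://stackoverflow.com/q/2",
]
topics = [get_url_topic(u) for u in urls]
for url, topic in zip(urls, topics):
    print(url, topic)
```

Keeping the rules in one ordered list makes it easy to see which pattern takes precedence when two of them could match the same URL.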

Django - Full text search - Wildcard

Is it possible to use wildcards in Django full text search?
https://docs.djangoproject.com/en/1.11/ref/contrib/postgres/search/
post = request.POST.get('search')
query = SearchQuery(post)
vector = SearchVector('headline', weight='A') + SearchVector('content', weight='B')
rank = SearchRank(vector, query, weights=[0.1,0.2])
data = wiki_entry.objects.annotate(rank=SearchRank(vector,query)).filter(rank__gte=0.1).order_by('-rank')
At the moment it only matches on full words.
Characters like * % | & have no effect.
Or do I have to go back to icontains?
https://docs.djangoproject.com/en/1.11/ref/models/querysets/#icontains
Any help is appreciated

I extended Django's SearchQuery class and replaced plainto_tsquery with to_tsquery in the generated SQL. I did some simple tests and it works. I will get back here if I find cases where this causes problems.
from django.contrib.postgres.search import SearchQuery
class MySearchQuery(SearchQuery):
    def as_sql(self, compiler, connection):
        params = [self.value]
        if self.config:
            config_sql, config_params = compiler.compile(self.config)
            template = 'to_tsquery({}::regconfig, %s)'.format(config_sql)
            params = config_params + [self.value]
        else:
            template = 'to_tsquery(%s)'
        if self.invert:
            template = '!!({})'.format(template)
        return template, params
Now I can do something like query = MySearchQuery('whatever:*')

[Postgres' part] The Postgres manual mentions this only briefly (https://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES), but yes, it is possible if you just need prefix matching:
test=# select to_tsvector('abcd') @@ to_tsquery('ab:*');
?column?
----------
t
(1 row)
test=# select to_tsvector('abcd') @@ to_tsquery('ac:*');
?column?
----------
f
(1 row)
And such a query will utilize a GIN index (I assume you have one).
[Django's part] I'm not a Django user, so I did some quick research and found that, unfortunately, Django uses the plainto_tsquery() function, not to_tsquery(): https://docs.djangoproject.com/en/1.11/_modules/django/contrib/postgres/search/#SearchQuery
plainto_tsquery() is made for simplicity, for when you use just plain text as input, so it doesn't support advanced queries:
test=# select to_tsvector('abcd') @@ plainto_tsquery('ab:*');
?column?
----------
f
(1 row)
test=# select to_tsvector('abcd') @@ plainto_tsquery('ac:*');
?column?
----------
f
(1 row)
So in this case, I'd recommend using plain SQL with to_tsquery(). But you need to be sure you have filtered out all special chars (like & or |) from your text input, otherwise to_tsquery() will produce wrong results or even an error. Or, if you can, extend django.contrib.postgres.search with the ability to work with to_tsquery() (this would be a great contribution, btw).
Alternatives are:
if your data is ASCII-only, you can use LIKE with prefix search and a B-tree index created with the text_pattern_ops / varchar_pattern_ops operator classes (if you need case-insensitivity, use a functional index over lower(column_name) and query with lower(column_name) like '...%'; see https://www.postgresql.org/docs/9.6/static/indexes-opclass.html);
use a pg_trgm index, which supports regular expressions and GiST/GIN indexes (https://www.postgresql.org/docs/9.6/static/pgtrgm.html).
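The advice about filtering special characters before calling to_tsquery() could look roughly like this in Python. It is a minimal sketch: to_prefix_tsquery is a hypothetical helper, not part of Django or Postgres, and it assumes that every word the user types should be prefix-matched and that all words are combined with &.

```python
import re


def to_prefix_tsquery(user_input):
    """Turn free text into a to_tsquery() string that prefix-matches every word."""
    # Keep only runs of letters, digits and underscores; this drops
    # operators such as & | ! : * ( ) and quotes from the input.
    words = re.findall(r"\w+", user_input)
    return " & ".join(word + ":*" for word in words)


print(to_prefix_tsquery("ab"))
print(to_prefix_tsquery("foo & bar|baz"))
print(to_prefix_tsquery("what:* (ever)"))
```

The returned string can be passed to a to_tsquery()-based query such as the MySearchQuery class shown earlier in this thread. An input with no usable words yields an empty string, so the caller would want to skip the search in that case.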

Illegal string offset 'order_status_id' in opencart

I am getting the error Illegal string offset 'order_status_id' when I try to loop over the data in the view.
Here's the code:
controller.php
if (isset($_POST["email"])) {
$email = $_POST["email"];
}
$this->load->model('order/myorder');
$data['data1'] = $this->model_order_myorder->getOrder($email) ;
view.php
foreach ($data1 as $row) {
echo echo $row->order_id;
}
model.php
class ModelOrderMyorder extends Model {
public function getOrder($email) {
$sql = "SELECT * FROM ".DB_PREFIX."order, ".DB_PREFIX."order_product WHERE ".DB_PREFIX."order.email = '$email'";
$query = $this->db->query($sql);
return $query ;
}
}
I am still not getting it to work; it now shows "Trying to get property of non-object" in the view.

First off, if you want to iterate through all the order products for a given email (which is what I think you want) you should change the getOrder() method to return:
return $query->rows;
Then in the controller you need to change:
$data['data1'] = $this->model_order_myorder->getOrder($email) ;
to
$this->data['data1'] = $this->model_order_myorder->getOrder($email);
Finally, in your view, you'll be accessing an array, not an object, so you should lose the extra echo (assuming this is a typo) and change:
echo echo $row->order_id;
to use the array index instead:
echo $row['order_id'];
Also, in addition to the above, I'd suggest you use some of the methods and code conventions found in OpenCart:
When accessing the $_POST global you can use the sanitized version, $this->request->post.
Your query fails to backtick the order table, which can result in errors if you didn't set a prefix. You are also not escaping $email, which is a good idea for a number of reasons. It also makes things easier if you give your tables an alias. Finally, the tables should be joined, so I might consider rewriting that query like this:
$sql = "SELECT * FROM `" . DB_PREFIX . "order` o LEFT JOIN " . DB_PREFIX . "order_product op USING (order_id) WHERE o.email = '" . $this->db->escape($email) . "'";
To be honest, I'm not sure what results you're expecting from that query, but bear in mind that if there are multiple products for a given order you will end up with multiple rows returned.
Just a few tips.. hopefully this is useful to you.

How to find all the source lines containing desired table names from user_source by using 'regexp'

For example, we have a large database containing lots of Oracle packages, and now we want to see where a specific table appears in the source code. The source code is stored in the user_source table and our desired table is called 'company'.
Normally, I would use:
select * from user_source
where upper(text) like '%COMPANY%'
This will return all words containing 'company', like
121 company cmy
14 company_id, idx_name %% end of coding
453 ;companyname
1253 from db.company.company_id where
989 using company, idx, db_name,
So how can I make this result smarter, using a regular expression to return only the source lines where 'company' is a meaningful table name (that is, a table name to the compiler)?
Normally we allow the matched word to be surrounded by characters like . ; , '' "" but not _
Can anyone make this work?

To find company as a "whole word" with a regular expression:
SELECT * FROM user_source
WHERE REGEXP_LIKE(text, '(^|\s)company(\s|$)', 'i');
The third argument, 'i', makes the REGEXP_LIKE search case-insensitive.
As far as ignoring the characters . ; , '' "" goes, you can use REGEXP_REPLACE to strip them out of the string before doing the comparison:
SELECT * FROM user_source
WHERE REGEXP_LIKE(REGEXP_REPLACE(text, '[.;,''"]'), '(^|\s)company(\s|$)', 'i');
Addendum: The following query will also help locate table references. It won't give the source line, but it's a start:
SELECT *
FROM user_dependencies
WHERE referenced_name = 'COMPANY'
AND referenced_type = 'TABLE';

If you want to identify the objects that refer to your table, you can get that information from the data dictionary:
select *
from all_dependencies
where referenced_owner = 'DB'
and referenced_name = 'COMPANY'
and referenced_type = 'TABLE';
You can't get the individual line numbers from that, but you can then either look at user_source or use a regexp on the specific source code, which would at least reduce false positives.

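SELECT * FROM user_source
WHERE REGEXP_LIKE(text,'(^|[^_a-z0-9])company([^_a-z0-9]|$)','i')
Thanks @Ed Gibbs; with a little trick, this modified answer could be more intelligent. The (^|...) and (...|$) alternatives let the match also succeed at the very start or end of a line.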
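To check the idea against the sample lines from the question, here is a small Python sketch. It uses Python's re module rather than Oracle's REGEXP_LIKE, so it only illustrates the pattern logic; the pattern is the character-class approach from this answer, written with ^ and $ alternatives so that a table name at the very start or end of a line is also found.

```python
import re

# "company" must not be preceded or followed by a letter, digit or underscore.
# The ^ and $ alternatives cover names at the start or end of a line.
TABLE_REF = re.compile(r"(^|[^_a-z0-9])company([^_a-z0-9]|$)", re.IGNORECASE)

# Source lines from the question, without their line numbers.
lines = [
    "company cmy",
    "company_id, idx_name %% end of coding",
    ";companyname",
    "from db.company.company_id where",
    "using company, idx, db_name,",
]

hits = [line for line in lines if TABLE_REF.search(line)]
for line in hits:
    print(line)
```

The lines containing only company_id or companyname are dropped, while db.company is kept because the dots on both sides are allowed delimiters.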

SQL and regular expression to check if string is a substring of larger string?

I have a database filled with some codes like
EE789323
990
78000
These numbers are ALWAYS endings of a larger code. Now I have a function that needs to check if the larger code contains the subcode.
So if I have codes 90 and 990 and my full code is EX888990, it should match both of them.
However I need to do it in the following way:
SELECT * FROM tableWithRecordsWithSubcode
WHERE subcode MATCHES [reg exp with full code];
Is a regular expression like this even possible?
EDIT:
To clarify the issue I'm having, I'm not using SQL here. I just used that to give an example of the type of query I'm using.
In fact I'm using iOS with CoreData, and I need a predicate to fetch me only the records that match.
In the way that is mentioned below.

Given the observations from a comment:
Do you have two tables, one called tableWithRecordsWithSubcode and another that might be tableWithFullCodeColumn? So the matching condition is in part a join - you need to know which subcodes match any of the full codes in the second table? But you're only interested in the information in the tableWithRecordsWithSubcode table, not in which rows it matches in the other table?
and the laconic "you're correct" response, then we have to rewrite the query somewhat.
SELECT DISTINCT S.*
FROM tableWithRecordsWithSubcode AS S
JOIN tableWithFullCodeColumn AS F
ON F.Fullcode ...ends-with... S.Subcode
or maybe using an EXISTS sub-query:
SELECT S.*
FROM tableWithRecordsWithSubcode AS S
WHERE EXISTS(SELECT * FROM tableWithFullCodeColumn AS F
WHERE F.Fullcode ...ends-with... S.Subcode)
This uses a correlated sub-query but avoids the DISTINCT operation; it may mean the optimizer can work more efficiently.
That just leaves the magical 'X ...ends-with... T' operator to be defined. One possible way to do that is with LENGTH and SUBSTR. However, SUBSTR does not behave the same way in all DBMS, so you may have to tinker with this (possibly adding a third argument, LENGTH(s.subcode)). The version below assumes SUBSTR counts positions from 1:
LENGTH(f.fullcode) >= LENGTH(s.subcode) AND
SUBSTR(f.fullcode, LENGTH(f.fullcode) - LENGTH(s.subcode) + 1) = s.subcode
This leads to two possible formulations:
SELECT DISTINCT S.*
FROM tableWithRecordsWithSubcode AS S
JOIN tableWithFullCodeColumn AS F
ON LENGTH(F.Fullcode) >= LENGTH(S.Subcode)
AND SUBSTR(F.Fullcode, LENGTH(F.Fullcode) - LENGTH(S.Subcode) + 1) = S.Subcode;
and
SELECT S.*
FROM tableWithRecordsWithSubcode AS S
WHERE EXISTS(
SELECT * FROM tableWithFullCodeColumn AS F
WHERE LENGTH(F.Fullcode) >= LENGTH(S.Subcode)
AND SUBSTR(F.Fullcode, LENGTH(F.Fullcode) - LENGTH(S.Subcode) + 1) = S.Subcode);
This is not going to be a fast operation; joins on computed results such as required by this query seldom are.
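The 'ends-with' test itself is easy to get wrong by one position, so here is the same length-and-substring logic as a Python sketch, using the codes from the question. Python slices are 0-based, so the start index is the plain length difference; in a DBMS whose SUBSTR counts from 1, the start position would be one higher. The function name ends_with is just for this illustration.

```python
def ends_with(full_code, subcode):
    """Return True if the last len(subcode) characters of full_code equal subcode."""
    if len(full_code) < len(subcode):
        return False
    start = len(full_code) - len(subcode)  # 0-based start of the suffix
    return full_code[start:] == subcode


subcodes = ["EE789323", "990", "78000", "90"]
full_code = "EX888990"

matching = [s for s in subcodes if ends_with(full_code, s)]
print(matching)
```

Both 990 and 90 are reported for EX888990, which is the behaviour the question asks for.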

I'm not sure why you think that you need a regular expression... Just use the charindex function, which takes the string to find first and the string to search second:
select something
from table
where charindex(subcode, code) <> 0
Edit:
To find strings at the end, you can create a pattern with the % wildcard from the subcode:
select something
from table
where code like '%' + subcode
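The wildcard approach can be tried out with Python's built-in sqlite3 module. This is only a sketch under a few assumptions: the table and column names are invented, SQLite concatenates strings with || where the query above uses +, and the full code is passed in as a parameter. The pattern built from the subcode goes on the right-hand side of LIKE and the full code on the left.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subcodes (subcode TEXT)")
conn.executemany(
    "INSERT INTO subcodes VALUES (?)",
    [("EE789323",), ("990",), ("78000",), ("90",)],
)

full_code = "EX888990"
# '%' || subcode builds a pattern meaning "anything, followed by the subcode".
rows = conn.execute(
    "SELECT subcode FROM subcodes WHERE ? LIKE '%' || subcode ORDER BY subcode",
    (full_code,),
).fetchall()
matches = [row[0] for row in rows]
print(matches)
conn.close()
```

One caveat: a subcode that itself contains % or _ would be interpreted as a wildcard in the pattern.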
