How to do SPARQL query using bif:regexp_match on Jena - regex

I have the following SPARQL query on Virtuoso:
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT DISTINCT ?p, ?title WHERE {
?p a ?c.
?c rdfs:subClassOf* wd:Q2431196.
?p rdfs:label ?title .
FILTER (bif:regexp_match("^Vamp( [(].*[)])?$", ?title))
}
On this SPARQL endpoint, it works fine. It returns the tv show Vamp and also Vamp (telenovela) as expected.
Now I'm trying to do the same on Java, using Jena API, and it fails as follows.
Exception in thread "main" com.hp.hpl.jena.query.QueryParseException: Line 10, column 204: Unresolved prefixed name: bif:regexp_match
I found a solution to get rid of the Jena exception, as suggested for bif:contains. The query would then be as follows:
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT DISTINCT ?p, ?title WHERE {
?p a ?c.
?c rdfs:subClassOf* wd:Q2431196.
?p rdfs:label ?title .
?title <bif:regexp_match> "^Vamp( [(].*[)])?$"
}
However, unlike the previous query, this one does not return any results. It doesn't return any results on the SPARQL endpoint web interface either.
Am I doing something wrong? How can I regex it properly?
PS: Using FILTER REGEX( ?title, "^Vamp( [(].*[)])?$") works on the web SPARQL endpoint, but throws the following error in Java/Jena:
Sep 16, 2015 3:16:32 PM org.apache.jena.riot.system.ErrorHandlerFactory$ErrorLogger logError
SEVERE: Invalid byte 2 of 3-byte UTF-8 sequence.
I think this error has to do with the ( ) characters.

For Jena, use PREFIX bif:<bif:>
instead of PREFIX bif:<>.

Your regex pattern was right; it just needs a small edit to work in Java.
Put a left parenthesis ( after ^ and a right parenthesis ) before $.
Your regex pattern should look like this:
"^(Vamp( [(].*[)])?)$";
Hope this helps.
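A quick way to sanity-check both patterns outside the endpoint is a few lines of Python. This is only a sketch: Python's re module is not the regex engine that Virtuoso or Java uses, so it checks the pattern logic, not the behaviour of either system. The titles other than "Vamp" and "Vamp (telenovela)" are made up for illustration.

```python
import re

# Pattern from the question, and the variant wrapped in an extra outer group.
ORIGINAL = r"^Vamp( [(].*[)])?$"
WRAPPED = r"^(Vamp( [(].*[)])?)$"


def matches(pattern, title):
    """Return True if the anchored pattern matches the title."""
    return re.search(pattern, title) is not None


# The last two titles are hypothetical negatives.
titles = ["Vamp", "Vamp (telenovela)", "Vampire", "The Vamp"]
for title in titles:
    print(title, matches(ORIGINAL, title), matches(WRAPPED, title))
```

In Python, at least, both patterns accept and reject the same titles, which suggests the outer parentheses only add a capture group.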

You can use the following prefix declaration as a workaround.
PREFIX bif: <bif:>
Live Link demonstrating workaround in action.
Live Virtuoso SPARQL Query Editor Link showcasing workaround.
Ultimately, the URI for the Prefix declaration should be:
PREFIX bif: <http://www.openlinksw.com/schemas/bif#>
I explain this in a Twitter thread about the same issue: we are working to rectify the regression associated with the standard prefix declaration above.

Jena will fail to parse your SPARQL as it is invalid.
The main issue is that bif: is a built in prefix in Virtuoso.
To allow Jena to parse it you need to add
PREFIX bif:<>
to your query.

As AndyS explained in his answer, the problem is that bif is a Virtuoso-specific feature, so you should use QueryEngineHTTP instead of QueryExecutionFactory.sparqlService. This submits your query directly to the endpoint without passing it through the Jena parser.
QueryEngineHTTP query_engine = new QueryEngineHTTP(endpoint, query);
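The same idea (hand the query text to the endpoint untouched, with no client-side parsing) can be sketched in Python with only the standard library. This is an illustration, not the Jena API: the endpoint URL is a placeholder, and the request is built but not sent. The query is copied verbatim from the question, including its Virtuoso-specific syntax.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Assumption: placeholder endpoint URL; substitute your Virtuoso endpoint.
ENDPOINT = "http://example.org/sparql"

# Query text exactly as written in the question (Virtuoso dialect).
QUERY = """PREFIX wd: <http://www.wikidata.org/entity/>
SELECT DISTINCT ?p, ?title WHERE {
?p a ?c.
?c rdfs:subClassOf* wd:Q2431196.
?p rdfs:label ?title .
FILTER (bif:regexp_match("^Vamp( [(].*[)])?$", ?title))
}"""


def build_request(endpoint, query):
    """Build (but do not send) a GET request carrying the raw query text."""
    params = urlencode({"query": query})  # percent-encodes the text as UTF-8
    return Request(endpoint + "?" + params,
                   headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
# urllib.request.urlopen(request) would send it; omitted in this sketch.
print(request.full_url)
```

Because nothing on the client tries to parse the query, the bif: prefix never needs to be resolved locally.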

Related

replace expression format xx-xx-xxxx_12345678

IDENTIFIER
31-03-2022_13636075
01-04-2022_13650262
04-04-2022_13663174
05-04-2022_13672025
20220099001
11614491_R
10781198
00000000000
11283627_P
11614491_R
-1
How can I remove (only) the "XX-XX-XXXX_" part from certain values of a column in SSIS WITHOUT affecting values that don't have this format? For example, "21-05-2022_12345678" should become "12345678", but I don't want the other values affected. These are just examples of many rows from this column, so I want only the values that have this format to be affected.
SELECT REVERSE(substring(REVERSE('09-03-2022_13481330'),0,CHARINDEX('_',REVERSE('09-03-2022_13481330'),0)))
result
13481330
But this also affects other values. Also, this is in SSMS, not SSIS, because I am not sure how to translate this expression into SSIS code.
Update: the corrected code in SSIS is as follows:
(FINDSTRING(IDENTIFIER,"__-__-____[_]",1) == 1) ? SUBSTRING(IDENTIFIER,12,LEN(IDENTIFIER) - 11) : IDENTIFIER
Do you have access to the SQL source? You can do this in SQL by using LIKE and crafting a match pattern with the single-character wildcard _. See the example below:
DECLARE @Value VARCHAR(50) = '09-03-2022_13481330'
SELECT CASE WHEN @Value LIKE '__-__-____[_]%' THEN
SUBSTRING(@Value,12,LEN(@Value)-11) ELSE @Value END
Please see the Microsoft Documentation on LIKE and using single char wildcards
If you don't have access to the source SQL, it gets a bit trickier: you might need to use regex in a script task, or there may be an expression you can apply.
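The regex route can be sketched in Python to show the logic a script task would need. This is only an illustration under assumptions: the function name is made up, and the pattern requires digits in the date part, which is stricter than the LIKE pattern above (that one accepts any characters in those positions).

```python
import re

# Two digits, dash, two digits, dash, four digits, underscore, at the start.
DATE_PREFIX = re.compile(r"^\d{2}-\d{2}-\d{4}_")


def strip_date_prefix(identifier):
    """Remove a leading DD-MM-YYYY_ prefix; leave every other value as-is."""
    return DATE_PREFIX.sub("", identifier, count=1)


# Sample values taken from the question.
for value in ["31-03-2022_13636075", "20220099001", "11614491_R", "-1"]:
    print(value, "->", strip_date_prefix(value))
```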

Dealing with special characters in SPARQL-Filter expressions

I access the SPARQL-Endpoint of dbpedia[1] to get the URI for a given city. I use the following query to achieve this:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
select distinct * where
{?uri rdfs:label ?label.
FILTER (REGEX(STR(?label), "^Köln$", "i")).
?uri a dbpedia:PopulatedPlace.
}
If I query for a city without a German umlaut, everything works fine, but if there is an umlaut, I get nothing. When executing this query via code, I even get a 406 error (Not Acceptable).
Any idea how to deal with umlauts in a SPARQL query against dbpedia?
Thanks in advance,
Frank
[1] http://dbpedia.org/sparql
There seems to be a bug in the handling of your character, maybe in transport or somewhere else. It does work when you write the ö as a Unicode hex escape, like so:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
select distinct * where
{?uri rdfs:label ?label.
FILTER (REGEX(STR(?label), "^K\u00F6ln$")).
?uri a dbpedia:PopulatedPlace.
}
Edit: I see now that this doesn't work with the 'i' flag. The documentation suggests the 'u' flag would be applicable here.
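If the query is assembled in code, the escaping can be automated. Here is a minimal Python sketch (the function name is made up) that replaces every non-ASCII character with the \uXXXX or \UXXXXXXXX codepoint escape that SPARQL accepts. It does not address the 'i' flag problem mentioned above.

```python
def escape_non_ascii(text):
    """Replace non-ASCII characters with SPARQL codepoint escapes."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            parts.append(ch)                   # plain ASCII stays as-is
        elif code <= 0xFFFF:
            parts.append("\\u%04X" % code)     # 4-digit form
        else:
            parts.append("\\U%08X" % code)     # 8-digit form beyond the BMP
    return "".join(parts)


print(escape_non_ascii("^Köln$"))  # ^K\u00F6ln$
```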

AWQL - how can I use regular expressions or something similar?

I am querying the AdWords API via the following AWQL query (which works fine):
SELECT AccountDescriptiveName, CampaignId, CampaignName, AdGroupId, AdGroupName, KeywordText, KeywordMatchType, MaxCpc, Impressions, Clicks, Cost, Conversions, ConversionsManyPerClick, ConversionValue
FROM KEYWORDS_PERFORMANCE_REPORT
WHERE CampaignStatus IN ['ACTIVE', 'PAUSED']
AND AdGroupStatus IN ['ENABLED', 'PAUSED']
AND Status IN ['ACTIVE', 'PAUSED']
AND AdNetworkType1 IN ['SEARCH'] AND Impressions > 0
DURING 20140501,20140531
Now I want to exclude some campaigns:
We have a convention for our new campaigns that the campaign name begins with three digits followed by an underscore, e.g. "100_brand_all".
So I want to get only these new campaigns.
I tried lots of different variations of STARTS_WITH, but only exact strings work, and I need a pattern to match!
I already read https://developers.google.com/adwords/api/docs/guides/awql?hl=en and, according to its content, it should be possible to use a WHERE expression like this:
CampaignName STARTS_WITH ['0','1','2','3']
But that doesn't work!
Any other ideas how I can achieve this?
Well, why don't you run a campaign performance report first, process it (to get the campaign IDs you want or don't want), and then use those in a "CampaignId IN [campaign ids here]" or "CampaignId NOT_IN [campaign ids]" condition?
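That two-step approach might look roughly like this in Python. It is a sketch under assumptions: the rows are hypothetical stand-ins for a downloaded campaign report, and the helper names are made up. Only the client-side filtering and the clause building are shown, not the API calls.

```python
import re

# Naming convention from the question: three digits, then an underscore.
NEW_CAMPAIGN = re.compile(r"^\d{3}_")

# Hypothetical rows, standing in for a campaign performance report.
campaigns = [
    {"CampaignId": 101, "CampaignName": "100_brand_all"},
    {"CampaignId": 102, "CampaignName": "old_brand_campaign"},
    {"CampaignId": 103, "CampaignName": "205_generic_de"},
    {"CampaignId": 104, "CampaignName": "12_too_short"},
]


def new_campaign_ids(rows):
    """Return the IDs of campaigns whose name follows the convention."""
    return [row["CampaignId"] for row in rows
            if NEW_CAMPAIGN.match(row["CampaignName"])]


def campaign_id_clause(ids):
    """Build the condition to append to the keywords report query."""
    return "CampaignId IN [%s]" % ", ".join(str(i) for i in ids)


print(campaign_id_clause(new_campaign_ids(campaigns)))
```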

How do I use regex in a SQLite query?

I'd like to use a regular expression in sqlite, but I don't know how.
My table has got a column with strings like this: "3,12,13,14,19,28,32"
Now if I type "where x LIKE '3'" I also get the rows which contain values like 13 or 32,
but I'd like to get only the rows which have exactly the value 3 in that string.
Does anyone know how to solve this?
As others pointed out already, REGEXP calls a user-defined function which must first be defined and loaded into the database. Maybe some sqlite distributions or GUI tools include it by default, but my Ubuntu install did not. The solution was
sudo apt-get install sqlite3-pcre
which implements Perl regular expressions in a loadable module in /usr/lib/sqlite3/pcre.so
To be able to use it, you have to load it each time you open the database:
.load /usr/lib/sqlite3/pcre.so
Or you could put that line into your ~/.sqliterc.
Now you can query like this:
SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';
If you want to query directly from the command-line, you can use the -cmd switch to load the library before your SQL:
sqlite3 "$filename" -cmd ".load /usr/lib/sqlite3/pcre.so" "SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';"
If you are on Windows, I guess a similar .dll file should be available somewhere.
SQLite3 supports the REGEXP operator:
WHERE x REGEXP <regex>
http://www.sqlite.org/lang_expr.html#regexp
A hacky way to solve it without regex is where ',' || x || ',' like '%,3,%'
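To see why wrapping the column in commas works, here is a small Python sketch against an in-memory database. The table name t and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT)")
conn.executemany(
    "INSERT INTO t (x) VALUES (?)",
    [("3,12,13",), ("12,13,3",), ("12,3,13",), ("3",), ("13,32",), ("30,31",)],
)


def rows_containing(conn, value):
    """Find rows whose comma-separated list contains exactly this value."""
    # Adding a comma to both ends of x turns every element, including the
    # first and last, into ",value,", so a single LIKE pattern covers all.
    pattern = "%," + str(value) + ",%"
    cur = conn.execute(
        "SELECT x FROM t WHERE ',' || x || ',' LIKE ? ORDER BY rowid",
        (pattern,),
    )
    return [row[0] for row in cur]


print(rows_containing(conn, 3))
```

Rows such as "13,32" and "30,31" are not returned, because neither contains ",3," once the delimiters are added.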
SQLite does not contain regular expression functionality by default.
It defines a REGEXP operator, but this will fail with an error message unless you or your framework define a user function called regexp(). How you do this will depend on your platform.
If you have a regexp() function defined, you can match an arbitrary integer from a comma-separated list like so:
... WHERE your_column REGEXP '\b' || your_integer || '\b';
But really, it looks like you would find things a whole lot easier if you normalised your database structure by replacing those groups within a single column with a separate row for each number in the comma-separated list. Then you could not only use the = operator instead of a regular expression, but also use more powerful relational tools like joins that SQL provides for you.
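A minimal sketch of that normalised layout, using Python's sqlite3 module. The table and column names (item, item_value) and the sample data are assumptions made for illustration, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_value (
        item_id INTEGER REFERENCES item(id),
        value   INTEGER
    );
""")

# Hypothetical source data: one comma-separated string per item.
raw = {1: "3,12,13,14,19,28,32", 2: "13,32", 3: "1,3"}
for item_id, csv in raw.items():
    conn.execute("INSERT INTO item (id, name) VALUES (?, ?)",
                 (item_id, "item %d" % item_id))
    # One row per number instead of one string per item.
    conn.executemany(
        "INSERT INTO item_value (item_id, value) VALUES (?, ?)",
        [(item_id, int(v)) for v in csv.split(",")],
    )


def items_with_value(conn, value):
    """Plain equality replaces the regular expression."""
    cur = conn.execute(
        "SELECT DISTINCT item_id FROM item_value "
        "WHERE value = ? ORDER BY item_id",
        (value,),
    )
    return [row[0] for row in cur]


print(items_with_value(conn, 3))
```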
A SQLite UDF in PHP/PDO for the REGEXP keyword that mimics the behavior in MySQL:
$pdo->sqliteCreateFunction('regexp',
    function ($pattern, $data, $delimiter = '~', $modifiers = 'isuS')
    {
        if (isset($pattern, $data) === true)
        {
            return (preg_match(sprintf('%1$s%2$s%1$s%3$s', $delimiter, $pattern, $modifiers), $data) > 0);
        }
        return null;
    }
);
The u modifier is not implemented in MySQL, but I find it useful to have it by default. Examples:
SELECT * FROM "table" WHERE "name" REGEXP 'sql(ite)*';
SELECT * FROM "table" WHERE regexp('sql(ite)*', "name", '#', 's');
If either $data or $pattern is NULL, the result is NULL - just like in MySQL.
My solution in Python with sqlite3:
import sqlite3
import re

def match(expr, item):
    # True when the regex matches at the start of item
    return re.match(expr, item) is not None

conn = sqlite3.connect(':memory:')
conn.create_function("MATCHES", 2, match)
cursor = conn.cursor()
cursor.execute("SELECT MATCHES('^b', 'busy');")
print(cursor.fetchone()[0])
cursor.close()
conn.close()
If regex matches, the output would be 1, otherwise 0.
With python, assuming con is the connection to SQLite, you can define the required UDF by writing:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
Here is a more complete example:
import re
import sqlite3

with sqlite3.connect(":memory:") as con:
    con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
    cursor = con.cursor()
    # ...
    cursor.execute("SELECT * from person WHERE surname REGEXP '^A' ")
I don't know if it is good to answer a question that was posted almost a year ago, but I am writing this for those who think that SQLite itself provides the REGEXP function.
One basic requirement for invoking the REGEXP function in SQLite is:
"You should create your own function in the application and then provide the callback link to the sqlite driver".
For that you have to use sqlite3_create_function (C interface). You can find the details here and here.
An exhaustive or'ed where clause can do it without string concatenation:
WHERE ( x == '3' OR
x LIKE '%,3' OR
x LIKE '3,%' OR
x LIKE '%,3,%');
Includes the four cases exact match, end of list, beginning of list, and mid list.
This is more verbose, but doesn't require the regex extension.
UPDATE TableName
SET YourField = ''
WHERE YourField REGEXP 'YOUR REGEX'
And :
SELECT * from TableName
WHERE YourField REGEXP 'YOUR REGEX'
SQLite version 3.36.0, released 2021-06-18, now has the REGEXP operator built in, but only in the CLI build.
Consider using this
WHERE x REGEXP '(^|,)(3)(,|$)'
This will match exactly 3 when x is in:
3
3,12,13
12,13,3
12,3,13
Other examples:
WHERE x REGEXP '(^|,)(3|13)(,|$)'
This will match on 3 or 13
You may consider also
WHERE x REGEXP '(^|\D{1})3(\D{1}|$)'
This will find the number 3 at any position in any string.
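These list patterns can be tried out from Python by registering a regexp() function on the connection, as described in other answers here. This is a sketch with a made-up table t and sample rows. Note that Python's re module stands in for whichever regex engine your SQLite setup uses, so behaviour may differ for more exotic patterns.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "value REGEXP pattern" as regexp(pattern, value).
    if pattern is None or value is None:
        return None
    return 1 if re.search(pattern, value) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE t (x TEXT)")
conn.executemany(
    "INSERT INTO t (x) VALUES (?)",
    [("3",), ("3,12,13",), ("12,13,3",), ("12,3,13",), ("13,32",), ("30",)],
)


def select(conn, pattern):
    """Return the rows of t whose x matches the pattern."""
    cur = conn.execute(
        "SELECT x FROM t WHERE x REGEXP ? ORDER BY rowid", (pattern,)
    )
    return [row[0] for row in cur]


print(select(conn, r"(^|,)(3)(,|$)"))
print(select(conn, r"(^|,)(3|13)(,|$)"))
```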
You could use a regular expression with REGEXP, but that is a silly way to do an exact match.
You should just say WHERE x = '3'.
If you are using php you can add any function to your sql statement by using: SQLite3::createFunction.
In PDO you can use PDO::sqliteCreateFunction and implement the preg_match function within your statement:
See how it's done by Havalite (RegExp in SqLite using Php)
In case someone is looking for a non-regex condition for Android SQLite with a string like [1,2,3,4,5], don't forget to add the brackets ([]), and likewise any other special characters such as braces ({}), to @phyatt's condition:
WHERE ( x == '[3]' OR
x LIKE '%,3]' OR
x LIKE '[3,%' OR
x LIKE '%,3,%');
You can use the sqlean-regexp extension, which provides regexp search and replace functions.
Based on the PCRE2 engine, this extension supports all major regular expression features. It also supports Unicode. The extension is available for Windows, Linux, and macOS.
Some usage examples:
-- select messages containing number 3
select * from messages
where msg_text regexp '\b3\b';
-- count messages containing digits
select count(*) from messages
where msg_text regexp '\d+';
-- 42
select regexp_like('Meet me at 10:30', '\d+:\d+');
-- 1
select regexp_substr('Meet me at 10:30', '\d+:\d+');
-- 10:30
select regexp_replace('password = "123456"', '"[^"]+"', '***');
-- password = ***
In Julia, the model to follow can be illustrated as follows:
using SQLite
using DataFrames
db = SQLite.DB("<name>.db")
register(db, SQLite.regexp, nargs=2, name="regexp")
SQLite.Query(db, "SELECT * FROM test WHERE name REGEXP '^h';") |> DataFrame
For Rails:
db = ActiveRecord::Base.connection.raw_connection
db.create_function('regexp', 2) do |func, pattern, expression|
  func.result = expression.to_s.match(Regexp.new(pattern.to_s, Regexp::IGNORECASE)) ? 1 : 0
end

doctrine2 dql, use setParameter with % wildcard when doing a like comparison

I want to use the parameter placeholder, e.g. ?1, with the % wildcards, that is, something like "u.name LIKE %?1%" (though this throws an error). The docs have the following two examples:
1.
// Example - $qb->expr()->like('u.firstname', $qb->expr()->literal('Gui%'))
public function like($x, $y); // Returns Expr\Comparison instance
I do not like this as there is no protection against code injection.
2.
// $qb instanceof QueryBuilder
// example8: QueryBuilder port of: "SELECT u FROM User u WHERE u.id = ?1 OR u.nickname LIKE ?2 ORDER BY u.surname DESC" using QueryBuilder helper methods
$qb->select(array('u')) // string 'u' is converted to array internally
->from('User', 'u')
->where($qb->expr()->orx(
$qb->expr()->eq('u.id', '?1'),
$qb->expr()->like('u.nickname', '?2')
))
->orderBy('u.surname', 'ASC');
I do not like this because I need to search for terms within the object's properties - that is, I need the wild cards on either side.
When binding parameters to queries, DQL pretty much works exactly like PDO (which is what Doctrine2 uses under the hood).
So when using the LIKE statement, PDO treats both the keyword and the % wildcards as a single token. You cannot add the wildcards next to the placeholder. You must append them to the string when you bind the params.
$qb->expr()->like('u.nickname', '?2')
$qb->getQuery()->setParameter(2, '%' . $value . '%');
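The same rule holds for other drivers with bound parameters. As an illustration outside Doctrine, here is a Python sqlite3 sketch in which the placeholder stands for the whole LIKE pattern, wildcards included. The table name and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (nickname TEXT)")
conn.executemany(
    "INSERT INTO users (nickname) VALUES (?)",
    [("guilherme",), ("anguish",), ("bob",)],
)


def search_nickname(conn, term):
    """Substring search: the wildcards are part of the bound value."""
    # "LIKE %?%" would be a syntax error; build the pattern in code instead.
    cur = conn.execute(
        "SELECT nickname FROM users WHERE nickname LIKE ? ORDER BY nickname",
        ("%" + term + "%",),
    )
    return [row[0] for row in cur]


print(search_nickname(conn, "gui"))
```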
See this comment in the PHP manual.
The selected answer is wrong. It works, but it is not secure.
You should escape the term that you insert between the percentage signs:
->setParameter(2, '%'.addcslashes($value, '%_').'%')
The percentage sign '%' and the underscore '_' are interpreted as wildcards by LIKE. If they're not escaped properly, an attacker might construct arbitrarily complex queries that can cause a denial of service attack. Also, it might be possible for the attacker to get search results he is not supposed to get. A more detailed description of attack scenarios can be found here: https://stackoverflow.com/a/7893670/623685
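The effect of escaping can be demonstrated with a small Python sqlite3 sketch. This is an analogue, not Doctrine code: the helper name, table, and rows are made up, and SQLite needs an explicit ESCAPE clause because, unlike MySQL, it has no default LIKE escape character.

```python
import sqlite3


def escape_like(term, escape="\\"):
    """Escape the escape character first, then the two LIKE wildcards."""
    for ch in (escape, "%", "_"):
        term = term.replace(ch, escape + ch)
    return term


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT)")
conn.executemany(
    "INSERT INTO product (name) VALUES (?)",
    [("50% off",), ("500 items",), ("a_b",), ("axb",)],
)


def search(conn, term):
    """Substring search that treats % and _ in the term literally."""
    pattern = "%" + escape_like(term) + "%"
    cur = conn.execute(
        "SELECT name FROM product WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (pattern,),
    )
    return [row[0] for row in cur]


print(search(conn, "50%"))
print(search(conn, "a_b"))
```

Without the escaping step, a search for "50%" would also return "500 items", and "a_b" would also return "axb".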