I'm using RegEx in a MySQLi prepared statement to find any rows that contain only an integer in a particular field. I want to replace those integers with an empty string. I also want to ignore any rows that contain alphabetic characters in that field. However, my syntax seems to be wrong. Here is my PHP code:
$Empty = "";
$IntegersOnly = "[0-9]+";
$stmt = mysqli_prepare($db, "UPDATE MyTable set MyField = ? where REGEXP ?");
$bind = mysqli_stmt_bind_param($stmt, "ss", $Empty, $IntegersOnly);
$exec = mysqli_stmt_execute($stmt);
I'm getting the following error at the bind line: mysqli_stmt_bind_param() expects parameter 1 to be mysqli_stmt, boolean given
So I believe my $stmt or $bind syntax is wrong.
Please consider that my programming experience is limited. I am hoping to use MySQLi (not PDO) and procedural programming (not OOP).
I'm using PHP 7 with a recent version of MariaDB, which is equivalent to MySQL
My apologies. I found the mistake. I should have used this:
$stmt = mysqli_prepare($db, "UPDATE MyTable set MyField = ? where MyField REGEXP ?");
In other words, I accidentally omitted MyField before REGEXP.
In spring-date-jpa, when my specification contains query.distinct(true) i see that the count(Specification) method, the count query is incorrect.
The generated SQL looks like this :
select distinctcount(*) ...
Instead of
select count(distinct(*)) ...
In that post:spring-source-forum
The workaround was to mofify the source code of spring-data-jpa.
Is there any other solution?
I have found another workaround:
Subquery<Class1> sq = query.subquery(Class1.class);
Root<Class1> spouseEmp = sq.from(Class1.class);
sq.select(spouseEmp);
sq.where(builder.equal(spouseEmp.get("class2"),class2.get("class2")));
predicate.getExpressions().add(builder.exists(sq));
CriteriaBuilder has a countDistinct that should get you want you want. I know this question is years old but I've been hunting for this answer for sometime. I was running into a similar issue where query.distinct(true) was not working as expected when used with count. In my example I was using a Specification. Posting this answer to raise awareness for countDistinct
public TypedQuery<Long> getCountQuery(Specification<T> spec) {
CriteriaBuilder builder = this.em.getCriteriaBuilder();
CriteriaQuery<Long> query = builder.createQuery(Long.class);
Root<?> root = applySpecificationToCriteria(spec, query);
query.select(builder.countDistinct(root));
return this.em.createQuery(query);
}
I have come across this multiple times in the past, and I just ended up writing a raw SQL query to overcome it, but I really want to find out why this is happening. Look at the statement below
$q = $this->entityManager->createQuery('SELECT um,s,st
FROM Dashboard\Entity\Metric um
JOIN um.stat st
JOIN um.site s
JOIN s.clients c
WHERE c.id = ?1
AND s.competitor = 0
AND s.ignored = 0
AND st.id IN (?2)
GROUP BY s.id, st.id
ORDER BY st.response_field, s.id')
->setParameter(1, $params['c_id'])
->setParameter(2, $statId);
$sql = $q->getSql();
$rs = $q->getResult();
If I take the contents of $sql and paste them into a mySQL tool and run the raw query, it returns 18 results which is correct.
However, $rs only contains 3 results. $statId is a comma-separated string of 6 numbers: (1,2,3,4,5,6). So I am grouping by st.id, and s.id. There will be 3 s.id elements for every st.id element, working out to the 18 results I expected. What's happening is Doctrine is only returning the first st.id which is the group of 3 s.id 's
Any idea what could be causing this?
I got my answer from IRC, but posting here in case it helps someone else. Smart WHERE IN statements aren't planned for support until 2.1 version.
http://groups.google.com/group/doctrine-dev/browse_thread/thread/fbf70837293676fb
But I can accomplish the same goal with query builder.
http://www.doctrine-project.org/docs/orm/2.0/en/reference/query-builder.html
I'd like to use a regular expression in sqlite, but I don't know how.
My table has got a column with strings like this: "3,12,13,14,19,28,32"
Now if I type "where x LIKE '3'" I also get the rows which contain values like 13 or 32,
but I'd like to get only the rows which have exactly the value 3 in that string.
Does anyone know how to solve this?
As others pointed out already, REGEXP calls a user defined function which must first be defined and loaded into the the database. Maybe some sqlite distributions or GUI tools include it by default, but my Ubuntu install did not. The solution was
sudo apt-get install sqlite3-pcre
which implements Perl regular expressions in a loadable module in /usr/lib/sqlite3/pcre.so
To be able to use it, you have to load it each time you open the database:
.load /usr/lib/sqlite3/pcre.so
Or you could put that line into your ~/.sqliterc.
Now you can query like this:
SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';
If you want to query directly from the command-line, you can use the -cmd switch to load the library before your SQL:
sqlite3 "$filename" -cmd ".load /usr/lib/sqlite3/pcre.so" "SELECT fld FROM tbl WHERE fld REGEXP '\b3\b';"
If you are on Windows, I guess a similar .dll file should be available somewhere.
SQLite3 supports the REGEXP operator:
WHERE x REGEXP <regex>
http://www.sqlite.org/lang_expr.html#regexp
A hacky way to solve it without regex is where ',' || x || ',' like '%,3,%'
SQLite does not contain regular expression functionality by default.
It defines a REGEXP operator, but this will fail with an error message unless you or your framework define a user function called regexp(). How you do this will depend on your platform.
If you have a regexp() function defined, you can match an arbitrary integer from a comma-separated list like so:
... WHERE your_column REGEXP "\b" || your_integer || "\b";
But really, it looks like you would find things a whole lot easier if you normalised your database structure by replacing those groups within a single column with a separate row for each number in the comma-separated list. Then you could not only use the = operator instead of a regular expression, but also use more powerful relational tools like joins that SQL provides for you.
A SQLite UDF in PHP/PDO for the REGEXP keyword that mimics the behavior in MySQL:
$pdo->sqliteCreateFunction('regexp',
function ($pattern, $data, $delimiter = '~', $modifiers = 'isuS')
{
if (isset($pattern, $data) === true)
{
return (preg_match(sprintf('%1$s%2$s%1$s%3$s', $delimiter, $pattern, $modifiers), $data) > 0);
}
return null;
}
);
The u modifier is not implemented in MySQL, but I find it useful to have it by default. Examples:
SELECT * FROM "table" WHERE "name" REGEXP 'sql(ite)*';
SELECT * FROM "table" WHERE regexp('sql(ite)*', "name", '#', 's');
If either $data or $pattern is NULL, the result is NULL - just like in MySQL.
My solution in Python with sqlite3:
import sqlite3
import re
def match(expr, item):
return re.match(expr, item) is not None
conn = sqlite3.connect(':memory:')
conn.create_function("MATCHES", 2, match)
cursor = conn.cursor()
cursor.execute("SELECT MATCHES('^b', 'busy');")
print cursor.fetchone()[0]
cursor.close()
conn.close()
If regex matches, the output would be 1, otherwise 0.
With python, assuming con is the connection to SQLite, you can define the required UDF by writing:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
Here is a more complete example:
import re
import sqlite3
with sqlite3.connect(":memory:") as con:
con.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
cursor = con.cursor()
# ...
cursor.execute("SELECT * from person WHERE surname REGEXP '^A' ")
I don't it is good to answer a question which was posted almost an year ago. But I am writing this for those who think that Sqlite itself provide the function REGEXP.
One basic requirement to invoke the function REGEXP in sqlite is
"You should create your own function in the application and then provide the callback link to the sqlite driver".
For that you have to use sqlite_create_function (C interface). You can find the detail from here and here
An exhaustive or'ed where clause can do it without string concatenation:
WHERE ( x == '3' OR
x LIKE '%,3' OR
x LIKE '3,%' OR
x LIKE '%,3,%');
Includes the four cases exact match, end of list, beginning of list, and mid list.
This is more verbose, doesn't require the regex extension.
UPDATE TableName
SET YourField = ''
WHERE YourField REGEXP 'YOUR REGEX'
And :
SELECT * from TableName
WHERE YourField REGEXP 'YOUR REGEX'
SQLite version 3.36.0 released 2021-06-18 now has the REGEXP command builtin.
For CLI build only.
Consider using this
WHERE x REGEXP '(^|,)(3)(,|$)'
This will match exactly 3 when x is in:
3
3,12,13
12,13,3
12,3,13
Other examples:
WHERE x REGEXP '(^|,)(3|13)(,|$)'
This will match on 3 or 13
You may consider also
WHERE x REGEXP '(^|\D{1})3(\D{1}|$)'
This will allow find number 3 in any string at any position
You could use a regular expression with REGEXP, but that is a silly way to do an exact match.
You should just say WHERE x = '3'.
If you are using php you can add any function to your sql statement by using: SQLite3::createFunction.
In PDO you can use PDO::sqliteCreateFunction and implement the preg_match function within your statement:
See how its done by Havalite (RegExp in SqLite using Php)
In case if someone looking non-regex condition for Android Sqlite, like this string [1,2,3,4,5] then don't forget to add bracket([]) same for other special characters like parenthesis({}) in #phyatt condition
WHERE ( x == '[3]' OR
x LIKE '%,3]' OR
x LIKE '[3,%' OR
x LIKE '%,3,%');
You can use the sqlean-regexp extension, which provides regexp search and replace functions.
Based on the PCRE2 engine, this extension supports all major regular expression features. It also supports Unicode. The extension is available for Windows, Linux, and macOS.
Some usage examples:
-- select messages containing number 3
select * from messages
where msg_text regexp '\b3\b';
-- count messages containing digits
select count(*) from messages
where msg_text regexp '\d+';
-- 42
select regexp_like('Meet me at 10:30', '\d+:\d+');
-- 1
select regexp_substr('Meet me at 10:30', '\d+:\d+');
-- 10:30
select regexp_replace('password = "123456"', '"[^"]+"', '***');
-- password = ***
In Julia, the model to follow can be illustrated as follows:
using SQLite
using DataFrames
db = SQLite.DB("<name>.db")
register(db, SQLite.regexp, nargs=2, name="regexp")
SQLite.Query(db, "SELECT * FROM test WHERE name REGEXP '^h';") |> DataFrame
for rails
db = ActiveRecord::Base.connection.raw_connection
db.create_function('regexp', 2) do |func, pattern, expression|
func.result = expression.to_s.match(Regexp.new(pattern.to_s, Regexp::IGNORECASE)) ? 1 : 0
end
I have a need to match cold leads against a database of our clients.
The leads come from a third party provider in bulk (thousands of records) and sales is asking us to (in their words) "filter out our clients" so they don't try to sell our service to a established client.
Obviously, there are misspellings in the leads. Charles becomes Charlie, Joseph becomes Joe, etc. So I can't really just do a filter comparing lead_first_name to client_first_name, etc.
I need to use some sort of string similarity mechanism.
Right now I'm using the lovely difflib to compare the leads' first and last names to a list generated with Client.objects.all(). It works, but because of the number of clients it tends to be slow.
I know that most sql databases have soundex and difference functions. See my test of it in the update below - it doesn't work as well as difflib.
Is there another solution? Is there a better solution?
Edit:
Soundex, at least in my db, doesn't behave as well as difflib.
Here is a simple test - look for "Joe Lopes" in a table containing "Joseph Lopes":
with temp (first_name, last_name) as (
select 'Joseph', 'Lopes'
union
select 'Joe', 'Satriani'
union
select 'CZ', 'Lopes'
union
select 'Blah', 'Lopes'
union
select 'Antonio', 'Lopes'
union
select 'Carlos', 'Lopes'
)
select first_name, last_name
from temp
where difference(first_name+' '+last_name, 'Joe Lopes') >= 3
order by difference(first_name+' '+last_name, 'Joe Lopes')
The above returns "Joe Satriani" as the only match. Even reducing the similarity threshold to 2 doesn't return "Joseph Lopes" as a potential match.
But difflib does a much better job:
difflib.get_close_matches('Joe Lopes', ['Joseph Lopes', 'Joe Satriani', 'CZ Lopes', 'Blah Lopes', 'Antonio Lopes', 'Carlos Lopes'])
['Joseph Lopes', 'CZ Lopes', 'Carlos Lopes']
Edit after gruszczy's response:
Before writing my own, I looked for and found a T-SQL implementation of Levenshtein Distance in the repository of all knowledge.
In testing it, it still won't do a better matching job than difflib.
Which led me to research what algorithm is behind difflib. It seems to be a modified version of the Ratcliff-Obershelp algorithm.
Unhappily I can't seem to find some other kind soul who has already created a T-SQL implementation based on difflib's... I'll try my hand at it when I can.
If nobody else comes up with a better answer in the next few days, I'll grant it to gruszczy. Thanks, kind sir.
soundex won't help you, because it's a phonetic algorithm. Joe and Joseph aren't similar phonetically, so soundex won't mark them as similar.
You can try Levenshtein distance, which is implemented in PostgreSQL. Maybe in your database too and if not, you should be able to write a stored procedure, which will calculate the distance between two strings and use it in your computation.
It's possible with trigram_similar lookups since Django 1.10, see docs for PostgreSQL specific lookups and Full text search
As per the answer of andilabs you can use the Levenshtein function to create your custom function. Postgres doc indicates that the Levenshtein function is as follows:
levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int
levenshtein(text source, text target) returns int
andilabs answer can use the only second function. If you want a more advanced search with insertion/deletion/substitution costs, you can rewrite function like this:
from django.db.models import Func
class Levenshtein(Func):
template = "%(function)s(%(expressions)s, '%(search_term)s', %(ins_cost)d, %(del_cost)d, %(sub_cost)d)"
function = 'levenshtein'
def __init__(self, expression, search_term, ins_cost=1, del_cost=1, sub_cost=1, **extras):
super(Levenshtein, self).__init__(
expression,
search_term=search_term,
ins_cost=ins_cost,
del_cost=del_cost,
sub_cost=sub_cost,
**extras
)
And call the function:
from django.db.models import F
Spot.objects.annotate(
lev_dist=Levenshtein(F('name'), 'Kfaka', 3, 3, 1) # ins = 3, del = 3, sub = 1
).filter(
lev_dist__lte=2
)
If you need getting there with django and postgres and don't want to use introduced in 1.10 trigram-similarity https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/lookups/#trigram-similarity you can implement using Levensthein like these:
Extension needed fuzzystrmatch
you need adding postgres extension to your db in psql:
CREATE EXTENSION fuzzystrmatch;
Lets define custom function with wich we can annotate queryset. It just take one argument the search_term and uses postgres levenshtein function (see docs):
from django.db.models import Func
class Levenshtein(Func):
template = "%(function)s(%(expressions)s, '%(search_term)s')"
function = "levenshtein"
def __init__(self, expression, search_term, **extras):
super(Levenshtein, self).__init__(
expression,
search_term=search_term,
**extras
)
then in any other place in project we just import defined Levenshtein and F to pass the django field.
from django.db.models import F
Spot.objects.annotate(
lev_dist=Levenshtein(F('name'), 'Kfaka')
).filter(
lev_dist__lte=2
)