How to make =NULL work in SQLite? - c++

Given the following table:
Table: Comedians
=================
Id First Middle Last
--- ------- -------- -------
1 Bob NULL Sagat
2 Jerry Kal Seinfeld
I want to make the following prepared query:
SELECT * FROM Comedians WHERE Middle=?
work for all cases. It currently does not work for the case where I pass NULL via sqlite3_bind_null. I realize that the query to actually search for NULL values uses IS NULL, but that would mean that I cannot use the prepared query for all cases. I would actually have to change the query depending on the input, which largely defeats the purpose of the prepared query. How do I do this? Thanks!

You can use the IS operator instead of =.
SELECT * FROM Comedians WHERE Middle IS ?

Nothing matches = NULL. The only way to check that is with IS NULL.
You can do a variety of things, but the straight forward one is...
WHERE
middle = ?
OR (middle IS NULL and ? IS NULL)
If there is a value you know NEVER appears, you can change that to...
WHERE
COALESCE(middle, '-') = COALESCE(?, '-')
But you need a value that literally NEVER appears. Also, it obfuscates the use of indexes, but the OR version can often suck as well (I don't know how well SQLite treats it).
All things equal, I recommend the first version.

NULL is not a value, but an attribute of a field. Instead use
SELECT * FROM Comedians WHERE Middle IS NULL

If you want match everything on NULL
SELECT * FROM Comedians WHERE Middle=IfNull(?, Middle)
if want match none on NULL
SELECT * FROM Comedians WHERE Middle=IfNull(?, 'DUMMY'+Middle)
See this answer: https://stackoverflow.com/a/799406/30225

Related

Only one IF ELSE statement working SAS

Can someone explain to me why only the first IF ELSE statement in my code works? I am trying to combine multiple variables into one.
DATA BCMasterSet2;
SET BCMasterSet;
drop PositiveLymphNodes1;
if PositiveLymphNodes1 = "." then PositiveLymphNodes =
put(PositiveLymphNodes2, 2.);
else PositiveLymphNodes = PositiveLymphNodes1;
if PositiveLymphNodes2 = "." then new_posLymph = put(PositiveLymphNodes,
2.);
else new_posLymph = PositiveLymphNodes2;
RUN;
Here is a nice screenshot of what the incorrect output looks like:OUTPUT
Thanks!
Hard to say without seeing all of your data, but I have a suspicion: is positivelymphnodes1 character or numeric? Is it ever actually equal to "."?
If you are trying to say "if PositiveLymphNodes1 is missing", then you can say that this way:
if missing(positivelymphnodes1) then ...
You can also do the same thing using coalesce or coalescec (the latter is character, the former numeric, in its return value). It chooses the first nonmissing argument. - so if the first argument is missing, it chooses the second.
positiveLymphNodes = coalescec(PositiveLymphNodes1, put(positiveLymphNodes2,2.));
new_posLymph = coalescec(positiveLymphNodes2, put(positiveLymphNodes,2.));
I would be curious why you're using put only in one place and not the other - use it in both or neither, I would suggest.

PL/SQL optimize searching a date in varchar

I have a table, that contains date field (let it be date s_date) and description field (varchar2(n) desc). What I need is to write a script (or a single query, if possible), that will parse the desc field and if it contains a valid oracle date, then it will cut this date and update the s_date, if it is null.
But there are one more condition - there are must be exactly one occurence of a date in the desc. If there are 0 or >1 - nothing should be updated.
By the time I came up with this pretty ugly solution using regular expressions:
----------------------------------------------
create or replace function to_date_single( p_date_str in varchar2 )
return date
is
l_date date;
pRegEx varchar(150);
pResStr varchar(150);
begin
pRegEx := '((0[1-9]|[12][0-9]|3[01])[.](0[1-9]|1[012])[.](19|20)\d\d)((.|\n|\t|\s)*((0[1-9]|[12][0-9]|3[01])[.](0[1-9]|1[012])[.](19|20)\d\d))?';
pResStr := regexp_substr(p_date_str, pRegEx);
if not (length(pResStr) = 10)
then return null;
end if;
l_date := to_date(pResStr, 'dd.mm.yyyy');
return l_date;
exception
when others then return null;
end to_date_single;
----------------------------------------------
update myTable t
set t.s_date = to_date_single(t.desc)
where t.s_date is null;
----------------------------------------------
But it's working extremely slow (more than a second for each record and i need to update about 30000 records). Is it possible to optimize the function somehow? Maybe it is the way to do the thing without regexp? Any other ideas?
Any advice is appreciated :)
EDIT:
OK, maybe it'll be useful for someone. The following regular expression performs check for valid date (DD.MM.YYYY) taking into account the number of days in a month, including the check for leap year:
(((0[1-9]|[12]\d|3[01])\.(0[13578]|1[02])\.((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\.(0[13456789]|1[012])\.((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\.02\.((19|[2-9]\d)\d{2}))|(29\.02\.((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))
I used it with the query, suggested by #David (see accepted answer), but I've tried select instead of update (so it's 1 regexp less per row, because we don't do regexp_substr) just for "benchmarking" purpose.
Numbers probably won't tell much here, cause it all depends on hardware, software and specific DB design, but it took about 2 minutes to select 36K records for me. Update will be slower, but I think It'll still be a reasonable time.
I would refactor it along the lines of a single update query.
Use two regexp_instr() calls in the where clause to find rows for which a first occurrence of the match occurs and a second occurrence does not, and regexp_substr() to pull the matching characters for the update.
update my_table
set my_date = to_date(regexp_subtr(desc,...),...)
where regexp_instr(desc,pattern,1,1) > 0 and
regexp_instr(desc,pattern,1,2) = 0
You might get even better performance with:
update my_table
set my_date = to_date(regexp_subtr(desc,...),...)
where case regexp_instr(desc,pattern,1,1)
when 0 then 'N'
else case regexp_instr(desc,pattern,1,2)
when 0 then 'Y'
else 'N'
end
end = 'Y'
... as it only evaluates the second regexp if the first is non-zero. The first query might also do that but the optimiser might choose to evaluate the second predicate first because it is an equality condition, under the assumption that it's more selective.
Or reordering the Case expression might be better -- it's a trade-off that's difficult to judge and probably very dependent on the data.
I think there's no way to improve this task. Actually, in order to achieve what you want it should get even slower.
Your regular expression matches text like 31.02.2013, 31.04.2013 outside the range of the month. If you put year in the game,
it gets even worse. 29.02.2012 is valid, but 29.02.2013 is not.
That's why you have to test if the result is a valid date.
Since there isn't a full regular expression for that, you would have to do it by PLSQL really.
In your to_date_single function you return null when a invalid date is found.
But that doesn't mean there won't be other valid dates forward on the text.
So you have to keep trying until you either find two valid dates or hit the end of the text:
create or replace function fn_to_date(p_date_str in varchar2) return date is
l_date date;
pRegEx varchar(150);
pResStr varchar(150);
vn_findings number;
vn_loop number;
begin
vn_findings := 0;
vn_loop := 1;
pRegEx := '((0[1-9]|[12][0-9]|3[01])[.](0[1-9]|1[012])[.](19|20)\d\d)';
loop
pResStr := regexp_substr(p_date_str, pRegEx, 1, vn_loop);
if pResStr is null then exit; end if;
begin
l_date := to_date(pResStr, 'dd.mm.yyyy');
vn_findings := vn_findings + 1;
-- your crazy requirement :)
if vn_findings = 2 then
return null;
end if;
exception when others then
null;
end;
-- you have to keep trying :)
vn_loop := vn_loop + 1;
end loop;
return l_date;
end;
Some tests:
select fn_to_date('xxxx29.02.2012xxxxx') c1 --ok
, fn_to_date('xxxx29.02.2012xxx29.02.2013xxx') c2 --ok, 2nd is invalid
, fn_to_date('xxxx29.02.2012xxx29.02.2016xxx') c2 --null, both are valid
from dual
As you are going to have to do try and error anyway one idea would be to use a simpler regular expression.
Something like \d\d[.]\d\d[.]\d\d\d\d would suffice. That would depend on your data, of course.
Using #David's idea you could filter the ammount of rows to apply your to_date_single function (because it's slow),
but regular expressions alone won't do what you want:
update my_table
set my_date = fn_to_date( )
where regexp_instr(desc,patern,1,1) > 0

How to Select Date Range using RegEx

I have date strings that looks like so:
20120817110329
Which, as you can see, is formatted: YYYYMMDDHHMMSS
How would I select (using RegEx) dates that are between 7/15 and 8/20? Or what about 8/1 to 8/15?
I have this working if I want to select a range that doesn't involve more than one place, but it is very limited:
^2012081[0-7] //selects 8/10 to 8/17
Update
Never forget the obvious (as pointed out by Wiseguy below), one can simply look for a range between 201207150000 and 201208209999.
Since you're just querying a database field that contains these values, you could simply check for a value between 201207150000 and 201208209999.
If you still want the regex, it ain't pretty, but this does it:
^20120(7(1[5-9]|2\d|3[01])|8([0-1]\d|20))\d{4}$
reFiddle example
You basically have to account for each possible range by hand.
^20120
(
7
(
1[5-9]
|2\d
|3[01]
)
|
8
(
[0-1]\d
|20
)
)
\d{4}$
I think this should work:
^2012(07(1[5-9]|[2-3][0-9])|08([0-1][0-9]|20))
Although the other answers are pretty the same...
You can check this for more info.

SQL and regular expression to check if string is a substring of larger string?

I have a database filled with some codes like
EE789323
990
78000
These numbers are ALWAYS endings of a larger code. Now I have a function that needs to check if the larger code contains the subcode.
So if I have codes 90 and 990 and my full code is EX888990, it should match both of them.
However I need to do it in the following way:
SELECT * FROM tableWithRecordsWithSubcode
WHERE subcode MATCHES [reg exp with full code];
Is a regular expression like this this even possible?
EDIT:
To clarify the issue I'm having, I'm not using SQL here. I just used that to give an example of the type of query I'm using.
In fact I'm using iOS with CoreData, and I need a predicate to fetch me only the records that match.
In the way that is mentioned below.
Given the observations from a comment:
Do you have two tables, one called tableWithRecordsWithSubcode and another that might be tableWithFullCodeColumn? So the matching condition is in part a join - you need to know which subcodes match any of the full codes in the second table? But you're only interested in the information in the tableWithRecordsWithSubcode table, not in which rows it matches in the other table?
and the laconic "you're correct" response, then we have to rewrite the query somewhat.
SELECT DISTINCT S.*
FROM tableWithRecordsWithSubcode AS S
JOIN tableWithFullCodeColumn AS F
ON F.Fullcode ...ends-with... S.Subcode
or maybe using an EXISTS sub-query:
SELECT S.*
FROM tableWithRecordsWithSubcode AS S
WHERE EXISTS(SELECT * FROM tableWithFullCodeColumn AS F
WHERE F.Fullcode ...ends-with... S.Subcode)
This uses a correlated sub-query but avoids the DISTINCT operation; it may mean the optimizer can work more efficiently.
That just leaves the magical 'X ...ends-with... T' operator to be defined. One possible way to do that is with LENGTH and SUBSTR. However, SUBSTR does not behave the same way in all DBMS, so you may have to tinker with this (possibly adding a third argument, LENGTH(s.subcode)):
LENGTH(f.fullcode) >= LENGTH(s.subcode) AND
SUBSTR(f.fullcode, LENGTH(f.fullcode) - LENGTH(s.subcode)) = s.subcode
This leads to two possible formulations:
SELECT DISTINCT S.*
FROM tableWithRecordsWithSubcode AS S
JOIN tableWithFullCodeColumn AS F
ON LENGTH(F.Fullcode) >= LENGTH(S.Subcode)
AND SUBSTR(F.Fullcode, LENGTH(F.Fullcode) - LENGTH(S.Subcode)) = S.Subcode;
and
SELECT S.*
FROM tableWithRecordsWithSubcode AS S
WHERE EXISTS(
SELECT * FROM tableWithFullCodeColumn AS F
WHERE LENGTH(F.Fullcode) >= LENGTH(S.Subcode)
AND SUBSTR(F.Fullcode, LENGTH(F.Fullcode) - LENGTH(S.Subcode)) = S.Subcode);
This is not going to be a fast operation; joins on computed results such as required by this query seldom are.
I'm not sure why you think that you need a regular expression... Just use the charindex function:
select something
from table
where charindex(code, subcode) <> 0
Edit:
To find strings at the end, you can create a pattern with the % wildcard from the subcode:
select something
from table
where '%' + subcode like code

doctrine2 dql, use setParameter with % wildcard when doing a like comparison

I want to use the parameter place holder - e.g. ?1 - with the % wild cards. that is, something like: "u.name LIKE %?1%" (though this throws an error). The docs have the following two examples:
1.
// Example - $qb->expr()->like('u.firstname', $qb->expr()->literal('Gui%'))
public function like($x, $y); // Returns Expr\Comparison instance
I do not like this as there is no protection against code injection.
2.
// $qb instanceof QueryBuilder
// example8: QueryBuilder port of: "SELECT u FROM User u WHERE u.id = ?1 OR u.nickname LIKE ?2 ORDER BY u.surname DESC" using QueryBuilder helper methods
$qb->select(array('u')) // string 'u' is converted to array internally
->from('User', 'u')
->where($qb->expr()->orx(
$qb->expr()->eq('u.id', '?1'),
$qb->expr()->like('u.nickname', '?2')
))
->orderBy('u.surname', 'ASC'));
I do not like this because I need to search for terms within the object's properties - that is, I need the wild cards on either side.
When binding parameters to queries, DQL pretty much works exactly like PDO (which is what Doctrine2 uses under the hood).
So when using the LIKE statement, PDO treats both the keyword and the % wildcards as a single token. You cannot add the wildcards next to the placeholder. You must append them to the string when you bind the params.
$qb->expr()->like('u.nickname', '?2')
$qb->getQuery()->setParameter(2, '%' . $value . '%');
See this comment in the PHP manual.
The selected answer is wrong. It works, but it is not secure.
You should escape the term that you insert between the percentage signs:
->setParameter(2, '%'.addcslashes($value, '%_').'%')
The percentage sign '%' and the symbol underscore '_' are interpreted as wildcards by LIKE. If they're not escaped properly, an attacker might construct arbirtarily complex queries that can cause a denial of service attack. Also, it might be possible for the attacker to get search results he is not supposed to get. A more detailed description of attack scenarios can be found here: https://stackoverflow.com/a/7893670/623685