Handling invalid dates in Oracle

I am writing simple SELECT queries that involve parsing a date out of a string.
The dates are typed in manually by users in a web application and are stored as strings in the database.
I have a CASE expression that handles the various date formats and passes the matching format specifier to the TO_DATE function.
However, users sometimes enter something that is not a valid date (e.g. 13-31-2013) by mistake, and then the entire query fails. Is there any way to handle such rogue records and replace them with some default date in the query, so that the entire query does not fail because of a single invalid date record?
I have already tried regular expressions, but AFAIK they are not reliable when it comes to leap years and 30/31-day months.
I don't have privileges to create stored procedures or anything like that. It's just a plain SELECT query executed from my application.

This is a task for the client.
The DB will give you an error for an invalid date (the DB does not have a "TO_DATE_AND_FIX_IF_NOT_CORRECT" function).
If you got this error, it means you already tried to convert an invalid value to a date.
I recommend doing the conversion to a date on your application server and, if your code throws an exception, sending a default date to the DB.
That way you also send the DB an object of type DbDate rather than a string.
This achieves two goals:
1. The dates will always be what you want them to be (from the client).
2. You close the door on SQL injection attacks.
It sounds like in your case you should write the function I mentioned.
It should look something like this:
CREATE OR REPLACE FUNCTION TO_DATE_SPECIAL(in_date IN VARCHAR2) RETURN DATE IS
    ret_val DATE;
BEGIN
    ret_val := TO_DATE(in_date, 'MM-DD-YYYY');
    RETURN ret_val;
EXCEPTION
    -- Any conversion error falls back to the default date
    WHEN OTHERS THEN
        RETURN TO_DATE('01-01-2000', 'MM-DD-YYYY');
END;
Within the query, use the new function instead of TO_DATE.
That way, instead of failing, it will give you back a default date.
Note: there is no IsDate function, so you'll have to create an object for it.
I hope you've got the idea and how to use it; if not, let me know.
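The application-side conversion recommended above could look roughly like the following Python sketch. It is only an illustration under assumptions: the name `parse_or_default`, the list of accepted formats, and the default date are all hypothetical, and Python merely stands in for whatever language the application actually uses.

```python
from datetime import date, datetime

# Hypothetical default, mirroring the 01-01-2000 fallback used in the PL/SQL function
DEFAULT_DATE = date(2000, 1, 1)

# Assumed input formats; adjust to whatever the CASE expression currently handles
KNOWN_FORMATS = ("%m-%d-%Y", "%m/%d/%Y", "%Y-%m-%d")


def parse_or_default(text, formats=KNOWN_FORMATS, default=DEFAULT_DATE):
    """Return the first successful parse of text, or default if no format fits."""
    for fmt in formats:
        try:
            # strptime raises ValueError for impossible dates (month 13, 29 Feb 2013, ...)
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return default
```

The value returned here would then be bound to the query as a date parameter, so the database never sees the raw string.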

I ended up using a crazy regex that also checks leap years and 30/31-day months.
Here it is:
((^(0?[13578]|1[02])[\/.-]?(0?[1-9]|[12][0-9]|3[01])[\/.-]?(18|19|20){0,1}[0-9]{2}$)|(^(0?[469]|11)[\/.-]?(0?[1-9]|[12][0-9]|30)[\/.-]?(18|19|20){0,1}[0-9]{2}$)|(^([0]?2)[\/.-]?(0?[1-9]|1[0-9]|2[0-8])[\/.-]?(18|19|20){0,1}[0-9]{2}$)|(^([0]?2)[\/.-]?29[\/.-]?(((18|19|20){0,1}(04|08|[2468][048]|[13579][26]))|2000|00)$))
It is a modified version of the answer by McKay here.
It's not the most efficient, but it works. I'll wait to see if I get a better alternative.

Related

Bigquery struct introspection

Is there a way to get the element types of a struct? For example something along the lines of:
SELECT #TYPE(structField.y)
SELECT #TYPE(structField)
...etc
Is that possible to do? The closest I can find is via the query editor and the web call it makes to validate a query:

As I already mentioned in the comments, one option is to mimic that very same dry-run call, with a query built in such a way that it fails with an error message that gives you exactly the info you are looking for. Obviously this assumes your use case can be implemented in whatever scripting language you prefer. It should be relatively easy to do.
Meanwhile, I looked for a way to do this within the SQL query itself.
Below is an example of another option.
It is limited to the types below, which may or may not fit your particular use case:
object, array, string, number, boolean, null
So the example is
select
    s.birthdate, json_type(to_json(s.birthdate)),
    s.country, json_type(to_json(s.country)),
    s.age, json_type(to_json(s.age)),
    s.weight, json_type(to_json(s.weight)),
    s.is_this, json_type(to_json(s.is_this)),
from (
    select struct(date '2022-01-01' as birthdate, 'UA' as country, 1 as age, 2.5 as weight, true as is_this) s
)
with output

You can try the below approach.
SELECT COLUMN_NAME, DATA_TYPE
FROM `your-project.your-dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE TABLE_NAME = 'your-table-name'
AND COLUMN_NAME = 'your-struct-column-name'
ORDER BY ORDINAL_POSITION
You can check this documentation for more details using INFORMATION_SCHEMA for BigQuery.

With stored procedures, is cfSqlType necessary?

To protect against SQL injection, I read in the introduction to ColdFusion that we are supposed to use the cfqueryparam tag.
But when using stored procedures, I am passing my variables to the corresponding variable declarations in SQL Server:
DROP PROC Usr.[Save]
GO
CREATE PROC Usr.[Save]
(@UsrID Int
,@UsrName varchar(max)
) AS
UPDATE Usr
SET UsrName = @UsrName
WHERE UsrID=@UsrID
exec Usr.[get] @UsrID
Q: Is there any value in including cfSqlType when I call a stored procedure?
Here's how I'm currently doing it in Lucee:
storedproc procedure='Usr.[Save]' {
procparam value=Val(form.UsrID);
procparam value=form.UsrName;
procresult name='Usr';
}

This question came up indirectly on another thread. That thread was about query parameters, but the same issues apply to procedures. To summarize: yes, you should always type query and proc parameters. Paraphrasing the other answer:
Since cfsqltype is optional, its importance is often underestimated:
Validation:
ColdFusion uses the selected cfsqltype (date, number, etcetera) to validate the "value". This occurs before any SQL is ever sent to the database. So if the "value" is invalid, like "ABC" for type cf_sql_integer, you do not waste a database call on SQL that was never going to work anyway. When you omit the cfsqltype, everything is submitted as a string and you lose the extra validation.
Accuracy:
Using an incorrect type may cause CF to submit the wrong value to the database. Selecting the proper cfsqltype ensures you are sending the correct value, and sending it in a non-ambiguous format the database will interpret the way you expect.
Again, technically you can omit the cfsqltype. However, that means CF will send everything to the database as a string. Consequently, the database will perform implicit conversion (usually undesirable). With implicit conversion, the interpretation of the strings is left entirely up to the database, and it might not always come up with the answer you would expect.
Submitting dates as strings, rather than date objects, is a prime example. How will your database interpret a date string like "05/04/2014"? As April 5th or as May 4th? Well, it depends. Change the database or the database settings and the result may be completely different.
The only way to ensure consistent results is to specify the appropriate cfsqltype. It should match the data type of the target column/function (or at least an equivalent type).
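The "05/04/2014" ambiguity is easy to reproduce outside ColdFusion. The short Python sketch below is only an illustration of the general problem, not of how any particular database resolves it: the same string yields two different dates depending on which format the parser assumes.

```python
from datetime import datetime

raw = "05/04/2014"

# Same string, two equally plausible readings
as_month_first = datetime.strptime(raw, "%m/%d/%Y").date()  # month/day/year
as_day_first = datetime.strptime(raw, "%d/%m/%Y").date()    # day/month/year

print(as_month_first.isoformat())  # 2014-05-04
print(as_day_first.isoformat())    # 2014-04-05
```

Passing a typed date value instead of the string removes the guesswork, which is the point of specifying cfsqltype.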

Getting generated auto-increment ID without second query (MySQL)

I have been searching for a while for a way to get the generated auto-increment ID from an "INSERT INTO ... (...) VALUES (...)". Even on Stack Overflow, I only find the answer of using "SELECT LAST_INSERT_ID()" in a subsequent query. I find this solution unsatisfactory for a number of reasons:
1) It effectively doubles the number of queries sent to the database, especially since the database is mostly handling inserts.
2) What happens if more than one thread accesses the database at the same time? What if more than one application accesses the database at the same time? It seems to me the values are bound to become erroneous.
It's hard for me to believe that the MySQL C++ Connector wouldn't offer a feature that both the Java Connector and the PHP Connector offer.

An example taken from http://forums.mysql.com/read.php?167,294960,295250
sql::Statement* stmt = conn->createStatement();
sql::ResultSet* res = stmt->executeQuery("SELECT @@identity AS id");
res->next();
my_ulong retVal = res->getInt64("id");
In a nutshell, you can also use the query below. In MySQL, @@identity is a synonym for the last_insert_id variable, so it only reflects values generated for an AUTO_INCREMENT column.
SELECT @@identity AS id
EDIT:
I'm not sure what you mean by a second query/round trip. At first I thought you were looking for a different way to get the ID of the last inserted row, but it looks like you are more interested in whether you can save the round trip.
If that's the case, then I completely agree with @WhozCraig: you can put both queries in a single statement, like insert into tab values ....; select last_insert_id(), which will be a single call
OR
you can have a stored procedure like the one below to do the same and save the round trip
create procedure myproc()
begin
    insert into mytab values ...;
    select last_insert_id();
end
Let me know if this is not what you are trying to achieve.
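For comparison, many database drivers expose the generated key on the statement or cursor object after the insert, so no extra SELECT is sent. The Python sketch below uses the standard-library sqlite3 module purely as a stand-in because it needs no server; it does not show the MySQL C++ Connector API, and the table and column names are made up for the example. On the concurrency worry in the question: MySQL documents LAST_INSERT_ID() as being maintained per connection, so inserts made through other connections do not change the value your connection sees.

```python
import sqlite3

# In-memory database so the sketch is self-contained (stand-in for a real server)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytab (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")

# The cursor returned by the INSERT already carries the generated key
cur = conn.execute("INSERT INTO mytab (name) VALUES (?)", ("first",))
first_id = cur.lastrowid

cur = conn.execute("INSERT INTO mytab (name) VALUES (?)", ("second",))
second_id = cur.lastrowid

print(first_id, second_id)  # 1 2
```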

Coldfusion 8 respond to particular error

I am writing an API which exposes parts of our database to a client. Part of this API requires certain HTTP response codes to be sent for particular conditions. This is generally easy with simple checks, but I cannot see how to catch (for example) 'InvalidDateTimeException' errors, where an invalid date is submitted to SQL.
I have tried dumping the ERROR and cfcatch variables, but while they generate huge stack traces, I cannot see any field that is easily parsable to check the specific type of error (short of doing a text search on the error message or stack trace).
I could also do a pre-check with a regex such as
(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})
but this would still let invalid dates through. ColdFusion also provides some date validation, but I have read that it is particularly bad. This also wouldn't help in other scenarios that don't deal with dates.
So in brief: what is the best way to react to a particular error such as 'InvalidDateTimeException' in ColdFusion?
[Edit]
Some clarifications from the comments: we are using MySQL 5 and cfqueryparams. We use the 'euro' date format here in Australia, but it would be much preferred if the API user presented ISO format dates (yyyy-mm-dd) to avoid confusion.

Well, my advice to you is to catch the error before it gets to SQL. You didn't specify your DBMS (SQL Server, MySQL, etc.), so I'll focus on ColdFusion solutions. I hope one of these suggestions points you in the right direction.
Options:
The article that you linked to concerning ColdFusion date validation mentions the isValid function as the recommended solution. Consider using that with the USDATE validation type, as suggested.
If you are using CFCs, or at least cffunctions, for your API methods, then you have cfargument type="date" at your disposal to help ensure the dates are valid (although my feeling is that it would have the same lenient behavior as isDate).
Inside your cfquery tag, you should be using cfqueryparam for all of the parameters you pass, especially those that come directly from the user (whether from a form post or an API call). For dates, use cfqueryparam with cfsqltype=CF_SQL_DATE.
Whichever of the methods above you use (or all of them), wrap your ColdFusion code in a try/catch construct and you will have a much easier error to deal with.
Depending on your DBMS, you might have access to try/catch constructs there too.
**** UPDATED:
After reading your comment about the international conversion issues, I have two approaches that I'd choose between:
Keep in mind that I haven't tested any code or anything ....
First, maybe the international functions can help you.
http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=functions_in-k_37.html
Use SetLocale to set the locale to English (Australian), then use LSParseDateTime to read in the yyyy-mm-dd format, and then use dateFormat to write it to MySQL as mm/dd/yyyy or whatever date format it expects. I don't have much experience dealing with those LS functions though.
Second option: use the regex you provided to make sure that the input has the right structure, then use createDate to create a date in US format from the parsed mm, dd and yyyy elements. Validate the US date using isValid.
Here's a blindly coded attempt at the second option. Remember, I haven't tested this code. I'm using the list function listGetAt first to split the input datetime into separate date and time strings, and then again to parse out the individual parts.
<cfscript>
    isosampledate = "2013-06-05 14:07:33";
    passesValidation = false;
    // Expected shape: yyyy-mm-dd hh:mm:ss
    expectedDatePattern = "(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})";
    try {
        if (refind(expectedDatePattern, isosampledate)) {
            // Split into date and time, then into the individual parts
            datePortion = listGetAt(isosampledate, 1, " ");
            timePortion = listGetAt(isosampledate, 2, " ");
            yearPart = listGetAt(datePortion, 1, "-");
            monthPart = listGetAt(datePortion, 2, "-");
            dayPart = listGetAt(datePortion, 3, "-");
            hoursPart = listGetAt(timePortion, 1, ":");
            minutesPart = listGetAt(timePortion, 2, ":");
            secondsPart = listGetAt(timePortion, 3, ":");
            thisUSDate = createDateTime(yearPart, monthPart, dayPart, hoursPart, minutesPart, secondsPart);
            if (isValid("usdate", thisUSDate)) {
                passesValidation = true;
                sqlDate = CreateODBCDateTime(thisUSDate);
            }
        }
    } catch (any e) {
        // Any failure while building the date counts as invalid input
        passesValidation = false;
    }
</cfscript>
I'm pretty sure that if the inputted value was not a valid date then at least one of those date functions would throw an exception which would get picked up by the catch block.
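The same two-step idea (check the shape with a regex, then let a date constructor reject impossible values) can be sketched in Python for comparison. This is an untested-against-ColdFusion illustration only; the function name `parse_iso_datetime` is hypothetical and nothing here is a ColdFusion API.

```python
import re
from datetime import datetime

# Shape check only: yyyy-mm-dd hh:mm:ss
ISO_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})")


def parse_iso_datetime(text):
    """Return a datetime for 'yyyy-mm-dd hh:mm:ss' input, or None if it is invalid."""
    match = ISO_PATTERN.fullmatch(text)
    if match is None:
        return None  # wrong shape
    try:
        # The constructor raises ValueError for values such as month 13 or hour 25
        return datetime(*(int(part) for part in match.groups()))
    except ValueError:
        return None  # right shape, impossible value
```

A caller can map a None result to the appropriate error response before any SQL is sent.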
Hope this helps. I'm off to bed.

Getting a date value in a Postgres table column and checking if it's bigger than today's date

I have a Postgres table called clients. The name column contains certain values eg.
test23233 [987665432,2014-02-18]
At the end of the value is a date. I need to compare this date and return all records where it is younger than today.
I tried
select id,name FROM clients where name ~ '(\d{4}\-\d{1,2}\-\d{1,2})';
but this isn't returning any values. How would I go about to achieve the results I want?

If the data is always stored this way (i.e. after the comma), I would not use a regex, but extract the date part and convert it to a proper date type.
SELECT *
FROM the_table
WHERE to_date(substring(name, strpos(name, ',') + 1, 10), 'yyyy-mm-dd') < current_date
You might want to put that to_date(...) thing into a view to make this easier for other queries.
In the long run you should really try to fix that data model.

Using a regular expression for this would be extremely hard. Is it possible to change the schema and data to separate the name, whatever the second value is, and the timestamp into separate columns? That would be far more logical, less error prone, and significantly faster.
Otherwise, I suspect you'll have to use some sort of parsing (possibly a regex) to extract the date, then convert it to a Postgres date, then compare that with the current time... for every single row. Ick.
EDIT: Actually, it's not quite that bad... because your dates are stored in a sort-friendly way, it's possible that you could do the extraction (whether with a regex or anything else) and just do an ordinal comparison with the string representation of today's date, without actually performing any date conversion for each row. It's still ugly though, and doesn't validate that the date isn't (say) 2011-99-99. If you can possibly store the data more sensibly, do.
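The trade-off described in the EDIT can be illustrated with a small Python sketch (Python is used only for illustration; the helper name `embedded_date` is hypothetical). Comparing the extracted text directly works only while every date is zero-padded, whereas converting to a real date also rejects values like 2011-99-99.

```python
import re
from datetime import datetime

# Date immediately before the closing bracket, as in 'test23233 [987665432,2014-02-18]'
DATE_IN_NAME = re.compile(r"(\d{4}-\d{1,2}-\d{1,2})\]\s*$")


def embedded_date(name):
    """Return the trailing date as a date object, or None if absent or invalid."""
    match = DATE_IN_NAME.search(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y-%m-%d").date()
    except ValueError:
        return None  # e.g. 2011-99-99


# Plain string comparison agrees with date order only for zero-padded values
print("2014-02-18" > "2011-03-18")  # True, and also true as dates
print("2014-2-18" > "2014-10-01")   # True as strings, although February precedes October
```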

I solved my issue by doing something similar to
select id,substring(name,'[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}'),name FROM clients where substring(name,'[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}') > '2011-03-18';
Might not be the best practice, but it works. But open to better suggestions
