How to see 'full' SQL Error Messages in BigQuery? - google-cloud-platform

I am writing a large MERGE statement in BigQuery.
When I attempt to run this query, the validator gives me an error full of "..." ellipses that hide the useful information, as shown below:
Value has type ARRAY<STRUCT<eventName STRING, eventUUID STRING, eventDate DATE, ...>> which cannot be inserted into column Events, which has type ARRAY<STRUCT<eventName STRING, eventUUID STRING, eventDate DATE, ...>> at [535:1]
I am extremely confident these two array objects match exactly; however, since I am struggling to get past this, I would love to see the full error message.
Is there any way to see the full error?
I have looked into the Google Logging tool and cannot see any additional information.
I have also tried the following Cloud Shell command:
bq --format=prettyjson show -j [Job Id Goes Here]
Again, this seems to provide no additional information.

This approach feels pretty silly, but it could be the last resort for a really long nested type; a sketch of both steps follows below.
1. Use INFORMATION_SCHEMA.COLUMNS to get the full string of the target type; in your case, the type of column Events.
2. Use CREATE TABLE <yourDataset>.<yourTempTable> AS SELECT ... to dump one row of the Value into a table, then use step 1 again to see its full type string.
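A minimal sketch of both steps, assuming hypothetical project, dataset, and table names:
-- Step 1: read the full declared type of the target column.
SELECT data_type
FROM `your-project.your_dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'your_target_table'
  AND column_name = 'Events';
-- Step 2: dump one row of the Value side into a scratch table, then run
-- the step-1 query against that table to see the full type string of
-- what you are actually trying to insert.
CREATE TABLE your_dataset.tmp_events_probe AS
SELECT Events                          -- the expression whose type you want spelled out
FROM your_dataset.your_source_table    -- or the subquery feeding your MERGE
LIMIT 1;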

Related

Bigquery struct introspection

Is there a way to get the element types of a struct? For example, something along the lines of:
SELECT #TYPE(structField.y)
SELECT #TYPE(structField)
...etc
Is that possible to do? The closest I can find is via the query editor and the web call it makes to validate a query:
As I mentioned already in the comments, one option is to mimic that same Dry Run call with a query built in such a way that it fails with an error message containing exactly the info you are looking for. Obviously this assumes your use case can be implemented in whatever scripting language you prefer; it should be relatively easy to do.
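One hypothetical way to build such a failing query, sketched in plain SQL with placeholder names: casting the struct column to an incompatible type forces the validator to name the column's type in the error text (though a very long type may still be elided there too).
-- Placeholder names; the cast is invalid on purpose, so the dry-run
-- error reads along the lines of "Invalid cast from ARRAY<STRUCT<...>> to STRING":
SELECT CAST(t.Events AS STRING)
FROM your_dataset.your_table AS t
LIMIT 1;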
Meantime, I was looking at doing this within the SQL query itself.
Below is the example of another option.
It is limited to the types below, which may or may not fit your particular use case:
object, array, string, number, boolean, null
So the example is:
select
s.birthdate, json_type(to_json(s.birthdate)),
s.country, json_type(to_json(s.country)),
s.age, json_type(to_json(s.age)),
s.weight, json_type(to_json(s.weight)),
s.is_this, json_type(to_json(s.is_this)),
from (
select struct(date '2022-01-01' as birthdate, 'UA' as country, 1 as age, 2.5 as weight, true as is_this) s
)
with output showing each value next to its JSON type (the date and country come out as string, age and weight as number, and is_this as boolean).
You can try the below approach.
SELECT COLUMN_NAME, DATA_TYPE
FROM `your-project.your-dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE TABLE_NAME = 'your-table-name'
AND COLUMN_NAME = 'your-struct-column-name'
ORDER BY ORDINAL_POSITION
You can check this documentation for more details on using INFORMATION_SCHEMA in BigQuery.
Below are screenshots of my testing: the sample data, and the result using the above syntax.

Google Cloud DataPrep DATEDIF function inconsistent

I have four DateTime columns, all in long format, e.g. 2016-08-01T21:13:02Z. They are called EnqDateTime, QuoteCreatedDateTime, BookingCreatedDateTime and RejAt.
I want to add columns for the duration (in days) between EnqDateTime and the other three columns, i.e.
DATEDIF(EnqDateTime, QuoteCreatedDateTime, day)
This works for RejAt, but throws an error for all the other columns:
Parameter "rhs" accepts only ["Datetime"]
As per the image below, all four columns are DateTime.
Can anyone see any other reason this may not be working for two of the three columns?
As you can see in the image below, I reproduced a scenario like the one you presented here, and I had no issue with it. I created the three X2Y columns using the same formulas that you shared:
DATEDIF(EnqDateTime, QuoteCreatedDateTime, day)
DATEDIF(EnqDateTime, BookingCreatedDateTime, day)
DATEDIF(EnqDateTime, RejAt, day)
My guess is that, for some reason, the columns do not have an appropriate Datetime format. Maybe you can try applying some transformations to the data in order to make sure that the data contained in the columns has the appropriate format. I recommend that you try the following:
Clean all missing values by clicking on the column and then Clean > Missing > Fill with NULL. Missing values can prevent Dataprep from recognizing a data type properly.
Change the data type again to Datetime, just to double-check that there is no field that does not have the Datetime type. You can do so by clicking on the column and then Change type > Date/Time.
If these methods do not solve your issue, maybe you can try working with a minimal example, having only a few rows, so that you can narrow down the variables with which to work. Then you can update your question with more information.
It would also be nice to know where you are getting the error Parameter "rhs" accepts only ["Datetime"]. It is not clear to me what the rhs (right-hand side) parameter is in this case, so maybe you can also provide more details about that.

Amazon Athena: no viable alternative at input

While creating a table in Athena, it gives me the following exception:
no viable alternative at input
Hyphens are not allowed in table names (though the wizard allows them). Just remove the hyphen and it works like a charm.
Unfortunately, at the moment the syntax validation error messages are not very descriptive in Athena; this error may mean almost any possible syntax error in the create table statement.
Although this is annoying, for the moment you will need to check whether the syntax follows the Create table documentation.
Some examples are:
Backticks not in place (as already pointed out)
Missing/extra commas (remember that the last column doesn't need a comma after its definition)
Missing spaces
More ..
This error generally occurs when the DDL syntax has some silly error. There are several answers here that explain different errors based on their cause. The simple solution to this problem is to patiently look through the DDL and verify the following points line by line:
Check for missing commas
Unbalanced ` (backtick) characters
Incompatible data types not supported by Hive (see the Hive data types reference)
Unbalanced commas
Hyphens in the table name
In my case, it was because of a trailing comma after the last column in the table. For example:
CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
  one STRING,
  two STRING,
) LOCATION 's3://my-bucket/some/path';
After I removed the comma at the end of two STRING, it worked fine.
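For reference, the corrected statement is the same hypothetical DDL without that trailing comma:
CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
  one STRING,
  two STRING    -- no comma after the last column definition
) LOCATION 's3://my-bucket/some/path';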
My case: it was an external table and the location had a typo (hence didn't exist)
Couple of tips:
Click the "Format query" button so you can spot errors easily
Use the example at the bottom of the documentation - it works - and modify it with your parameters: https://docs.aws.amazon.com/athena/latest/ug/create-table.html
Slashes. Mine was slashes. I had the DDL from Athena, saved as a Python string.
WITH SERDEPROPERTIES (
'escapeChar'='\\',
'quoteChar'='\"',
'separatorChar'=',')
was changed to
WITH SERDEPROPERTIES (
'escapeChar'='\',
'quoteChar'='"',
'separatorChar'=',')
And everything fell apart.
Had to make it:
WITH SERDEPROPERTIES (
'escapeChar'='\\\\',
'quoteChar'='\\\"',
'separatorChar'=',')
In my case, it was an extra comma in the PARTITIONED BY section.
In my case, I was missing the single quotes around the S3 URL.
In my case, it was that one of the table column names was enclosed in single quotes, as per the AWS documentation :( ('bucket')
As other users have noted, the standard syntax validation error message that Athena provides is not particularly helpful. Thoroughly checking the required DDL syntax (see the Hive data types reference) that other users have mentioned can be pretty tedious, since it is fairly extensive.
So, an additional troubleshooting trick is to let AWS's own data parsing engine (AWS Glue) give you a hint about where your DDL may be off. The idea here is to let AWS Glue parse the data using its own internal rules and then show you where you may have made your mistake.
Specifically, here are the steps that worked for me to troubleshoot my DDL statement, which was giving me lots of trouble:
create a data crawler in AWS Glue; AWS and lots of other places go through the very detailed steps this requires so I won't repeat it here
point the crawler to the same data that you wanted (but failed) to upload into Athena
set the crawler output to a table (in an Athena database you've already created)
run the crawler and wait for the table with populated data to be created
find the newly-created table in the Athena Query Editor tab, click on the three vertical dots (...), and select "Generate Create Table DDL":
this will make Athena create the DDL for this table that is guaranteed to be valid (since the table was already created using that DDL)
take a look at this DDL and see if/where/how it differs from the DDL that you originally wrote. Naturally, this automatically-generated DDL will not have the exact choices for the data types that you may find useful, but at least you will know that it is 100% valid
finally, update your DDL based on this new Glue/Athena-generated DDL, adjusting the column/field names and data types for your particular use case
After searching and following all the good answers here, my issue was that, working in Node.js, I needed to remove the optional ESCAPED BY '\' used in the row settings to get my query to work. Hope this helps others.
Something that wasn't obvious for me the first time I used the UI is that if you get an error in the create table 'wizard', you can then cancel and there should be the query used that failed written in a new query window, for you to edit and fix.
My database had a hyphen, so I added backticks in the query and reran it.
This happened to me due to having comments in the query.
I realized this was a possibility when I tried the "Format Query" button and it turned the entire thing into almost 1 line, mostly commented out. My guess is that the query parser runs this formatter before sending the query to Athena.
Removed the comments, ran the query, and an angel got its wings!

Handling invalid dates in Oracle

I am writing simple SELECT queries which involve parsing out date from a string.
The dates are typed in manually by users in a web application and are recorded as strings in the database.
I have a CASE statement to handle various date formats and use the correct format specifier accordingly in the TO_DATE function.
However, sometimes users enter something that's not a valid date (e.g. 13-31-2013) by mistake, and then the entire query fails. Is there any way to handle such rogue records and replace them with some default date in the query, so that the entire query does not fail due to a single invalid date record?
I have already tried regular expressions but they are not quite reliable when it comes to handling leap years and 30/31 days in months AFAIK.
I don't have privileges to create stored procedures or anything like that. It's just a plain SELECT query executed from my application.
This is a client task.
The DB will give you an error for an invalid date (the DB does not have a "TO_DATE_AND_FIX_IF_NOT_CORRECT" function).
If you've got this error, it means you already tried to cast something to an invalid date.
I recommend doing the conversion to date on your application server, and in the case of an exception in your code, sending a default date to the DB.
Also, that way you send the DB an object of type DbDate and not a string.
That way you achieve two goals:
1. The dates will always be what you want them to be (from the client).
2. You close the door for SQL Injection attacks.
It sounds like in your case you should write the function I mentioned...
It should look something like this:
Create or replace function TO_DATE_SPECIAL(in_date in varchar2) return DATE is
  ret_val date;
begin
  -- try to parse with the expected format
  ret_val := to_date(in_date, 'MM-DD-YYYY');
  return ret_val;
exception
  -- any parse failure falls back to a fixed default date
  when others then
    return to_date('01-01-2000', 'MM-DD-YYYY');
end;
Within the query, instead of using to_date, use the new function.
That way, instead of failing, it will give you back a default date.
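For example, a quick sketch of that usage (the table and column names here are hypothetical):
-- Hypothetical table/column: every row now parses, with bad input
-- collapsing to the 01-01-2000 default instead of raising an error.
SELECT TO_DATE_SPECIAL(u.entered_date) AS entered_date
FROM user_inputs u;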
-> There is no IsDate function, so you'll have to create an object for it...
I hope you've got the idea and how to use it, if not - let me know.
I ended up using a crazy regex that checks leap years and 30/31-day months as well.
Here it is:
((^(0?[13578]|1[02])[\/.-]?(0?[1-9]|[12][0-9]|3[01])[\/.-]?(18|19|20){0,1}[0-9]{2}$)|(^(0?[469]|11)[\/.-]?(0?[1-9]|[12][0-9]|30)[\/.-]?(18|19|20){0,1}[0-9]{2}$)|(^([0]?2)[\/.-]?(0?[1-9]|1[0-9]|2[0-8])[\/.-]?(18|19|20){0,1}[0-9]{2}$)|(^([0]?2)[\/.-]?29[\/.-]?(((18|19|20){0,1}(04|08|[2468][048]|[13579][26]))|2000|00)$))
It is a modified version of the answer by McKay here.
Not the most efficient but it works. I'll wait to see if I get a better alternative.
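For anyone wiring this into a query, here is a hedged sketch of how the check plugs into the CASE expression; the table and column names are hypothetical, <pattern> stands for the full regex above, and the format mask is simplified to a single format for illustration:
-- Hypothetical names; <pattern> is the full regex shown above.
SELECT CASE
         WHEN REGEXP_LIKE(t.date_str, '<pattern>')
           THEN TO_DATE(t.date_str, 'MM-DD-YYYY')
         ELSE DATE '2000-01-01'   -- default for rogue values
       END AS parsed_date
FROM my_table t;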

How to get constraint errors from OCIErrorGet?

Our C++ program is using Oracle and OCI to do its database work. Occasionally, the user will trigger a constraint violation, which we detect and then show an error message from OCIErrorGet. OCIErrorGet returns strings like this:
ORA-02292: integrity constraint (MYSCHEMA.CC_MYCONSTRAINT) violated - child record found
ORA-06512: at line 5
I am looking for the cleanest way to extract "MYSCHEMA.CC_MYCONSTRAINT" from the Oracle error. Knowing the name of the constraint, I could show a better error message (our code could look up a very meaningful error message if it had access to the constraint name).
I could use a regex or something and assume that the Oracle message will never change, but this seems a little fragile to me. Or I could look for specific ORA codes and then grab whatever text falls between the parentheses. But I was hoping OCI had a cleaner/more robust way, if a constraint fails, to figure out the actual name of the failed constraint without resorting to hardcoded string manipulation.
Any ideas?
According to the Oracle Docs, a string search is exactly what you need to do:
Recognizing Variable Text in Messages
To help you find and fix errors, Oracle embeds object names, numbers,
and character strings in some messages. These embedded variables are
represented by string, number, or character, as appropriate. For
example:
ORA-00020: maximum number of processes (number) exceeded
The preceding message might actually appear as follows:
ORA-00020: maximum number of processes (50) exceeded
Oracle makes a big point in their docs of saying the strings will be kept up to date in their section on "Message Accuracy." It's a pretty strong suggestion that they intend you to do a string search.
Also, according to this website, the Oracle error structure pretty strongly implies that they intend you to do a string search, because the data structure lacks anything else for you to get:
array(4) {
  ["code"]=>int(942)
  ["message"]=>string(40) "ORA-00942: table or view does not exist"
  ["offset"]=>int(14)
  ["sqltext"]=>string(32) "select * from non_existing_table"
}
This output reveals the following information:
The variable $err is an array with four elements.
The first element is accessible by the key ‘code’ and its value is the number 942.
The second value is accessible by the key ‘message’ and the value is the string “ORA-00942: table or view does not exist”.
The third value is accessible by the key ‘offset’, and its value is the number 14. This is the character before the name of the non-existing table.
The fourth member is the problematic SQL statement causing the error in the first place.
I agree with you; it would be great if there were a better way to get the constraint name you're violating, but string-matching seems to be the intended way.
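If it helps, here is a hedged sketch of that string search plus a follow-up dictionary lookup, written in SQL for illustration; in the C++ program the extraction would live in application code, and the literals below are just the example message from above:
-- Pull the text between the parentheses out of the ORA-02292 message:
SELECT REGEXP_SUBSTR(
         'ORA-02292: integrity constraint (MYSCHEMA.CC_MYCONSTRAINT) violated - child record found',
         '\(([^)]+)\)', 1, 1, NULL, 1) AS constraint_name
FROM dual;
-- Once the name is extracted, the data dictionary can supply context
-- for a friendlier, application-specific error message:
SELECT owner, constraint_name, table_name, constraint_type
FROM all_constraints
WHERE owner = 'MYSCHEMA'
  AND constraint_name = 'CC_MYCONSTRAINT';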