Querying a mixed type column in Amazon Athena - amazon-web-services

I have an Athena table that has a column in it that I would like to query. The type of the column is double, but it contains data of mixed types. The data is either:
A double (0-1 inclusive)
An array with 0 or 1 elements (again, a double 0-1 inclusive).
I have no idea how the column got into this state; I'm just trying to fix it.
If I do a naive query:
SELECT col FROM tbl.db;
I get the error: "HIVE_BAD_DATA: Error parsing field value '[]' for field 0: org.openx.data.jsonserde.json.JSONArray cannot be cast to java.lang.Double"
Some things that I've tried, but don't work:
Use try_cast
The docs on try_cast make it sound like the perfect solution; reality is not so kind.
When I tried to run
SELECT COALESCE(
try_cast(col AS double),
try_cast(col AS array<double>)) FROM tbl.db;
I get the error: "SYNTAX_ERROR: line 3:5: Cannot cast double to array(double)". Indeed, when I try more simple examples, I continue to get an error: both
SELECT try_cast(3.4 AS array<double>);
SELECT try_cast(ARRAY [3.4] AS double);
trigger errors. It appears that, although the docs claim a failed cast causes the function to return null, that perhaps only works when casting between primitive data types.
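For what it's worth, a failed cast between primitive types does return null as documented; this is just an illustration with a literal value, not data from the table:
SELECT try_cast('not a number' AS double); -- returns NULL rather than raising an error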
Cast to JSON
While casting both doubles and arrays to JSON works fine as in these examples:
SELECT try_cast(3.4 AS JSON);
SELECT try_cast(ARRAY [3.4] AS JSON);
when I perform the cast on the actual column like so:
SELECT try_cast(col AS JSON) FROM tbl.db;
I get the error: "HIVE_BAD_DATA: Error parsing field value '["0.01"]' for field 0: org.openx.data.jsonserde.json.JSONArray cannot be cast to java.lang.Double"
I'd really like to be able to query this data. Alternatively, if it's possible to migrate it into a state where it's all one type, that would be an acceptable solution as well.

Related

Is there any way to cast mixed types while loading in BQ?

I have a field in a BQ table that is defined as a string; let's call that field 'foo'.
I am running a job that loads data from JSON to my BQ table.
The problem is that my 'foo' field in the JSON can be either a number or a string (such as "N/A").
I thought that, while loading the JSON file, BQ would be smart enough to cast number values for that field into strings. For example, "foo": 48 would be cast to "foo": "48". But it seems it doesn't do this by default. Is there any way to configure LoadJobConfig (part of the BQ Python SDK) to accomplish this?
It doesn't do it by default, but if you want to cast the foo field value from a string to a number when possible, without failing the conversion on invalid values, you could use
SAFE_CAST(foo AS INT64)
See the example below:
SELECT
foo,
SAFE_CAST(foo AS INT64) AS converted_foo
FROM
UNNEST(["60", "234", "N/A"]) AS foo
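The same pattern would carry over once the data is loaded; a rough sketch, assuming a table mydataset.mytable with foo stored as a STRING (both names are placeholders):
SELECT
foo,
SAFE_CAST(foo AS INT64) AS converted_foo -- NULL for values like "N/A" instead of an error
FROM
mydataset.mytable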

How to see 'full' SQL Error Messages in BigQuery?

I am writing a large MERGE statement in BigQuery.
When I attempt to run this query, the validator gives me an error containing a lot of ellipses (...) that hide the useful information, as shown below:
Value has type ARRAY<STRUCT<eventName STRING, eventUUID STRING, eventDate DATE, ...>> which cannot be inserted into column Events, which has type ARRAY<STRUCT<eventName STRING, eventUUID STRING, eventDate DATE, ...>> at [535:1]
I am extremely confident that these two array objects match exactly; however, since I am struggling to get around this, I would love to see the full error message.
Is there any way to see the full error?
I have looked into the Google Logging tool and cannot see any additional information.
I have also tried the following Cloud Shell command:
bq --format=prettyjson show -j [Job Id Goes Here]
Again, this seems to provide no additional information.
This approach feels pretty silly, but it could be a last resort for really long nested types.
Use INFORMATION_SCHEMA.COLUMNS to get the full string of the target type, in your case the type of the column Events.
Use CREATE TABLE <yourDataset>.<yourTempTable> AS SELECT ... to dump one row of the Value into a table, then query INFORMATION_SCHEMA.COLUMNS again to see its full type string.
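As a rough sketch of the first step, assuming the target table is named target_table and lives in a dataset called mydataset (both placeholders):
SELECT
column_name,
data_type -- full type string, including the nested STRUCT fields
FROM
mydataset.INFORMATION_SCHEMA.COLUMNS
WHERE
table_name = 'target_table'
AND column_name = 'Events'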

Crate.io: Order of columns when creating a table

I have CrateDB version 3.2.7 running under Windows Server 2012. I created a table like this:
create table test3 (firstcolumn bigint primary key, secondcolumn int, thirdcolumn timestamp, fourthcolumn double, fifthcolumn double, sixtcolumn smallint, seventhcolumn double, heightcolumn int, ninthcolumn smallint, tenthcolumn smallint) clustered into 12 shards with(number_of_replicas = 0, refresh_interval =0);
So I'm expecting the firstcolumn to be the first, and so on. But after the creation, when I do a SELECT * FROM test3, I get the following result:
It seems that the first column returned is the "fifth" one. It looks like the columns are returned in alphabetical order.
Does it mean that CrateDB created the columns in that order? Does it keep the order somewhere? If the columns are in alphabetical order, does that mean that if I want to COPY data from another DBMS to CrateDB, I have to export the data in alphabetical order?
For INSERT, not necessarily: only if the column names are omitted do the values have to be in alphabetical order (see here). The order doesn't seem to be "kept" anywhere per se.
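As a minimal sketch against the test3 table above (the values are placeholders), naming the columns explicitly in the INSERT means the values are matched to the columns by name, so alphabetical order doesn't matter:
-- columns are matched by name, not by position or alphabetical order
INSERT INTO test3 (firstcolumn, secondcolumn, thirdcolumn)
VALUES (1, 42, 1564617600000); -- timestamp given as epoch milliseconds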
COPY FROM is a different kind of import tactic and not quite what the good old INSERT would do. I would suggest writing a command-line app to import data into CrateDB. COPY FROM doesn't do any type checking, nor does it cast types, and it will always import the data as it was in the source file (see here). From your other question I see you may have GPS-related data (?); you will need to manually map that to a GEO_POINT type, just as one example.
CrateDB offers good performance (whatever that means to you or me) with its bulk endpoint.

Google Cloud DataPrep DATEDIF function inconsistent

I have four DateTime columns, all in long format eg 2016-08-01T21:13:02Z. They are called EnqDateTime, QuoteCreatedDateTime, BookingCreatedDateTime and RejAt.
I want to add columns for the duration (in days) between EnqDateTime and the other three columns, i.e.
DATEDIF(EnqDateTime, QuoteCreatedDateTime, day)
This works for RejAt, but throws an error for all the other columns:
Parameter "rhs" accepts only ["Datetime"]
As per the image below, all four columns are DateTime.
Can anyone see any other reason this may not be working for two of the three columns?
As you can see in the image below, I reproduced a scenario such as the one you presented here, and I had no issue with it. I created the three columns X2Y using the same formulas that you shared:
DATEDIF(EnqDateTime, QuoteCreatedDateTime, day)
DATEDIF(EnqDateTime, BookingCreatedDateTime, day)
DATEDIF(EnqDateTime, RejAt, day)
My guess is that, for some reason, the columns do not have an appropriate Datetime format. Maybe you can try applying some transformations to the data to make sure that the data contained in the columns has the appropriate format. I recommend that you try the following:
Clean all missing values by clicking on the column and then Clean > Missing > Fill with NULL. Missing values can prevent Dataprep from recognizing a data type properly.
Change the data type again to Datetime, just to double-check that there is no field that does not have the Datetime type. You can do so by clicking on the column and then Change type > Date/Time.
If these methods do not solve your issue, maybe you can try working with a minimal example, having only a few rows, so that you can narrow down the variables with which to work. Then you can update your question with more information.
It would also be nice to know where you are getting the error Parameter "rhs" accepts only ["Datetime"]. It is not clear to me what the rhs (right-hand side) parameter is in this case, so maybe you can also provide more details about that.

SOCI, pgsql function returning table record - type_conversion not working

I have a pgsql function declared as:
CREATE FUNCTION auth.read_session(session_id varchar) RETURNS auth.sessions
It returns one record from the table auth.sessions.
I have a SOCI type_conversion that works perfectly fine when
I run select * from auth.sessions where id = :id.
It works when a matching record is found and when the result is NULL.
However, when I change the statement to:
select * from auth.read_session('invalid');
I get exception:
Null value not allowed for this type while executing "select * from auth.read_session('invalid')".
I tried with listing columns, passing soci::indicator, etc.
I cannot get it to work.
The exception comes from base type_conversion<>.
In type-conversion-traits.h there is a comment stating that:
// default traits class type_conversion, acts as pass through for row::get()
// when no actual conversion is needed.
Why is no conversion needed? Yes, my function returns a record of the table type auth.sessions.
Should it return RECORD instead so that the conversion gets launched?
Apparently the only way this can be done is to have the function return a SETOF records. I believe the conversion is not needed because, in my case, the whole result was NULL and it cannot somehow be cast to my own type.
Returning a single record of table type works as long as there is a result.
It would work if the function were designed to always return a record, even an empty one, but still a "record".
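A minimal sketch of that approach, assuming auth.sessions has an id column matching the original WHERE clause; with SETOF, an unknown session id simply yields zero rows instead of a single NULL record:
CREATE OR REPLACE FUNCTION auth.read_session(session_id varchar)
RETURNS SETOF auth.sessions AS $$
-- zero rows when no session matches, one row otherwise
SELECT * FROM auth.sessions WHERE id = session_id;
$$ LANGUAGE sql STABLE;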
@Pad I think you're right about the return. To call this function the way you're doing it, you need to return a QUERY, TABLE, or SETOF records.
http://www.postgresql.org/docs/9.2/static/xfunc-sql.html