I am trying to create an external table in Amazon Athena. My query is the following:
CREATE EXTERNAL TABLE priceTable (
WeekDay STRING,
MonthDay INT,
price00 FLOAT,
price01 FLOAT,
price02 FLOAT,
price03 FLOAT,
price04 FLOAT,
price05 FLOAT,
price06 FLOAT,
price07 FLOAT,
price08 FLOAT,
price09 FLOAT,
price10 FLOAT,
price11 FLOAT,
price12 FLOAT,
price13 FLOAT,
price14 FLOAT,
price15 FLOAT,
price16 FLOAT,
price17 FLOAT,
price18 FLOAT,
price19 FLOAT,
price20 FLOAT,
price21 FLOAT,
price22 FLOAT,
price23 FLOAT,
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n'
LOCATION 's3://myquicksighttestbucket/C1_SphdemDD_CANARIAS_20190501_20190531_v2'
The file in S3 is just a CSV delimited by semicolons.
However, I get the following error:
line 1:8: mismatched input 'external'. Expecting: 'OR', 'SCHEMA', 'TABLE', 'VIEW' (Service: AmazonAthena; Status Code: 400; Error Code: InvalidRequestException; Request ID: e524f7e6-39ca-4af7-9e39-f86a4d0a36c8; Proxy: null)
Can anybody tell what I am doing wrong? Any help is much appreciated.
Oooh! I am sorry, the error was the comma after the last field!!
And, also, instead of:
FIELDS TERMINATED BY ';'
I should have used the delimiter's octal code (073), escaped with a backslash, like this:
FIELDS TERMINATED BY '\073'
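For reference, here is the corrected statement as a sketch (no comma after the last column, plus the escaped octal delimiter; the repeated price columns are elided):
CREATE EXTERNAL TABLE priceTable (
WeekDay STRING,
MonthDay INT,
price00 FLOAT,
-- price01 through price22 as above
price23 FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
LINES TERMINATED BY '\n'
LOCATION 's3://myquicksighttestbucket/C1_SphdemDD_CANARIAS_20190501_20190531_v2'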
Make sure the table name does not contain "-", spaces, or any other characters not allowed in table names.
I had invalid field names which included '-' characters. A rather easy mistake to make when copying names like flow-direction directly from the flow logs definitions.
I had the same error today, and unlike the others, I had a PARTITIONED BY clause where I hadn't specified the type for the column:
CREATE EXTERNAL TABLE IF NOT EXISTS table_name(
creationtime string,
anumber bigint,
somearray array<struct<...>>,
somestring string)
PARTITIONED BY (creation_date string)
^^^^^^ <--- 'string' was missing
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
LOCATION
's3://location/';
Once I added the type, the error vanished and the query was successful.
Lots of answers here already, but I just want to summarize: it seems like any syntax error in the statement can cause this error.
In my case, it was a trailing comma after the last item of my TBLPROPERTIES.
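For example, the shape of the fix (a sketch; the property keys here are just common Athena properties used for illustration):
TBLPROPERTIES (
'skip.header.line.count' = '1',
'has_encrypted_data' = 'false' -- no comma after the last property
)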
I got the same error; changing the column data type from INTEGER to INT resolved it for me.
https://docs.aws.amazon.com/athena/latest/ug/data-types.html
int and integer – Athena uses different expressions for integer depending on the type of query.
int – In Data Definition Language (DDL) queries like CREATE TABLE, use the int data type.
integer – In DML queries like SELECT * FROM, use the integer data type. integer is represented as a 32-bit signed value in two's complement format, with a minimum value of -2^31 and a maximum value of 2^31-1.
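A minimal illustration of that distinction (a sketch; the table and column names are made up):
-- DDL: use int
CREATE EXTERNAL TABLE sample_table (id INT)
LOCATION 's3://example-bucket/sample/';

-- DML: use integer
SELECT CAST(id AS INTEGER) FROM sample_table;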
I'm trying to read the following rows out of a CSV file stored in GCS
headers: "A","B","C","D"
row1:"4000,0000000000000","15400000,000","12311918,400000","3088081,600"
row2:"5000,0000000000000","19250000,000","15389898,000000","3860102,000"
The issue here is how BigQuery is actually interpreting and thus outputting these numbers:
[screenshot: results of query number 1]
It's interpreting A as FLOAT64, and B, C, and D as INT64, which is okay since I decided to use schema autodetect. But when I try to convert them to a different type, it still outputs the numbers improperly.
This is the query:
SELECT
CAST(quantity AS INT64) AS A,
CAST(expenses_2 AS FLOAT64) AS B,
CAST(expenses_3 AS FLOAT64) AS C,
CAST(expenses_4 AS FLOAT64) AS D
FROM
`wide-gecko-289100.bqtest.expenses`
These are the results of query above:
[screenshot: results of query number 2]
Either way, it's misinterpreting the numbers; the output should be as follows:
row1: [4000] [15400000] [12311918,4] [3088081,6]
row2: [5000] [19250000] [15389898] [3860102]
Is there a way to solve this?
This is due to BigQuery not understanding the localized format you're using for the numeric values. It expects the period (.) character for the decimal separator.
If you can't deal with this early on, in the process that produces the CSV files you load into BigQuery, another strategy is to instead use a string type for the columns and then do some manipulation.
Here's a simple conversion example that shows some string manipulation and casting to get to the desired type. If you're using both commas and periods as part of the localized format, you'll need a more complex string manipulation.
WITH
sample_row AS (
SELECT "4000,0000000000000" as A, "15400000,000" as B,"12311918,400000" as C,"3088081,600" as D
)
SELECT
A,
CAST(REPLACE(A,",",".") AS FLOAT64) as A_as_float64,
CAST(CAST(REPLACE(A,",",".") AS FLOAT64) AS INT64) as A_as_int64
FROM
sample_row
You could also generalize this as a user-defined function (temporary or persisted) to make it easier to reuse:
CREATE TEMPORARY FUNCTION parseAsFloat(instr STRING) AS (CAST(REPLACE(instr,",",".") AS FLOAT64));
WITH
sample_row AS (
SELECT "4000,0000000000000" as A, "15400000,000" as B,"12311918,400000" as C,"3088081,600" as D
)
SELECT
CAST(parseAsFloat(A) AS INT64) as A,
parseAsFloat(B) as B,
parseAsFloat(C) as C,
parseAsFloat(D) as D
FROM
sample_row
I think this is an issue with how BigQuery interprets a comma. It seems to detect it as a thousands separator rather than a decimal separator.
https://issuetracker.google.com/issues/129992574
Is it possible to replace with a "." instead?
I want to convert a readable timestamp to UNIX time.
For example: I want to convert 2018-08-24 18:42:16 to 1535136136000.
Here is my syntax:
TO_UNIXTIME('2018-08-24 06:42:16') new_year_ut
My error is:
SYNTAX_ERROR: line 1:77: Unexpected parameters (varchar(19)) for function to_unixtime. Expected: to_unixtime(timestamp) , to_unixtime(timestamp with time zone)
You need to wrap the varchar in a CAST to timestamp:
to_unixtime(CAST('2018-08-24 06:42:16' AS timestamp)) -- note: returns a double
If your timestamp value doesn't have a fractional second (or you are not interested in it), you can cast to bigint to get an integral result:
CAST(to_unixtime(CAST('2018-08-24 06:42:16' AS timestamp)) AS BIGINT)
If your readable timestamp value is a string in a different format than the above, you will need to use date_parse or parse_datetime for the conversion. See https://trino.io/docs/current/functions/datetime.html for more information.
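For example, a sketch for a day/month/year string (date_parse uses MySQL-style format specifiers):
to_unixtime(date_parse('24/08/2018 06:42:16', '%d/%m/%Y %H:%i:%s'))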
Note: when dealing with timestamp values, please keep in mind this issue: https://github.com/trinodb/trino/issues/37
I am attempting to copy data into redshift from an S3 bucket, however I am getting a 1204 error code 'char length exceeds DDL length'.
copy table_name from '[data source]'
access_key_id '[access key]'
secret_access_key '[secret access key]'
region 'us-east-1'
null as 'NA'
delimiter ','
removequotes;
The error occurs in the very first row, where it tries to put the state abbreviation 'GA' into the data_state column which is defined with the data type char(2). When I query the stl_load_errors table I get the following result:
line_number | colname    | col_length | type | raw_field_value | err_code | err_reason
1           | data_state | 2          | char | GA              | 1204     | Char length exceeds DDL length
As far as I can tell that shouldn't exceed the length as it is two characters and it is set to char(2). Does anyone know what could be causing this?
Got it to work by changing the data type to char(3) instead; however, I'm still not sure why char(2) wouldn't work.
Mine did this as well, for a state column too. Redshift defaults char to char(1), so I had to specify char(2). Are you sure it didn't default back to char(1)? Because mine did.
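In other words, declare the length explicitly (a sketch; the table name is illustrative):
create table example_table (
data_state char(2) -- plain char would default to char(1)
);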
Open the file up with a hex editor (or use an online one) and look at the GA value in the data_state column.
If it has three dots before it like so:
...GA
then the file (or whatever originally created it) is UTF-8-BOM, not plain UTF-8; those three bytes are the UTF-8 byte order mark, EF BB BF.
You can open the file in something like Notepad++, go to Encoding in the top bar, and select Convert to UTF-8.
I am trying to create a Siddhi application that produces output when a person is in proximity to certain preset locations. These locations are stored in the database. The input is sent from Postman as of now.
But I keep getting an error saying the data type is not 'double'. I have even checked the table details, and the data type in the MySQL table is set to double.
Below are the Siddhi code and the error.
Can someone please guide me?
Location of the Siddhi extension:
https://wso2-extensions.github.io/siddhi-gpl-execution-geo/api/latest/
Siddhi code:
#App:name('ShipmentHistoryApp')
#source(type = 'http', receiver.url='http://localhost:5008/RawMaterials', #map(type = 'json'))
define stream WalkingStream(latitude DOUBLE, longitude DOUBLE, device_id string);
#store(type='rdbms', jdbc.url="jdbc:mysql://127.0.0.1:3306/SweetFactoryDB", username="root", password="root" , jdbc.driver.name="com.mysql.jdbc.Driver")
define table Offers(c string, offer string, latitude DOUBLE, longitude DOUBLE);
#sink(type='log')
define stream SetLocation(a string, b string,one bool, two bool, dis double);
#sink(type='log', prefix='Only log')
define stream info(one bool, two bool);
from WalkingStream as w
join SetLocation as o
select o.a, o.b, instanceOfDouble(o.latitude) as one, instanceOfDouble(o.longitude) as two,
    geo:distance(w.latitude, w.longitude, o.latitude, o.longitude) as dis
insert into Output;
I'm getting this error when trying to find the distance between two locations.
org.wso2.siddhi.core.exception.SiddhiAppRuntimeException: Invalid input given to geo:distance() function. Third argument should be double
at org.wso2.extension.siddhi.gpl.execution.geo.function.GeoDistanceFunctionExecutor.execute(GeoDistanceFunctionExecutor.java:123)
at org.wso2.siddhi.core.executor.function.FunctionExecutor.execute(FunctionExecutor.java:109)
at org.wso2.siddhi.core.query.selector.attribute.processor.AttributeProcessor.process(AttributeProcessor.java:41)
at org.wso2.siddhi.core.query.selector.QuerySelector.processNoGroupBy(QuerySelector.java:145)
at org.wso2.siddhi.core.query.selector.QuerySelector.process(QuerySelector.java:87)
at org.wso2.siddhi.core.query.input.stream.join.JoinProcessor.process(JoinProcessor.java:110)
at org.wso2.siddhi.core.query.processor.stream.window.LengthWindowProcessor.process(LengthWindowProcessor.java:135)
at org.wso2.siddhi.core.query.processor.stream.window.WindowProcessor.processEventChunk(WindowProcessor.java:66)
at org.wso2.siddhi.core.query.processor.stream.AbstractStreamProcessor.process(AbstractStreamProcessor.java:123)
at org.wso2.siddhi.core.query.input.stream.join.JoinProcessor.process(JoinProcessor.java:118)
at org.wso2.siddhi.core.query.input.ProcessStreamReceiver.processAndClear(ProcessStreamReceiver.java:187)
at org.wso2.siddhi.core.query.input.ProcessStreamReceiver.process(ProcessStreamReceiver.java:97)
at org.wso2.siddhi.core.query.input.ProcessStreamReceiver.receive(ProcessStreamReceiver.java:133)
at org.wso2.siddhi.core.stream.StreamJunction.sendEvent(StreamJunction.java:151)
at org.wso2.siddhi.core.stream.StreamJunction$Publisher.send(StreamJunction.java:358)
at org.wso2.siddhi.core.stream.input.InputDistributor.send(InputDistributor.java:34)
at org.wso2.siddhi.core.stream.input.InputEntryValve.send(InputEntryValve.java:44)
at org.wso2.siddhi.core.stream.input.InputHandler.send(InputHandler.java:61)
at org.wso2.siddhi.core.stream.input.source.PassThroughSourceHandler.sendEvent(PassThroughSourceHandler.java:35)
at org.wso2.siddhi.core.stream.input.source.InputEventHandler.sendEvent(InputEventHandler.java:76)
at org.wso2.extension.siddhi.map.json.sourcemapper.JsonSourceMapper.mapAndProcess(JsonSourceMapper.java:211)
at org.wso2.siddhi.core.stream.input.source.SourceMapper.onEvent(SourceMapper.java:132)
at org.wso2.extension.siddhi.io.http.source.HttpWorkerThread.run(HttpWorkerThread.java:62)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Alternatively, I've also tried casting the latitude from the table to double in the query.
This can occur when the third argument of the geo:distance function is null.
Can you verify whether o.latitude can be null?
I have a list of tuples like below -
[(float('inf'), 1.0), (270, 0.9002), (0, 0.0)]
I am looking for a simple serializer/deserializer that helps me store this tuple in a jsonb field in PostgreSQL.
I tried using JSONEncoder().encode(a_math_function), but it didn't help.
I am facing the following error while attempting to store the above list in the jsonb field -
django.db.utils.DataError: invalid input syntax for type json
LINE 1: ...", "a_math_function", "last_updated") VALUES (1, '[[Infinit...
DETAIL: Token "Infinity" is invalid.
Note: the field a_math_function is of type JSONField()
t=# select 'Infinity'::float;
float8
----------
Infinity
(1 row)
because
https://www.postgresql.org/docs/current/static/datatype-numeric.html#DATATYPE-FLOAT
In addition to ordinary numeric values, the floating-point types have
several special values:
Infinity
-Infinity
NaN
yet JSON does not have such a possible value (unless it's a string)
https://www.json.org/
value
string
number
object
array
true
false
null
thus:
t=# select '{"k":Infinity}'::json;
ERROR: invalid input syntax for type json
LINE 1: select '{"k":Infinity}'::json;
^
DETAIL: Token "Infinity" is invalid.
CONTEXT: JSON data, line 1: {"k":Infinity...
Time: 19.059 ms
So it's not a Django or Postgres limitation - it's just that Infinity is an invalid JSON token, while "Infinity" is a valid string. So:
t=# select '{"k":"Infinity"}'::json;
json
------------------
{"k":"Infinity"}
(1 row)
works... But Infinity here is "just a word". Of course, you can save it as a string rather than as a numeric value, check every string to see whether it equals "Infinity", and if it does, have your program logic treat it as real Infinity... But in short, you can't store it as a number, because the JSON specification does not support it... the same way you can't store, let's say, red #ff0000 as a colour in JSON - only as a string, to be caught and processed by your engine...
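That said, if you do store it as the string "Infinity", you can get it back as a float on the Postgres side by extracting the scalar as text and casting (a sketch; #>> with an empty path '{}' returns the jsonb value as text):
t=# select ('"Infinity"'::jsonb #>> '{}')::float8;
 float8
----------
 Infinity
(1 row)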
update:
Postgres itself casts the float to text on to_json:
t=# select to_json(sub) from (select 'Infinity'::float) sub;
to_json
-----------------------
{"float8":"Infinity"}
(1 row)
update:
https://www.postgresql.org/docs/current/static/datatype-json.html
When converting textual JSON input into jsonb, the primitive types
described by RFC 7159 are effectively mapped onto native PostgreSQL
types
...
number → numeric (NaN and infinity values are disallowed)