I am using the following DDL query to create a table:
CREATE EXTERNAL TABLE IF NOT EXISTS poi_test1(
'taxonomy_level_1' string,
'taxonomy_level_2' string,
'taxonomy_level_3' string,
'taxonomy_level_4' string,
'poi_name' string,
'mw_segment_name' string,
'latitude' double,
'longitude' double,
'city' string,
'state' string,
'country_code' string,
'default_radius' float
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://mw.test/jishan1/qa1/poi1';
Error: line 1:8: no viable alternative at input 'create external' (service: amazonathena; status code: 400; error code: invalidrequestexception; request id: 5dbd0eb8-6842-45ca-8f60-9f17fd2e4c04)
Remove the single quotes around the column names: single quotes denote string literals, not identifiers. If a column name is a reserved keyword, enclose it in backticks; if it starts with a digit, enclose it in double quotes.
See the Athena documentation on naming conventions for databases, tables, and columns.
I ran your query with the single quotes removed, as shown below, and it created the table successfully:
CREATE EXTERNAL TABLE poi_test1(
taxonomy_level_1 string,
taxonomy_level_2 string,
taxonomy_level_3 string,
taxonomy_level_4 string,
poi_name string,
mw_segment_name string,
latitude double,
longitude double,
city string,
state string,
country_code string,
default_radius float
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://mw.test/jishan1/qa1/poi1';
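As a rough illustration of the quoting rule above, the sketch below generates a column list in which every identifier is wrapped in backticks and none in single quotes. The helper names `quote_identifier` and `column_list` are hypothetical, written only for this example; they are not part of any AWS tool or SDK.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks (hypothetical helper for illustration)."""
    if "`" in name:
        raise ValueError(f"identifier may not contain a backtick: {name!r}")
    return f"`{name}`"


def column_list(columns):
    """Render (name, type) pairs as the body of a CREATE TABLE column list."""
    return ",\n".join(f"  {quote_identifier(n)} {t}" for n, t in columns)


# A few columns borrowed from the question's table.
columns = [("poi_name", "string"), ("latitude", "double"), ("state", "string")]
ddl = (
    "CREATE EXTERNAL TABLE IF NOT EXISTS poi_test1 (\n"
    + column_list(columns)
    + "\n)"
)
print(ddl)
```

Quoting every name with backticks is a deliberately blunt choice: it sidesteps the need to know which words are reserved, at the cost of slightly noisier DDL.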
I am trying to load files from S3 into Athena to run queries on them, but all the column values end up in the first column.
I have a file in the following format:
id,user_id,personal_id,created_at,updated_at,active
34,34,43,31:28.4,27:07.9,TRUE
This is the output I get:
Table creation query:
CREATE EXTERNAL TABLE `testing`(
`id` string,
`user_id` string,
`personal_id` string,
`created_at` string,
`updated_at` string,
`active` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://testing2fa/'
TBLPROPERTIES (
'transient_lastDdlTime'='1665356861')
Can someone please tell me where I am going wrong?
Add skip.header.line.count to your table properties to skip the first row. Because every column is defined as string, Athena cannot tell the header apart from a data row and returns it as the first record.
DDL with property added:
CREATE EXTERNAL TABLE `testing`(
`id` string,
`user_id` string,
`personal_id` string,
`created_at` string,
`updated_at` string,
`active` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://testing2fa/'
TBLPROPERTIES ('skip.header.line.count'='1')
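To see why the header shows up as data, here is a small Python sketch that mimics the effect of skipping header lines on the sample file from the question. It only simulates the behaviour; `read_rows` and its `skip_header_lines` parameter are names made up for this example, not Athena APIs.

```python
import csv
import io

# The two lines from the question's sample file.
SAMPLE = (
    "id,user_id,personal_id,created_at,updated_at,active\n"
    "34,34,43,31:28.4,27:07.9,TRUE\n"
)


def read_rows(text, skip_header_lines=0):
    """Parse comma-separated text and drop the first `skip_header_lines` rows."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[skip_header_lines:]


with_header = read_rows(SAMPLE)
without_header = read_rows(SAMPLE, skip_header_lines=1)

# With all-string columns the header is a perfectly valid row,
# so nothing filters it out unless it is skipped explicitly.
print(with_header[0])
print(without_header[0])
```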
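The SerDe needs parameters to recognize CSV files. Without a field delimiter, LazySimpleSerDe falls back to its default separator (Ctrl-A), so each whole line lands in the first column. For example: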
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
See: LazySimpleSerDe for CSV, TSV, and custom-delimited files - Amazon Athena
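The following sketch mimics, in plain Python, what happens when a comma-separated line is split on Hive's default field separator (Ctrl-A, `\x01`) instead of a comma. `split_fields` is a hypothetical helper for illustration and simplifies what the SerDe actually does.

```python
LINE = "34,34,43,31:28.4,27:07.9,TRUE"
DEFAULT_DELIMITER = "\x01"  # Ctrl-A, the default field separator for Hive text tables


def split_fields(line, delimiter, n_columns):
    """Split a line and pad with None so the result has exactly n_columns entries."""
    fields = line.split(delimiter)
    fields += [None] * (n_columns - len(fields))
    return fields[:n_columns]


# No Ctrl-A in the line: everything stays in the first column.
print(split_fields(LINE, DEFAULT_DELIMITER, 6))

# With an explicit comma delimiter each value gets its own column.
print(split_fields(LINE, ",", 6))
```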
An alternative is to let AWS Glue create the tables for you. In the AWS Glue console, create a crawler and point it at your data. When you run the crawler, it automatically creates a table definition in the AWS Glue Data Catalog, which Athena uses, matching the supplied data files.
Getting the following error,
line 1:8: mismatched input 'EXTERNAL'. Expecting: 'OR', 'SCHEMA', 'TABLE', 'VIEW'
when creating an Athena table with the following command,
CREATE EXTERNAL TABLE IF NOT EXISTS 'abcd_123' (Item:struct<Id:struct<S:string>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
LOCATION 's3://mybucket'
I've gone through other Q&As and none of the answers have helped me. Any pointers as to where the error might be?
Try putting a space between Item and struct instead of a colon, and enclose the table name in backticks rather than single quotes, like so:
CREATE EXTERNAL TABLE IF NOT EXISTS `abcd_123` (
Item struct<
Id:struct<
S:string
>
>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
LOCATION 's3://mybucket'
I believe the colon is only required between the fields of a struct and their types, not between column names and their types. This example is taken from the AWS Athena docs:
CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs (
`Date` Date,
Time STRING,
Location STRING,
Bytes INT,
RequestIP STRING,
...
What is the problem in my syntax that the query is not running?
(error and error code mentioned below)
All the names have been fixed.
"foldername3" has only one file and its name is pinmap.csv.
There are only 9 columns in the csv file.
CREATE EXTERNAL TABLE IF NOT EXISTS default.`pinmap`(
'circle' string,
'region' string,
'division' string,
'office' string,
'pin' int,
'office_type' string,
'delivery' string,
'district' string,
'state' string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://bucketname/foldername3/'
TBLPROPERTIES (
'skip.header.line.count'='1');
Error code:
line 1:8: no viable alternative at input 'create external' (service:
amazonathena; status code: 400; error code: invalidrequestexception;
Ideally the query should import the CSV file from S3 into Amazon Athena as a table named "pinmap" in the database named "default".
Try using backticks instead of apostrophes:
CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`pinmap`(
`circle` string,
`region` string,
`division` string,
`office` string,
`pin` int,
`office_type` string,
`delivery` string,
`district` string,
`state` string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://bucketname/foldername3/'
TBLPROPERTIES (
'skip.header.line.count'='1');
Result:
Query successful.
Also note that this query only defines metadata about your data in S3 (table schema, database, and so on), which is then stored in the AWS Glue Data Catalog. The CSV file is not actually imported; it remains in S3.
When I query my files from the Data Catalog using Athena, all the data appears wrapped in quotes. Is it possible to remove those quotes?
I tried adding the quoteChar option in the table settings, but it didn't help.
UPDATE
As requested, the DDL:
CREATE EXTERNAL TABLE `holidays`(
`id` bigint,
`start` string,
`end` string,
`createdat` string,
`updatedat` string,
`deletedat` string,
`type` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
WITH SERDEPROPERTIES (
'quoteChar'='\"')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://pinfare-glue/holidays/'
TBLPROPERTIES (
'CrawlerSchemaDeserializerVersion'='1.0',
'CrawlerSchemaSerializerVersion'='1.0',
'UPDATED_BY_CRAWLER'='pinfare-holidays',
'averageRecordSize'='84',
'classification'='csv',
'columnsOrdered'='true',
'compressionType'='none',
'delimiter'=',',
'objectCount'='1',
'recordCount'='29',
'sizeKey'='2494',
'skip.header.line.count'='1',
'typeOfData'='file')
I know it's late, but I think the issue is with the "Serde serialization lib".
In
AWS Glue --> click on the table --> Edit Table --> check "Serde serialization lib"
Its value should be "org.apache.hadoop.hive.serde2.OpenCSVSerde".
Then click Apply.
This should solve your issue.
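The difference between a plain delimited split and a quote-aware CSV parser can be sketched in Python. This is only an analogy for the two SerDe behaviours, and the sample row is invented for illustration rather than taken from the question's data.

```python
import csv

# Hypothetical sample row; the third field contains a comma inside the quotes.
LINE = '"1","2019-01-01","New Year, observed"'

# Plain split on the delimiter: quotes are kept and the quoted comma splits a field.
plain = LINE.split(",")

# Quote-aware parsing: quotes are stripped and the quoted comma is preserved.
parsed = next(csv.reader([LINE], delimiter=",", quotechar='"'))

print(plain)
print(parsed)
```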
I am trying to load the following data from a CSV file into a Hive table, but I have not been able to do it successfully:
Ann, 78%,7,
Beth,81%,5,
Cathy,83%,2,
The data is in a CSV file. I created the table in Hive using the definition below:
Hive> CREATE TABLE test1 (Name String, Perc String, Rank String)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "^(\w+)\,(\w+)\%\,(\w+)$",
"output.format.string" = "%1$s %2$s %3$s" )
STORED AS TEXTFILE;
ok
hive> load data local inpath '/tmp/input.csv' into table test1;
ok
hive> Select * from test1;
ok
Name Perc Rank
Null Null Null
Null Null Null
Null Null Null
I am not able to figure out the mistake. The resulting data is not getting loaded into the table.
You shouldn't need the RegexSerDe. You should be able to just set the delimiter to be a comma.
CREATE TABLE test1 (Name String, Perc String, Rank String) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
You could also check out this CSV SerDe: https://github.com/ogrodnek/csv-serde
Use the OpenCSVSerde if you need flexibility.
CREATE EXTERNAL TABLE `mydb`.`mytable`(
`product_name` string,
`brand_id` string,
`brand` string,
`color` string,
`description` string,
`sale_price` string)
PARTITIONED BY (
`seller_id` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = '\t',
'quoteChar' = '"',
'escapeChar' = '\\')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://namenode.com:port/data/mydb/mytable'
TBLPROPERTIES (
'serialization.null.format' = '',
'skip.header.line.count' = '1')
With this, you have total control over the separator, quote character, escape character, null handling and header handling.
You can use Hive's built-in regexp_extract UDF like this:
create table temp (raw STRING);
load data local inpath '/tmp/input.csv' into table temp;
create table table1
as
select regexp_extract(raw, "^(\w+)\,(\w+)\%\,(\w+)$", 1) Name,
regexp_extract(raw, "^(\w+)\,(\w+)\%\,(\w+)$", 2) Perc,
regexp_extract(raw, "^(\w+)\,(\w+)\%\,(\w+)$", 3) Rank
from temp;
Based on your sample CSV data, your regex does not match the trailing comma, and it also does not match the optional space character that appears in the first sample line. Your regex should be changed from:
^(\w+)\,(\w+)\%\,(\w+)$
To:
^(\w+)\,\s*(\w+)\%\,(\w+)\,$
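A quick way to check both patterns against the sample lines is a short Python script. Python's `re` module is not the Java regex engine Hive uses, but for these particular patterns the two should behave the same, so treat this as a sanity check rather than proof.

```python
import re

OLD = re.compile(r"^(\w+)\,(\w+)\%\,(\w+)$")
NEW = re.compile(r"^(\w+)\,\s*(\w+)\%\,(\w+)\,$")

# The three sample lines from the question.
LINES = ["Ann, 78%,7,", "Beth,81%,5,", "Cathy,83%,2,"]

for line in LINES:
    old_match = OLD.match(line)
    new_match = NEW.match(line)
    # The old pattern fails on every line because of the trailing comma;
    # the new one captures name, percentage, and rank.
    print(line, bool(old_match), new_match.groups() if new_match else None)
```

Note that Hive string literals treat the backslash as an escape character, so inside a DDL or query the character classes usually have to be written as `\\w` and `\\s`.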