aws athena query result in json format - amazon-web-services

I create aws athena table that contain some rows
example of data:
first_name | age
=================
a 20
b 30
c 35
When I query the data I the result are saved in CSV format in S3.
SELECT * FROM table1
I would query the data and get the result in JSON format.
The reason is that I should transfer that JSON data to another application for another process.
Is there a way to get query result in JSON format?

Related

AWS Athena: Partition projection using date-hour with mixed ranges

I am trying to create an Athena table using partition projection. I am delivering records to S3 using Kinesis Firehouse, grouped using a dynamic partitioning key. For example, the records look like the following:
period
item_id
2022/05
monthly_item_1
2022/05/04
daily_item_1
2022/05/04/02
hourly_item_1
2022/06
monthly_item_2
I want to partition the data in S3 by period, which can be monthly, daily or hourly. It is guaranteed that period would be in a supported Java date format. Therefore, I am writing these records to S3 in the below format:
s3://bucket/prefix/2022/05/monthly_items.gz
s3://bucket/prefix/2022/05/04/daily_items.gz
s3://bucket/prefix/2022/05/04/02/hourly_items.gz
s3://bucket/prefix/2022/06/monthly_items.gz
I want to run Athena queries for every partition scope i.e. if my query is for a specific day, I want to fetch its daily_items and hourly_items. If I am running a query for a month, I want to its fetch monthly, daily as well as hourly items.
I've created an Athena table using below query:
create external table `my_table`(
`period` string COMMENT 'from deserializer',
`item_id` string COMMENT 'from deserializer')
PARTITIONED BY (
`year` string,
`month` string,
`day` string,
`hour` string)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION
's3://bucket/prefix/'
TBLPROPERTIES (
'projection.enabled'='true',
'projection.day.type'='integer',
'projection.day.digits' = '2',
'projection.day.range'='01,31',
'projection.hour.type'='integer',
'projection.hour.digits' = '2',
'projection.hour.range'='00,23',
'projection.month.type'='integer',
'projection.month.digits'='02',
'projection.month.range'='01,12',
'projection.year.format'='yyyy',
'projection.year.range'='2022,NOW',
'projection.year.type'='date',
'storage.location.template'='s3://bucket/prefix/${year}/${month}/${day}/${hour}')
However, with this table running below query outputs zero results:
select * from my_table where year = '2022' and month = '06';
I believe the reason is Athena expects all files to be present under the same prefix as defined by storage.location.template. Therefore, any records present under a month or day prefix are not projected.
I was wondering if it was possible to support such querying functionality in a single table with partition projection enabled, when data in S3 is in a folder type structure similar to the examples above.
Would be great if anyone can help me out!

Snowflake date column have incorrect date from AVRO file

I have HIVE external table created in AVRO format. Data stored on S3 location. Now i am creating snowflake table on same data file stored on s3 in avro format. But getting issue with date column. Date is not coming correctly although string and int data are coming correctly in snowflake table.
data in hive table( col1 is timestamp data type in hive table):
col1
2021-02-04 10:02:31
data in snowflake table:
col1
53066-07-15 12:56:40.000
sql to create snowflake table:
CREATE OR REPLACE EXTERNAL TABLE test1
(
col1 timestamp as (value:col1::timestamp),
)
WITH LOCATION = #S3_location/folder/
AUTO_REFRESH = TRUE
FILE_FORMAT = 'AVRO';"

Query Google Big Query Table by today's date

I have query pertaining to the google big query tables. We are currently looking to query the big query table based on the file uploaded on the day into the cloud storage.
Meaning:
I have to load the data into big query table based on every day's data into cloud storage.
When i query:
select * from BQT where load_date =<TODAY's DATE>
Can we achieve this without adding the date field into the file?
If you just don't want to add a date column, Append current date suffix to your table name like BQT_20200112 when the GCS file is uploaded.
Then you can query specific datetime table by _TABLE_SUFFIX syntax.
Below is example query using _TABLE_SUFFIX
SELECT
field1,
field2,
field3
FROM
`your_dataset.BQT_*`
WHERE
_TABLE_SUFFIX = '20200112'
As you see, You don't need to add additional field like load_date when you query the tables using date suffix and wildcard symbol.

Using Dataprep to write to just a date partition in a date partitioned table

I'm using a BigQuery view to fetch yesterday's data from a BigQuery table and then trying to write into a date partitioned table using Dataprep.
My first issue was that Dataprep would not correctly pick up DATE type columns, but converting them to TIMESTAMP works (thanks Elliot).
However, when using Dataprep and setting an output BigQuery table you only have 3 options for: Append, Truncate or Drop existing table. If the table is date partitioned and you use Truncate it will remove all existing data, not just data in that partition.
Is there another way to do this that I should be using? My alternative is using Dataprep to overwrite a table and then using Cloud Composer to run some SQL pushing this data into a date partitioned table. Ideally, I'd want to do this just with Dataprep but that doesn't seem possible right now.
BigQuery table schema:
Partition details:
The data I'm ingesting is simple. In one flow:
+------------+--------+
| date | name |
+------------+--------+
| 2018-08-08 | Josh1 |
| 2018-08-08 | Josh2 |
+------------+--------+
In the other flow:
+------------+--------+
| date | name |
+------------+--------+
| 2018-08-09 | Josh1 |
| 2018-08-09 | Josh2 |
+------------|--------+
It overwrites the data in both cases.
You ca create a partitioned table bases on DATE. Data written to a partitioned table is automatically delivered to the appropriate partition.
Data written to a partitioned table is automatically delivered to the appropriate partition based on the date value (expressed in UTC) in the partitioning column.
Append the data to have the new data added to the partitions.
You can create the table using the bq command:
bq mk --table --expiration [INTEGER1] --schema [SCHEMA] --time_partitioning_field date
time_partitioning_field is what defines which field you will be using for the partitions.

copy timestamp from AWS iot rule to Amazon redshift table column

My current iot design is iot > rule > kinesis firehose > redshift
I have iot rule as
SELECT *, timestamp() AS timestamp FROM 'topic/#
I get json message something like below
{
"deviceID": "device6",
"timestamp": 1480926222159
}
In my redshift table I have a column eventtime as Timestamp
Now i want to store the json timestamp value to eventtime column, but it gives me error as it needs
TIMEFORMAT AS 'MM.DD.YYYY HH:MI:SS
for timestamp. So how to covert the iot rules timestamp to redshift timestamp?
There is no direct way to converting epoch date value while inserting it to Redshift table Timestamp datatype column.
I have created a column with Bigint datatype and inserting epoch value directly to this column.
After that I am using Quicksight for analytics so I can edit my dataset and create New calculated field for this column and use Qucksight function as below
epochDate(epoch_date)
which converts the epoch value to timestamp field.
One can use similar functions like
SELECT
(TIMESTAMP 'epoch' + myunixtimeclm * INTERVAL '1 Second ')
AS mytimestamp
FROM
example_table