Run Athena Query via CDK - amazon-athena

I am trying to create a table in Athena using the AWS CDK in C#. As my table needs to contain WITH SERDEPROPERTIES (and I cannot see how to add them when using aws-glue-alpha.Table), I have opted to create the table via an Athena query in the CDK.
I have tried using both CfnNamedQuery (which creates a saved query but does not run it) and AthenaStartQueryExecution (which never shows up in CloudFormation at all).
Here is how they are defined:
var cfnNamedQuery = new CfnNamedQuery(this, "MyCfnNamedQuery", new CfnNamedQueryProps {
    Database = DatabaseName,
    QueryString = "CREATE EXTERNAL TABLE " + Database.DatabaseName + @".workflow(
        `instructionid` string)
        ROW FORMAT SERDE
            'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
        STORED AS INPUTFORMAT
            'org.apache.hadoop.mapred.TextInputFormat'
        OUTPUTFORMAT
            'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
        LOCATION
            's3://bucket/workflow'
        TBLPROPERTIES(
            'has_encrypted_data' = 'false'
        );"
});
var startQueryExecutionJob = new AthenaStartQueryExecution(this, "AthenaStartQuery", new AthenaStartQueryExecutionProps {
    QueryString = "CREATE EXTERNAL TABLE " + DatabaseName + @".workflow(
        `instructionid` string)
        ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
        STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
        LOCATION 's3://bucket/workflow'
        TBLPROPERTIES(
            'has_encrypted_data' = 'false'
        );",
    IntegrationPattern = IntegrationPattern.RUN_JOB,
    WorkGroup = "primary",
    ResultConfiguration = new ResultConfiguration {
        OutputLocation = new Location {
            BucketName = "mw-query-results-dev",
            ObjectKey = "myprefix"
        }
    },
    QueryExecutionContext = new QueryExecutionContext {
        DatabaseName = DatabaseName
    }
});
I am ideally looking for an answer to one of the following three questions:
How can I add WITH SERDEPROPERTIES when creating a table using aws-glue-alpha.Table?
How can I execute a saved query?
How do I correctly use AthenaStartQueryExecution?
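On the third question: as far as I can tell (this is my working assumption, not something I have confirmed), AthenaStartQueryExecution is a Step Functions task construct, so it only synthesizes CloudFormation resources once it is wired into a state machine. A minimal sketch, where the construct id "CreateWorkflowTable" is my own invention:

using Amazon.CDK.AWS.StepFunctions;

// The task on its own emits nothing; wrapping it in a StateMachine is what
// actually produces CloudFormation resources. The query then runs each time
// the state machine is executed, not at deploy time.
var stateMachine = new StateMachine(this, "CreateWorkflowTable", new StateMachineProps {
    Definition = startQueryExecutionJob
});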

Related

AWS Athena: varchar maximum value not matched with query

I created an Athena table with this query:
CREATE EXTERNAL TABLE IF NOT EXISTS report (
    `token` varchar(40)
)
PARTITIONED BY (`created_hour` string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
    'serialization.format' = '1'
) LOCATION 's3://demo-kinesis-athena/'
TBLPROPERTIES (
    'has_encrypted_data' = 'false',
    'projection.created_hour.format' = 'yyyy/MM/dd/HH',
    'projection.created_hour.interval' = '1',
    'projection.created_hour.interval.unit' = 'HOURS',
    'projection.created_hour.range' = '2018/01/01/00,NOW',
    'projection.created_hour.type' = 'date',
    'projection.enabled' = 'true',
    'storage.location.template' = 's3://demo-kinesis-athena/${created_hour}'
);
The query runs successfully and the table is created, but if I generate the table DDL it gives me the column type as
`token` varchar(65535)
instead.

Athena-express query returns nested array as a string

I have this JSON data in AWS S3; it's an array of objects:
[{"usefulOffer": "Nike shoe","webStyleId": "123","skus": [{"rmsSkuId": "456","eventIds": ["", "7", "8", "9"]},{"rmsSkuId": "777","eventIds": ["B", "Q", "W", "H"]}],"timeStamp": "4545"},
{"usefulOffer": "Adidas pants","webStyleId": "35","skus": [{"rmsSkuId": "16","eventIds": ["2", "4", "boo", "la"]}],"timeStamp": "999"},...]
This is the query I used to create the table/schema in Athena for the data above:
CREATE EXTERNAL TABLE IF NOT EXISTS table (
usefulOffer STRING,
webStyleId STRING,
skus array<struct<rmsSkuId: STRING, eventIds: array<STRING>>>,
`timeStamp` STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
LOCATION 's3://...'
When I query Athena using athena-express with 'SELECT * FROM table', it returns nicely formatted JSON, except that it returns the nested array as a string:
[
{
usefuloffer: 'Nike shoe',
webstyleid: '123',
skus: '[{rmsskuid=456, eventids=[, 7, 8, 9]}, {rmsskuid=777, eventids=[B, Q, W, H]}]',
timestamp: '4545'
},
{
usefuloffer: 'Adidas pants',
webstyleid: '35',
skus: '[{rmsskuid=16, eventids=[2, 4, boo, la]}]',
timestamp: '999'
},
I tried creating the table/schema without the "WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')" option, but then the output format was broken altogether.
How can I get the nested array back as an array rather than a string?
Thank you for the help!
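One workaround that may be worth trying (an assumption on my part, not verified against athena-express): have Athena itself serialize the nested column to JSON in the query, so the client at least receives a string that JSON.parse can handle instead of Hive's struct notation:

-- Sketch (assumption): cast the complex column to JSON in the SELECT.
-- "table" is the question's table name, quoted because it is a reserved word.
SELECT usefuloffer,
       webstyleid,
       CAST(skus AS JSON) AS skus,
       "timestamp"
FROM "table";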

Bigquery Multiple Tables access with Terraform with dynamic table name

I am trying to create access to multiple tables using a local list, passing the values into a single resource block:
locals {
  map_of_all_tables = [
    {
      "table_name" : "table1"
      "dataset_id" : "dataset_id1"
      "table_id"   : "table_id1"
    },
    {
      "table_name" : "table2"
      "dataset_id" : "dataset_id2"
      "table_id"   : "table_id2"
    }
  ]
}

resource "google_bigquery_table_iam_member" "access" {
  count      = contains(var.table_name_list, local.map_of_all_tables[*].table_name) ? <(no. of matching tables)> : 0
  project    = "test-project1"
  dataset_id = locals.map_of_all_tables[<indexOfMatchingTable>].dataset_id # dataset_id of matching table name
  table_id   = locals.map_of_all_tables[<indexOfMatchingTable>].table_id   # table_id of matching table name
  role       = "roles/bigquery.dataViewer"
  member     = "user:${var.user_email}"
}
If var.table_name_list contains any table names that match a table name in the local list, it should create the "access[]" resource for each of those tables, using the dataset IDs and table IDs from the list for those particular tables. Is this possible in Terraform? Any help would be appreciated. Thanks!
If I understand your question correctly, you have a list of tables in the var.table_name_list variable for which access needs to be given. All the tables are present in the local.map_of_all_tables local variable, and you want to filter it against var.table_name_list.
I'm assuming the above scenario, as you haven't shown what var.table_name_list looks like.
locals {
  map_of_all_tables = [
    {
      "table_name" : "table1"
      "dataset_id" : "dataset_id1"
      "table_id"   : "table_id1"
    },
    {
      "table_name" : "table2"
      "dataset_id" : "dataset_id2"
      "table_id"   : "table_id2"
    },
    {
      "table_name" : "table3"
      "dataset_id" : "dataset_id3"
      "table_id"   : "table_id3"
    }
  ]

  ## this will filter
  table_access_list = [for table in local.map_of_all_tables : table if contains(var.table_name_list, table.table_name)]
}

## assuming the var looks like below
variable "table_name_list" {
  type    = list(any)
  default = ["table1", "table2"]
}

## output displaying the filtered tables
output "table_access_list" {
  value = local.table_access_list
}
Then you can iterate over local.table_access_list to grant access only to the desired tables.
resource "google_bigquery_table_iam_member" "access" {
for_each = {
for table_access in local.table_access_list : table_access.table_name => table_access
}
project = "test-project1-${each.value.table_name}"
dataset_id = local.table_access_list[each.value.table_name].dataset_id #dataset_id of matching table name
table_id = local.table_access_list[each.value.table_name].table_id #table_id of matching table name
role = "roles/bigquery.dataViewer"
member = "user:${var.user_email}"
}
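As a side note on the design: keying for_each by table_name (rather than using count) means adding or removing an entry in var.table_name_list only creates or destroys that one IAM member, instead of shifting list indexes and churning unrelated resources.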

Errors in Siddhi app: 'Different definition same as output define stream'

I created a stream to read data from a CSV file and write it to PostgreSQL. It does everything except insert the data into the DB.
My CSV consists of: 1,test,1
My stream:
@App:name("StockFile")
@App:description('test ...')

@source(type='file',
    dir.uri='file:C:\file',
    action.after.process='NONE',
    @map(type='csv'))
define stream IntputStream (Amount int, Location string, ProductId int);

@store(type = 'rdbms',
    jdbc.url = "jdbc:postgresql://localhost:5432/postgres",
    username = "xxx",
    password = "xxx",
    jdbc.driver.name = "org.postgresql.Driver",
    table.name = 'Test',
    operation = 'insert',
    @map(type = 'keyvalue'))
define stream outputstream (Amount int, Location string, ProductId int);

@info(name = 'Save stock records')
from IntputStream
select Amount, Location, ProductId
insert into outputstream;
When you write to a store, the definition should be a table definition, so the correct way to define outputstream is:
@store(type = 'rdbms',
    jdbc.url = "jdbc:postgresql://localhost:5432/postgres",
    username = "xxx",
    password = "xxx",
    jdbc.driver.name = "org.postgresql.Driver",
    table.name = 'Test',
    operation = 'insert',
    @map(type = 'keyvalue'))
define table outputstream (Amount int, Location string, ProductId int);
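With outputstream defined as a table, the query from the question (from IntputStream select Amount, Location, ProductId insert into outputstream;) can stay as it is; each event read from the CSV should then be persisted to the Test table.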

How to delete a few rows via SQL in QuestDB?

Is there a way to delete a few rows matching a query in QuestDB?
I can't find any statement that allows this.
This would be the best option:
delete from mytable where columnvalue==2;
Thanks!
In QuestDB, UPDATE and DELETE statements are not supported, at least for now. The ways to delete data are:
Drop a partition
Write a copy of the table without the rows you want to delete, drop the original table, and then rename the copy to the original name. Something like:
CREATE TABLE mytablecopy AS (
    SELECT * FROM mytable WHERE columnvalue != 2
) TIMESTAMP(...) PARTITION BY ...;
DROP TABLE mytable;
RENAME TABLE mytablecopy TO mytable;
These are costly workarounds for exceptional cases.
Updates are allowed in QuestDB now. In my opinion, a much better option is to have an extra column in all your tables, called something like isDeleted, and use the UPDATE query to track what is deleted and what is not. Another note here would be to add indexing on this column for efficiency.
See this for more details: https://questdb.io/docs/develop/update-data#postgres-compatibility
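In plain SQL, the soft-delete pattern might look like the sketch below (isDeleted is a hypothetical column added for this purpose, not part of the question's schema):

-- Sketch (assumption): add a soft-delete flag instead of physically deleting rows.
ALTER TABLE mytable ADD COLUMN isDeleted BOOLEAN;

-- "Delete" matching rows by flagging them.
UPDATE mytable SET isDeleted = true WHERE columnvalue = 2;

-- Readers filter flagged rows out.
SELECT * FROM mytable WHERE NOT isDeleted;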
See the example below for how to use the UPDATE query:
"use strict"
const { Client } = require("pg")
const start = async () => {
const client = new Client({
database: "qdb",
host: "127.0.0.1",
password: "quest",
port: 8812,
user: "admin",
options: "-c statement_timeout=300000"
})
await client.connect()
const createTable = await client.query(
"CREATE TABLE IF NOT EXISTS trades (ts TIMESTAMP, date DATE, name STRING, value INT) timestamp(ts);"
)
console.log(createTable)
for (let rows = 0; rows < 10; rows++) {
// Providing a 'name' field allows for prepared statements / bind variables
let now = new Date().toISOString()
const query = {
name: "insert-values",
text: "INSERT INTO trades VALUES($1, $2, $3, $4);",
values: [now, now, "node pg prep statement", rows],
}
await client.query(query)
}
const updateData = await client.query(
"UPDATE trades SET name = 'update example', value = 123 WHERE value > 7;"
)
console.log(updateData)
await client.query("COMMIT")
const readAll = await client.query("SELECT * FROM trades")
console.log(readAll.rows)
await client.end()
}
start()
.then(() => console.log("Done"))
.catch(console.error)