Convert Map<string,string> in QuickSight - amazon-web-services

I have a column of type map<string, string> in Athena, and this is not recognized by AWS QuickSight. I am trying to convert this field to varchar in QuickSight using SQL:
SELECT cast(body as varchar) FROM db.events;
But it fails with:
Cannot cast map(varchar,varchar) to varchar
How can I convert this field correctly so QuickSight can query against it?

I think there is no easy way to do that, but maybe there are some workarounds.
If each map has two keys with known names, you can create two new columns:
SELECT
    ELEMENT_AT(map_col, 'key1') AS key1_col,
    ELEMENT_AT(map_col, 'key2') AS key2_col
FROM (
    SELECT MAP(
        ARRAY['key1', 'key2'],
        ARRAY['val1', 'val2']
    ) AS map_col
)
Which will output:

key1_col | key2_col
---------+---------
val1     | val2
If your map column has just one key, you can adapt the snippet above and use it, or use this one:
SELECT
    ARRAY_JOIN(MAP_KEYS(map_col), ', ') AS keys,
    ARRAY_JOIN(MAP_VALUES(map_col), ', ') AS vals
FROM (
    SELECT MAP(
        ARRAY['key1'],
        ARRAY['val1']
    ) AS map_col
)
which will result in:

keys | vals
-----+-----
key1 | val1
As said above, there is no single correct way. If you have many keys, you can use the second snippet to build strings that store the keys and values, and later use calculated fields (maybe using split) to access them.
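If you just need QuickSight to see the whole field at once, another workaround worth trying (a sketch using the body column from the question; I have not verified it inside QuickSight itself) is to serialize the map into a JSON string, since Presto can cast a map to JSON and json_format renders JSON as varchar:

SELECT json_format(CAST(body AS JSON)) AS body_json
FROM db.events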
Hope it helps (:

Related

Reading JSON data from Athena

I have created a table by mapping the JSON data; unfortunately, I am not able to read the nested array within the JSON.
{
  "total": 10,
  "count": 100,
  "values": {
    "source": [
      {"sourceid": "10001", "source": "ABC"},
      {"sourceid": "10002", "source": "XYZ"}
    ]
  }
}
Athena table:

CREATE EXTERNAL TABLE source_master_data (
    total bigint,
    count bigint,
    values struct<source: array<struct<sourceid: string>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://sourcemaster/'
I am trying to read the sourceid and source, but no luck. Can anyone help me out?
select t1.source.sourceid
from source_master_data
cross join UNNEST(source_master_data.Values) AS t1
The UNNEST needs to be applied to the array type. In your query, you are trying to unnest the struct, which is not possible in Athena.
The second issue is the use of values without quotes. This also fails, because values is a reserved word in Athena.
The overall query would look something like this.
select t1.source.sourceid
from source_master_data
cross join UNNEST(source_master_data."values".source) AS t1 (source)
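Note that the posted DDL only declares sourceid inside the nested struct, so source itself is not readable yet. A sketch of the query once the struct also declares that field (the extra source: string entry is my assumption, based on the sample JSON):

-- assumes: values struct<source: array<struct<sourceid: string, source: string>>>
select t1.source.sourceid, t1.source.source
from source_master_data
cross join UNNEST(source_master_data."values".source) AS t1 (source)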

AWS CTAS: How to double quote values?

In AWS Athena, how can I specify that the values should be double quoted, like "value"? I managed to specify the delimiter using the field_delimiter property.
Assuming you have a table cust_transaction with two columns id and amount, where amount is of int datatype, you can CTAS it as follows. The approach is quite manual and can be cumbersome if the number of columns is large. You will need to explicitly cast non-string data types to varchar too. Hope that helps. Is it what you were looking for?
create table cust_transaction_pipe_1
with (
    external_location = 's3://aws_bucket/cust_tx_pipe_1/',
    format = 'TEXTFILE',
    field_delimiter = '|'
)
as
select
    concat(chr(34), id, chr(34)) as id,  -- chr(34) is the double-quote character
    concat(chr(34), cast(amount as varchar), chr(34)) as amount
from cust_transaction

Athena/Presto data discovery query to recommend JSON schema?

I have an Athena table (raw) with just one column (json).
I have the following query which outputs the frequencies of json keys:
SELECT key, count(*)
FROM (
    SELECT map_keys(cast(json_parse(json) AS map(varchar, json))) AS keys
    FROM raw
)
CROSS JOIN UNNEST(keys) AS t (key)
GROUP BY key
How can I extend this query so that it'll tell me whether a particular key has values with any non-numeric characters?
[failed attempts deleted after I found answer]
This works:
WITH dataset AS (
    SELECT cast(json_parse(json) AS map(varchar, varchar)) AS kv
    FROM raw
)
SELECT k,
       count(*) AS isPresent,
       sum(isNumber) AS isNumber,
       count(*) - sum(isNumber) AS notIsNumber
FROM (
    SELECT t.k, t.v,
           IF(TRY(cast(t.v AS double)) IS NULL, 0, 1) AS isNumber
    FROM dataset
    CROSS JOIN UNNEST(kv) AS t (k, v)
)
GROUP BY k
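If "non-numeric characters" is meant literally (a value like "1e5" parses as a double but still contains a letter), a regexp_like variant along these lines should also work; the character class here is my assumption of what counts as numeric:

WITH dataset AS (
    SELECT cast(json_parse(json) AS map(varchar, varchar)) AS kv
    FROM raw
)
SELECT t.k,
       count_if(regexp_like(t.v, '[^0-9.+-]')) AS valuesWithNonNumericChars
FROM dataset
CROSS JOIN UNNEST(kv) AS t (k, v)
GROUP BY t.k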

Ignite SqlFieldsQuery specific keys

Using the Ignite C++ API, I'm trying to find a way to perform an SqlFieldsQuery to select a specific field, but I would like to do this for a set of keys.
One way to do this, is to do the SqlFieldsQuery like this,
SqlFieldsQuery("select field from Table where _key in (" + keys_string + ")")
where the keys_string is the list of the keys as a comma separated string.
Unfortunately, this takes a very long time compared to just doing cache.GetAll(keys) for the same set of keys.
Is there an alternative, faster way of getting a specific field for a set of keys from an ignite cache?
EDIT:
After reading the answers, I tried changing the query to:
auto query = SqlFieldsQuery("select field from Table t join table(_key bigint = ?) i on t._key = i._key")
I then add the arguments from my set of keys like this:
for(const auto& key: keys) query.AddArgument(key);
but when running the query, I get the error:
Failed to bind parameter [idx=2, obj=159957, stmt=prep0: select field from Table t join table(_key bigint = ?) i on t._key = i._key {1: 159956}]
Clearly, this doesn't work because there is only one '?'.
So I then tried to pass a vector<int64_t> of the keys, but I got an error which basically says that std::vector<int64_t> does not specialize the Ignite BinaryType. So I did this as defined here. When calling e.g.
writer.WriteInt64Array("data", data.data(), data.size())
I gave the field an arbitrary name, "data". This then results in the error:
Failed to run map query remotely.
Unfortunately, the C++ API is neither well documented nor complete, so I'm wondering if I'm missing something or if the API does not allow passing an array as an argument to the SqlFieldsQuery.
A query that uses an IN clause doesn't always use indexes properly. The workaround for this is described here: https://apacheignite.readme.io/docs/sql-performance-and-debugging#sql-performance-and-usability-considerations
Also, if you have the option to use GetAll instead and look up by key directly, then you should use it. It will likely be more effective anyway.
A query with the "IN" operator will not always use indexes. As a workaround, you can rewrite the query in the following way:
select field from Table t join table(id bigint = ?) i on t.id = i.id
and then invoke it like:
new SqlFieldsQuery(
    "select field from Table t join table(id bigint = ?) i on t.id = i.id")
    .setArgs(new Object[]{ new Integer[] {2, 3, 4} });

Matching number sequences in SQLite with random character separators

I have an SQLite database which has number sequences with random separators. For example:
_id  data
0    123-45/678>90
1    11*11-22-333
2    4-4-5-67891
I want to be able to query the database "intelligently", with and without the separators. For example, both of these queries should return _id=0:
SELECT _id FROM myTable WHERE data LIKE '%123-45%'
SELECT _id FROM myTable WHERE data LIKE '%12345%'
The first query works as is, but the second query is the problem. Because the separators appear randomly in the database, there are too many combinations to loop through in the search term.
I could create two columns, one with separators and one without, running each query against each column, but the database is huge so I want to avoid this if possible.
Is there some way to structure the second query to achieve this as is? Something like a regex on each row during the query? Pseudo-code:
SELECT _id
FROM myTable
WHERE REPLACEALL(data,'(?<=\\d)[-/>*](?=\\d)','') LIKE '%12345%'
OK, this is far from nice, but you could straightforwardly nest the REPLACE function, once per separator. Example, using the four separators from your sample data:
SELECT _id FROM myTable
WHERE REPLACE(REPLACE(REPLACE(REPLACE(data, '-', ''), '/', ''), '>', ''), '*', '') LIKE '%12345%'
When using this in practice (not that I would recommend it, but at least it's simple), you might want to wrap it inside a function.
EDIT: for a small doc on the REPLACE function, see here, for example.
If I get it right, is this what you want?
SELECT _id
FROM myTable
WHERE Replace(Replace(Replace(Replace(data, '*', ''), '>', ''), '/', ''), '-', '') LIKE '%12345%'
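If your SQLite is 3.31 or newer, a middle ground between the nested REPLACE and a full second column is a virtual generated column: it behaves like the "column without separators" you considered, but a VIRTUAL column is computed on read, so it does not grow the database the way a stored copy would. A sketch, hard-coding the four separators from the sample data:

-- assumes SQLite 3.31+; VIRTUAL columns are computed on read, not stored
ALTER TABLE myTable ADD COLUMN data_clean TEXT
    GENERATED ALWAYS AS (
        REPLACE(REPLACE(REPLACE(REPLACE(data, '-', ''), '/', ''), '>', ''), '*', '')
    ) VIRTUAL;

SELECT _id FROM myTable WHERE data_clean LIKE '%12345%';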