I need to use something similiar a INST on athena AWS I have the next code but Athena doesnt accept it
select substr(column_name, instr(column_name in '-')+ 1) as a,
substr(column_name, 1, instr(column_name in '-')- 1) as b from table_name;
The function you are looking for is called STRPOS in Athena. The equivalent SQL in Athena would be:
SELECT
substr(column_name, strpos(column_name, '-')) AS a,
substr(column_name, 1, strpos(column_name, '-') - 1) AS b
FROM table_name
However, it looks like you're splitting column_name by '-', which you could also do with SPLIT_PART in Athena:
SELECT
split_part(column_name, '-', 1) AS a,
split_part(column_name, '-', 2) AS b
FROM table_name
You can also split into an array with SPLIT.
Related
I have the following query in Bigquery, which works.
SELECT a, b, (SELECT COUNT(*) FROM C.D as count) FROM some_other_table;
I want to replace C with the value of a.
a is a dataset and D is a table. How can I do this?
You can use the EXECUTE IMMEDIATE to execute a dynamic created query:
EXECUTE IMMEDIATE(
CONCAT("SELECT COUNT(*) FROM ", (SELECT a FROM some_other_table), ".D")
)
That way you can create a query and use the result to join with you some_other_table.
But if you want only the row number from the table, the view TABLE_STORAGE can help:
SELECT a, b, ts.total_rows as d_count
FROM some_other_table
LEFT JOIN `region-US`.INFORMATION_SCHEMA.TABLE_STORAGE ts
ON ts.table_schema = a AND ts.table_name = 'D'
When I use Athena CTAS to generate CSV files, I found that null values in the Athena table are replaced by "\N".
How do I get it to just leave these values as empty columns?
The CTAS query I'm using is something like this:
CREATE TABLE table_name WITH (format = 'TEXTFILE', field_delimiter=',', external_location='s3://bucket_name/location') AS SELECT * FROM "db_name"."src_table_name";
Am I doing something wrong?
This is the default token for NULL for LazySimpleSerDe, and CTAS does not expose any mechanism for changing it unfortunately.
If you'd rather have empty fields for your NULL values you have to ensure they are all empty strings, e.g. … AS SELECT COALESCE(col1, ''), COALESCE(col2, ''), ….
I'm trying to get substring dynamically and group by it. So if my uri column contains records like: /uri1/uri2 and /somelongword/someotherlongword I would like to get everything up to second delimiter, namely up to second / and count it. I'm using this query but obviously it is cutting string statically (6 letters after the first one).
SELECT substr(uri, 1, 6) as URI,
COUNT(*) as COUNTER
FROM staging
GROUP BY substr(uri, 1, 6)
ORDER BY COUNTER DESC
How can I achieve that?
You can use combination of SUBSTRING() and POSITION()
schema:
CREATE TABLE Table1
(`uri` varchar(10))
;
INSERT INTO Table1
(`uri`)
VALUES
('some/text'),
('some/text1'),
('some/text2'),
('aa/bb'),
('aa/cc'),
('bb/cc')
;
query
SELECT
SUBSTRING(uri,1,POSITION('/' IN uri)-1),
COUNT(*)
FROM Table1
GROUP BY SUBSTRING(uri,1,POSITION('/' IN uri)-1);
http://sqlfiddle.com/#!9/293dd3/3/0
edit: here I found amazon athena documentation: https://docs.aws.amazon.com/athena/latest/ug/presto-functions.html and here is the string function documentation: https://prestodb.io/docs/0.217/functions/string.html
my answer above still stands, but you might need to change SUBSTRING to SUBSTR
edit 2: it seems there's a special function to achieve this in amazon athena called SPLIT_PART()
query:
SELECT SPLIT_PART(uri, '/', 1), COUNT(*) FROM tbl GROUP BY SPLIT_PART(uri, '/', 1)
from docs:
split_part(string, delimiter, index) → varchar
Splits string on delimiter and returns the field index. Field indexes start with 1. If the index is larger than than the number of fields, then null is returned.
I have strings like below in my table
2001,2452,2452,2421,2421,2495
2001,2483,2421,2421,2482
2001,2420,2421,2421,2425
2001,2420,2421,2421,2422
2001,2452,2452,2421,2421,2464
I want to remove the repeated numbers like 2452 and 2421 and show them only once in the data like
2001,2452,2421,2495
2001,2483,2421,2482
2001,2420,2421,2425
2001,2420,2421,2422
2001,2452,2421,2464
Has anyone done something like this? please let me know how to solve this
Thanks!
In Oracle SQL, You can use the hierarchy query and listagg as follows:
select str, listagg(str_distinct, ',') within group (order by 1) as distinct_str from
(select distinct str, regexp_substr(str,'[^,]+',1,column_value) str_distinct from cte
cross join table(
cast(multiset(
select level lvl
from dual
connect by level <= regexp_count(str, '[^,]+'))
as sys.odcivarchar2list)
) lvls)
group by str;
db<>fiddle for one of the input string.
I just wonder how to eval the content of dynamic SQL using one select; this is the example. This is only an example. but I would like dynamically functions, and manage using single selects. ( I know that sqls are only for SELECT instead of modify... but In this deep querentee Im becomeing in a crazy developer)
SELECT 'SELECT SETVAL(' || chr(39) || c.relname || chr(39)|| ' ,
(SELECT MAX(Id)+1 FROM ' || regexp_replace(c.relname, '_[a-zA-Z]+_[a-zA-Z]+(_[a-zA-Z0-9]+)?', '', 'g') ||' ), true );'
FROM pg_class c WHERE c.relkind = 'S';
The original output is:
SELECT SETVAL('viewitem_id_seq' , (SELECT MAX(Id)+1 FROM viewitem ), true );
SELECT SETVAL('userform_id_seq' , (SELECT MAX(Id)+1 FROM userform ), true );
This is the dynamic sentence:
(SELECT MAX(Id)+1 FROM ' || regexp_replace(c.relname, '[a-zA-Z]+[a-zA-Z]+(_[a-zA-Z0-9]+)?', '', 'g')
is an string that generates as output a SQL, how to eval in the same line this statement?
The desired output is:
SELECT SETVAL('viewitem_id_seq' , 25, true );
SELECT SETVAL('userform_id_seq' , 85, true );
thanks!
If those are serial or identity columns it would be better to use pg_get_serial_sequence() to get the link between a table's column and its sequence.
You can actually run dynamic SQL inside a SQL statement by using query_to_xml()
I use the following script if I need to synchronize the sequences for serial (or identity) columns with their actual values:
with sequences as (
-- this query is only to identify all sequences that belong to a column
-- it's essentially similar to your select * from pg_class where reltype = 'S'
-- but returns the sequence name, table and column name to which the
-- sequence belongs
select *
from (
select table_schema,
table_name,
column_name,
pg_get_serial_sequence(format('%I.%I', table_schema, table_name), column_name) as col_sequence
from information_schema.columns
where table_schema not in ('pg_catalog', 'information_schema')
) t
where col_sequence is not null
), maxvals as (
select table_schema, table_name, column_name, col_sequence,
--
-- this is the "magic" that runs the SELECT MAX() query
--
(xpath('/row/max/text()',
query_to_xml(format('select max(%I) from %I.%I', column_name, table_schema, table_name), true, true, ''))
)[1]::text::bigint as max_val
from sequences
)
select table_schema,
table_name,
column_name,
col_sequence,
coalesce(max_val, 0) as max_val,
setval(col_sequence, coalesce(max_val, 1)) --<< this uses the value from the dynamic query
from maxvals;
The dynamic part here is the call to query_to_xml()
First I use format() to properly deal with identifiers. It also makes writing the SQL easier as no concatenation is required. So for every table returned by the first CTE, something like this is executed:
query_to_xml('select max(id) from public.some_table', true, true, '')
This returns something like:
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<max>42</max>
</row>
The value is than extracted from the XML value using xpath() and converted to a number which then is used in the final SELECT to actually call setval()
The nesting with multiple CTEs is only used to make each part more readable.
The same approach can e.g. used to find the row count for all tables
Some background on how query_to_xml() works