Getting Partition_Info in athena - amazon-athena

SELECT *
FROM information_schema.__internal_partitions__
WHERE table_schema = 's1'
AND table_name = 't1'
ORDER BY partition_number
Can we get partition details for several tables at once using the above query? For example, with WHERE table_schema IN ('s1', 's2') AND table_name IN ('t1', 't2').

Related

Getting table names and row counts for all tables in an athena database

I have an AWS database with multiple tables that I am trying to get the row counts for in a single query.
The ideal query output would be:
table_name row_count
table2_name row_count
etc...
So far I've been able to either get all the table names from the database or all the rowcounts of the tables (in random order), but not both in the same query.
This query returns a column of all the table names that exist in the database:
SELECT table_name FROM information_schema.tables WHERE table_schema = '<database_name>';
This query returns all the row counts for the tables:
SELECT COUNT(*) FROM table_name
UNION ALL
SELECT COUNT(*) FROM table2_name
UNION ALL
etc..for the rest of the tables
The issue with this query is that it displays the row counts in a random order that doesn't correspond to the order of the tables in the query, so I don't know which row count goes with which table. That is why I need both the table names and the row counts.
Simply add the names of the tables as literals in your queries:
SELECT 'table_name' AS table_name, COUNT(*) AS row_count FROM table_name
UNION ALL
SELECT 'table_name2' AS table_name, COUNT(*) AS row_count FROM table_name2
UNION ALL
…
The following query generates the UNION query to produce counts of all records.
The problem to solve is that (as of December 2022) INFORMATION_SCHEMA.TABLES incorrectly defines every table and view as a BASE TABLE so you will need some logic to eliminate the views.
In data warehousing it is common practice to record snapshots of the record counts of landing tables at frequent intervals. Any unexpected deviation from the expected counts can be used for reporting or alerting.
WITH Table_List AS (
SELECT table_schema,table_name, CONCAT('SELECT CURRENT_DATE AS run_date, ''',table_name, ''' AS table_name, COUNT(*) AS Records FROM "',table_schema,'"."', table_name, '"') AS BaseSQL
FROM INFORMATION_SCHEMA.TABLES
WHERE
table_schema = 'YOUR_DB_NAME' -- Change this
AND table_name LIKE 'YOUR TABLE PATTERN%' -- Change or remove this line
)
, Total_Records AS (
SELECT COUNT(*) AS Table_Count
FROM Table_List
)
SELECT
CASE WHEN ROW_NUMBER() OVER (ORDER BY table_name) = Table_Count
THEN BaseSQL
ELSE CONCAT(BaseSql, ' UNION ALL') END AS All_Table_Record_count_SQL
FROM Table_List CROSS JOIN Total_Records
ORDER BY table_name;
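If you would rather build the statement outside the database, the same idea (label every COUNT(*) with its table name, then glue the pieces together with UNION ALL) can be sketched in Python. This is only a sketch: build_row_count_query is a hypothetical helper, the table names are made up, and in-memory SQLite (whose default schema is "main") stands in for Athena so the result can be checked locally.

```python
import sqlite3


def build_row_count_query(schema, tables):
    """Return one UNION ALL statement that labels each COUNT(*) with its table name."""
    schema_ident = schema.replace('"', '""')      # escape for the quoted identifier
    parts = []
    for name in tables:
        label = name.replace("'", "''")           # escape for the string literal
        ident = name.replace('"', '""')           # escape for the quoted identifier
        parts.append(
            f"SELECT '{label}' AS table_name, COUNT(*) AS row_count "
            f'FROM "{schema_ident}"."{ident}"'
        )
    return "\nUNION ALL\n".join(parts)


# Demo data: two hypothetical tables in SQLite's default "main" schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER);
    CREATE TABLE customers (id INTEGER);
    INSERT INTO orders VALUES (1), (2), (3);
    INSERT INTO customers VALUES (1);
    """
)

query = build_row_count_query("main", ["orders", "customers"])
counts = dict(conn.execute(query).fetchall())
print(query)
print(counts)
```

Because each row carries its own label, the order in which the engine returns the rows no longer matters.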

Athena: Queries of this type are not supported

I have the current query in athena.
SELECT col1,
col_2,
A.col_3
FROM
(SELECT col_1,
col_3
FROM table_1
JOIN col_3
WHERE col_1 IN
(SELECT DISTINCT col_1
FROM table_2
JOIN table_1
ON table_1.col_1 = table_2.col_1
)
) AS A
LEFT JOIN
(SELECT col_2,
col_3
FROM table_3
JOIN col_3
WHERE col_2 IN
(SELECT DISTINCT col_2
FROM table_2
JOIN table_4
ON table_2.col_1 = table_4.col_1
JOIN table_3
ON table_4.col_2 = table_3.col_2
)
) AS B
ON B.col_3 = A.col_3
Which works in SQLite.
But when I run it in AWS Athena I got the following error:
Queries of this type are not supported (Service: AmazonAthena; Status Code: 400; Error Code: InvalidRequestException; Request ID: some_id)
I assume that some part of this query is not supported by AWS Athena, but I am new to the Framework.
"Queries of this type are not supported" is Athena's generic way of saying that it doesn't understand your SQL, but that it's not a simple syntax error. You're using SQL that Athena does not support, in other words.
Run the innermost part of the query by itself, and if you don't get the error, add the SQL that wraps it, and so on until you find the fragment that causes the error. If you don't know how to fix it, ask a new question focused on that fragment.
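The inside-out approach can be automated in a small loop. The sketch below is an illustration only: first_failing_fragment is a hypothetical helper, SQLite stands in for Athena, and the demo schema deliberately has no table called col_3, so the second fragment fails here. Which fragment Athena rejects in the real query still has to be found by running it there.

```python
import sqlite3


def first_failing_fragment(conn, fragments):
    """Run fragments innermost-first; return (index, error text) of the first failure, or None."""
    for index, sql in enumerate(fragments):
        try:
            conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            return index, str(exc)
    return None


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_1 (col_1 INTEGER, col_3 INTEGER);
    CREATE TABLE table_2 (col_1 INTEGER);
    """
)

fragments = [
    # Step 1: the innermost subquery on its own.
    "SELECT DISTINCT table_1.col_1 FROM table_2 "
    "JOIN table_1 ON table_1.col_1 = table_2.col_1",
    # Step 2: the same subquery plus the layer that wraps it.
    "SELECT table_1.col_1, table_1.col_3 FROM table_1 JOIN col_3 "
    "WHERE table_1.col_1 IN (SELECT DISTINCT table_1.col_1 FROM table_2 "
    "JOIN table_1 ON table_1.col_1 = table_2.col_1)",
]

result = first_failing_fragment(conn, fragments)
print(result)
```

The index in the result tells you which layer introduced the problem, which is the fragment to put in a follow-up question.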

Unable to access Information_schema via stored procedure

I am writing a stored procedure that queries information_schema in AWS Redshift.
The following are the same for everything below:
Using same user
Using same Redshift database (endpoint)
Stored procedure:
create or replace procedure dev.gp_information_schema_test
(tablename varchar(64))
as $$
declare
table_name varchar(64);
schema_name varchar(64);
counts int;
begin
table_name := split_part(tablename,'.',1);
schema_name:= split_part(tablename,'.',2);
raise info 'table_name - %,Schema_name - %',table_name,schema_name;
counts := (select count(*) from information_schema.tables where table_schema = schema_name);
raise info 'count is -%',counts;
end;
$$
language plpgsql
call dev.gp_information_schema_test('dev.abc');
Result :
Warnings:
table_name - dev,Schema_name - abc
count is -0
0 rows affected
call executed successfully
Execution time: 0.55s
But if I run the same query outside (i.e., not via the stored procedure), then:
select count(*)
from information_schema.tables
where table_schema = 'dev'
Results:
I have already read the limitations of stored procedures in the AWS documentation (Link), but there is no mention of an access restriction on system tables.
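Two things. First, the split_part calls in your procedure are swapped: for 'dev.abc', part 1 ('dev') is the schema and part 2 ('abc') is the table, so the procedure filters on table_schema = 'abc' and counts 0, which matches your RAISE INFO output. Second, set the variable from the query with the SELECT INTO form rather than :=. https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-structure.html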
Try this SP:
CREATE OR REPLACE PROCEDURE gp_information_schema_test(tablename VARCHAR(64)) AS
$$
DECLARE
table_name VARCHAR(64);
schema_name VARCHAR(64);
counts INT;
BEGIN
schema_name := split_part(tablename, '.', 1);
table_name := split_part(tablename, '.', 2);
RAISE INFO 'table_name - % , Schema_name - %',table_name,schema_name;
SELECT INTO counts count(*) FROM information_schema.tables WHERE table_schema = schema_name;
RAISE INFO 'Tables in schema: %',counts;
END;
$$ LANGUAGE plpgsql;
Call:
CALL gp_information_schema_test('dev.abc');
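Note the order of the parts: the question's procedure takes part 1 as the table name and part 2 as the schema name, while the procedure above reverses that. A tiny Python stand-in for split_part (an illustration of its 1-based indexing, not Redshift code) shows which part is which for 'dev.abc':

```python
def split_part(text, delimiter, position):
    """Mimic SQL split_part: 1-based position, empty string when out of range."""
    parts = text.split(delimiter)
    return parts[position - 1] if 1 <= position <= len(parts) else ""


qualified = "dev.abc"
schema_name = split_part(qualified, ".", 1)   # first part: the schema
table_name = split_part(qualified, ".", 2)    # second part: the table
print(schema_name, table_name)
```

With the parts swapped, the filter becomes table_schema = 'abc', and a schema that does not exist gives a count of 0.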

If exists many tables

This may sound stupid, but I would like to know if there is a way to verify if a list of tables exists before doing an action. If I have 12 tables to verify, do I have to repeat “If exists bla bla bla” 12 times?
I tried doing …
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'employee_id'
and TABLE_NAME = N'employee_address'
and TABLE_NAME = N'employee_division' )
But it's not working. Any ideas?
Your condition can never be true, because a table name cannot equal one of these values and, at the same time, equal another. You could use OR instead of AND, but that only checks whether at least one of the tables exists. You must either combine 12 EXISTS checks (one per table name) with AND, or count the rows whose table name is in the list and compare that count with the number of tables, i.e.:
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'employee_id')
and EXISTS(SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'employee_address')
and EXISTS(SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'employee_division' )
or
IF ((SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME in ('employee_id', N'employee_address', N'employee_division' )) = 3)
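The count-based check also makes it easy to report which tables are missing. Here is a minimal sketch, assuming SQLite's sqlite_master as a stand-in for INFORMATION_SCHEMA.TABLES; existing_tables is a hypothetical helper, and only two of the three tables are created on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee_id (id INTEGER);
    CREATE TABLE employee_address (id INTEGER);
    """
)

required = ["employee_id", "employee_address", "employee_division"]


def existing_tables(conn, names):
    """Return the subset of names that exist as tables in the catalog."""
    placeholders = ", ".join("?" for _ in names)
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        f"WHERE type = 'table' AND name IN ({placeholders})",
        names,
    ).fetchall()
    return {row[0] for row in rows}


found = existing_tables(conn, required)
all_exist = len(found) == len(set(required))   # same test as COUNT(*) = 3
missing = sorted(set(required) - found)
print(all_exist, missing)
```

One query covers the whole list, so 12 tables need no more code than 3.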

Sybase 'select count' not showing up properly, trying to compare two tables

I'm doing a count from table1 whose records/rows don't exist in table2
Here is the query:
select count(1) from table1
where not exists (select 1 from table2 where
table1.col1 = table2.col1
and table2.id=1)
I need to see the records that are missing from table2 (where table2.id = 1) but present in table1. The PK here is col1.
The query returns 0. But if I export both tables to Excel and compare them, I find 1591 records that are missing from table1 and are available in table2.
Your query is working fine.
Your query finds records that exist in table1 but not in table2.
With Excel you found records that do not exist in table1 but do exist in table2.
If you'd like to find these records with SQL, then your query should be:
select count(1) from table2
where table2.id=1 and table2.col1 not in (select col1 from table1)
or with not exists version of this query:
select count(1) from table2
where table2.id=1 and
not exists (select 1 from table1 where table1.col1=table2.col1)
I didn't test the queries.
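The difference between the two directions is easy to check on a toy data set. The sketch below uses in-memory SQLite rather than Sybase, with made-up rows: table1 holds col1 values 1 and 2, and table2 holds values 1 to 4 with id = 1 plus one row with id = 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (col1 INTEGER PRIMARY KEY);
    CREATE TABLE table2 (col1 INTEGER, id INTEGER);
    INSERT INTO table1 VALUES (1), (2);
    INSERT INTO table2 VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2);
    """
)

# Rows of table1 with no id = 1 match in table2 (the question's direction).
in_1_not_2 = conn.execute(
    """
    SELECT COUNT(1) FROM table1
    WHERE NOT EXISTS (SELECT 1 FROM table2
                      WHERE table1.col1 = table2.col1 AND table2.id = 1)
    """
).fetchone()[0]

# id = 1 rows of table2 with no match in table1 (the answer's direction).
in_2_not_1 = conn.execute(
    """
    SELECT COUNT(1) FROM table2
    WHERE table2.id = 1
      AND NOT EXISTS (SELECT 1 FROM table1 WHERE table1.col1 = table2.col1)
    """
).fetchone()[0]

print(in_1_not_2, in_2_not_1)
```

On this data the first count is 0 and the second is 2, the same pattern as the 0 versus 1591 in the question.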