cube.js: subquery in FROM clause

How do I do a subquery in the FROM clause in cube.js? There is only one table.
SELECT col_a FROM
(SELECT col_a, col_b FROM table_name WHERE col_c = some_val ORDER BY time DESC) AS sub
WHERE col_b = some_other_value

Since you are staying in the same table, you can define a cube.js segment in your schema and then specify that segment in your JSON query. Refer to the Segments page in the docs.
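For example, a minimal sketch of what the schema could look like (the cube, dimension, and segment names here are illustrative, not from the original question):

cube(`MyTable`, {
  sql: `SELECT * FROM table_name`,

  dimensions: {
    colA: { sql: `col_a`, type: `string` },
    colB: { sql: `col_b`, type: `string` },
  },

  segments: {
    colCFiltered: {
      sql: `${CUBE}.col_c = 'some_val'`,
    },
  },
});

The segment plays the role of the inner WHERE clause, and the outer filter goes into the JSON query:

{
  "dimensions": ["MyTable.colA"],
  "segments": ["MyTable.colCFiltered"],
  "filters": [{
    "member": "MyTable.colB",
    "operator": "equals",
    "values": ["some_other_value"]
  }]
}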

Related

Getting table names and row counts for all tables in an Athena database

I have an AWS database with multiple tables that I am trying to get the row counts for in a single query.
The ideal query output would be:
table_name row_count
table2_name row_count
etc...
So far I've been able to either get all the table names from the database or all the row counts of the tables (in no particular order), but not both in the same query.
This query returns a column of all the table names that exist in the database:
SELECT table_name FROM information_schema.tables WHERE table_schema = '<database_name>';
This query returns all the row counts for the tables:
SELECT COUNT(*) FROM table_name
UNION ALL
SELECT COUNT(*) FROM table2_name
UNION ALL
etc. for the rest of the tables
The issue with this query is that it displays the row counts in an arbitrary order that doesn't correspond to the order of the tables in the query, so I don't know which row count goes with which table - hence why I need both the table names and row counts.
Simply add the names of the tables as literals in your queries:
SELECT 'table_name' AS table_name, COUNT(*) AS row_count FROM table_name
UNION ALL
SELECT 'table_name2' AS table_name, COUNT(*) AS row_count FROM table_name2
UNION ALL
…
The following query generates that UNION ALL query for you, producing the counts of all records.
One problem to solve is that (as of December 2022) INFORMATION_SCHEMA.TABLES incorrectly reports every table and view as a BASE TABLE, so you will need some logic to eliminate the views.
In data warehousing it is common practice to record snapshots of the record counts of landing tables at frequent intervals; any unexpected deviations from expected counts can be used for reporting/alerting.
WITH Table_List AS (
    SELECT table_schema, table_name,
        CONCAT('SELECT CURRENT_DATE AS run_date, ''', table_name, ''' AS table_name, COUNT(*) AS Records FROM "', table_schema, '"."', table_name, '"') AS BaseSQL
    FROM INFORMATION_SCHEMA.TABLES
    WHERE table_schema = 'YOUR_DB_NAME'          -- Change this
      AND table_name LIKE 'YOUR TABLE PATTERN%'  -- Change or remove this line
),
Total_Records AS (
    SELECT COUNT(*) AS Table_Count
    FROM Table_List
)
SELECT
    CASE WHEN ROW_NUMBER() OVER (ORDER BY table_name) = Table_Count
         THEN BaseSQL
         ELSE CONCAT(BaseSQL, ' UNION ALL')
    END AS All_Table_Record_count_SQL
FROM Table_List CROSS JOIN Total_Records
ORDER BY table_name;
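Running this returns one generated SELECT per table; stitched together, the rows form a statement you can paste back into Athena and execute. With two hypothetical tables tbl_a and tbl_b in a database mydb, the generated SQL would look like:

SELECT CURRENT_DATE AS run_date, 'tbl_a' AS table_name, COUNT(*) AS Records FROM "mydb"."tbl_a" UNION ALL
SELECT CURRENT_DATE AS run_date, 'tbl_b' AS table_name, COUNT(*) AS Records FROM "mydb"."tbl_b"

The last row carries no trailing UNION ALL, which is exactly what the ROW_NUMBER()/Table_Count comparison takes care of.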

Calculated column based on another table (DirectQuery SQL)

I have 2 tables sourced via DirectQuery to SQL.
Table1 contains 3 columns: "Fruit", "Number", and "Date".
Table2 contains 2 columns: "Country" and "Fruit".
Table2 is linked to Table1 with a one-to-many (1 -> *) relationship from Table2[Fruit] to Table1[Fruit].
I want to create a new column in Table2 containing the average of "Number" for a specified range of dates.
Any ideas on how this can be done?
Rather than a calculated column, add a measure like AverageNumber = SUM(Table1[Number]) / COUNT(Table1[Date]). The date range comes automatically from your filters/slicers, and the average will show correctly for a particular fruit due to the relationship you have added.
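If the date range needs to be fixed inside the measure rather than driven by slicers, a sketch along these lines should work (the measure name and dates are placeholders, not from the original answer):

AverageNumberInRange =
CALCULATE (
    SUM ( Table1[Number] ) / COUNT ( Table1[Date] ),
    DATESBETWEEN ( Table1[Date], DATE ( 2020, 1, 1 ), DATE ( 2020, 12, 31 ) )
)

Here DATESBETWEEN overrides the incoming filter on Table1[Date] with the hard-coded range, while the Fruit relationship still restricts Table1 to the current fruit.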

Is it possible to remove a column from a partitioned table in Google BigQuery?

I'm trying to remove a column from a partitioned table in BigQuery using this command:
bq query --destination_table [DATASET].[TABLE_NAME] --replace --use_legacy_sql=false 'SELECT * EXCEPT(column) FROM [DATASET].[TABLE_NAME]'
As a result the unwanted column is removed and the schema is changed, but the data is no longer partitioned.
Any suggestions on how to keep the data partitioned after the column is removed? The docs are clear only for non-partitioned tables.
There are two workarounds you can use:
Use a column-partitioned table, which means the table is partitioned on the value of a regular column. You can create a new column-partitioned table and copy the data over while dropping the unwanted column:
bq mk --time_partitioning_field=pt --schema=... [DATASET].[TABLE_NAME2]
bq query --destination_table=[DATASET].[TABLE_NAME2] "SELECT _PARTITIONTIME as pt, * EXCEPT(column) from [DATASET].[TABLE_NAME]"
You can also keep using day-partitioned tables, but copy the data using DML. You can set or copy the _PARTITIONTIME column inside a DML INSERT statement, which is not possible with a regular SELECT. Here is an example:
INSERT INTO dataset1.table1 (_partitiontime, a, b)
SELECT
    TIMESTAMP(DATE "2008-12-25") AS _partitiontime,
    "a" AS a,
    "b" AS b
This requires DML over partitioned tables, which is currently in alpha: https://issuetracker.google.com/issues/36383555
BigQuery now supports DROP COLUMN in partitioned tables:
ALTER TABLE mydataset.mytable
DROP COLUMN column
It's in beta at the time of writing, but it worked for me.

Update variable based on match in two tables

I have 2 tables; let's name them table1 and table2. Both of them have credit_id, loan_id and Date fields. The credit_id field needs to be updated with the corresponding values from table2, linking the data by the Date and loan_id fields. To do so, I wrote this query:
proc sql;
UPDATE a
SET a.credit_id = b.credit_id
FROM table1 a, table2 b
WHERE (a.Date = b.Date) AND (a.loan_id = b.loan_id);
quit;
According to my googling, this query should work in many SQL environments, but SAS seems to be an exception here: the FROM part appears to be ignored.
How do I update the needed field then?
I can't comment on the SQL, but you can do the same thing using a data step:
data table1;
  update table1 table2(keep = date loan_id credit_id);
  by date loan_id;
run;
This requires that:
No two rows in the same table have the same date and loan_id, and
Both tables are sorted/indexed by date and loan_id.
You need the keep on the transaction dataset in order to prevent it from updating/creating any other variables on the master dataset. There are also several other ways you could do this, e.g. using the modify or merge statements.
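For instance, a merge-based sketch under the same assumptions (this variant is not from the original answer): since credit_id exists in both datasets, the value read from table2 overwrites table1's value on matching date/loan_id pairs, while unmatched table1 rows keep their original value.

data table1;
  merge table1(in=a) table2(keep = date loan_id credit_id);
  by date loan_id;
  if a; /* keep only observations that exist in table1 */
run;

One behavioural difference from update: merge will overwrite credit_id with a missing value if table2 contains one, whereas the update statement ignores missing values in the transaction dataset by default.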

Sybase 'select count' not showing up properly, trying to compare two tables

I'm doing a count of the records/rows from table1 that don't exist in table2.
Here is the query:
select count(1) from table1
where not exists (select 1 from table2
                  where table1.col1 = table2.col1
                  and table2.id = 1)
I need to see the records that are missing in table2 (where table2.id = 1) but should be available in table1. The PK here is col1.
The query returns 0. But if I export both tables to Excel and compare them, I find 1591 records that are missing from table1 and are available in table2.
Your query is working fine.
Your query finds records that EXIST in table1 but not in table2.
With Excel you have found records that do NOT exist in table1 and DO exist in table2.
If you'd like to find these records with SQL, then your query should be:
select count(1) from table2
where table2.id=1 and table2.col1 not in (select col1 from table1)
or with the not exists version of this query:
select count(1) from table2
where table2.id=1 and
not exists (select 1 from table1 where table1.col1=table2.col1)
I didn't test the queries.