How is pricing calculated if I query a BigQuery view? - google-cloud-platform

Say I have a BigQuery view,
MyView:
select col1, col2, col3, col4, col5, col6, col7 from mytable;
Now if I query my view:
select col1 from MyView;
In this case, will the pricing be calculated for all columns or only for col1?

Will the pricing be calculated for all columns or only for col1?
Only for col1!
You can easily check this in the UI by comparing the estimate of how many bytes will be processed for
select col1 from MyView
vs
select * from MyView
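You can get the same numbers from the command line: a dry run validates the query and reports how many bytes it would process, without running it or incurring cost (a minimal sketch; mydataset is a hypothetical dataset name):
bq query --use_legacy_sql=false --dry_run 'SELECT col1 FROM mydataset.MyView'
bq query --use_legacy_sql=false --dry_run 'SELECT * FROM mydataset.MyView'
The first command should report far fewer bytes than the second, confirming that only col1 is billed.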

Related

How to add column with query folding using snowflake connector

I am trying to add a new column to a Power Query result that is the result of subtracting one column from another. According to the Power BI documentation, basic arithmetic is supported with query folding, but for some reason it is showing a failure to query fold. I also tried simply adding a column populated with the number 1 and it still did not work. Is there some trick to getting query folding to work for a new column on Snowflake?
If the computation is based only on data from the source, it can be computed during table import as a SQL statement:
SELECT col1, col2, col1 + col2 AS computed_total
FROM my_table_name
EDIT:
The problem with this solution is that a native SQL statement for Snowflake is only supported in Power BI Desktop, and I want to have this stored in a dataflow (i.e. the Power BI web client) for reusability and other reasons.
Option 1:
Create a view instead of a table at the source:
CREATE OR REPLACE VIEW my_view
AS
SELECT col1, col2, col1 + col2 AS computed_total
FROM my_table_name;
Option 2:
Add a computed column to the table:
ALTER TABLE my_table_name
ADD COLUMN computed_total NUMBER(38,4) AS (col1 + col2);
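With either option the computed column becomes an ordinary column of the source object, so the dataflow only needs a plain table or view reference and no custom-column step. A quick sanity check (a sketch, assuming the view from Option 1):
SELECT col1, col2, computed_total
FROM my_view;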

Getting table names and row counts for all tables in an Athena database

I have an AWS Athena database with multiple tables that I am trying to get the row counts for in a single query.
The ideal query output would be:
table_name row_count
table2_name row_count
etc...
So far I've been able to get either all the table names from the database or all the row counts of the tables (in random order), but not both in the same query.
This query returns a column of all the table names that exist in the database:
SELECT table_name FROM information_schema.tables WHERE table_schema = '<database_name>';
This query returns all the row counts for the tables:
SELECT COUNT(*) FROM table_name
UNION ALL
SELECT COUNT(*) FROM table2_name
UNION ALL
etc. for the rest of the tables
The issue with this query is that it displays the row counts in a random order that doesn't correspond to the order of the tables in the query, so I don't know which row count goes with which table - hence why I need both the table names and the row counts.
Simply add the names of the tables as literals in your queries:
SELECT 'table_name' AS table_name, COUNT(*) AS row_count FROM table_name
UNION ALL
SELECT 'table_name2' AS table_name, COUNT(*) AS row_count FROM table_name2
UNION ALL
…
The following query generates the UNION query that produces counts of all records.
One problem to solve is that (as of December 2022) INFORMATION_SCHEMA.TABLES incorrectly reports every table and view as a BASE TABLE, so you will need some logic to eliminate the views.
In data warehousing it is common practice to record snapshots of the record counts of landing tables at frequent intervals; any unexpected deviation from the expected counts can be used for reporting/alerting.
WITH Table_List AS (
    SELECT table_schema,
           table_name,
           CONCAT('SELECT CURRENT_DATE AS run_date, ''', table_name, ''' AS table_name, COUNT(*) AS Records FROM "', table_schema, '"."', table_name, '"') AS BaseSQL
    FROM INFORMATION_SCHEMA.TABLES
    WHERE table_schema = 'YOUR_DB_NAME'           -- Change this
      AND table_name LIKE 'YOUR TABLE PATTERN%'   -- Change or remove this line
),
Total_Records AS (
    SELECT COUNT(*) AS Table_Count
    FROM Table_List
)
SELECT CASE WHEN ROW_NUMBER() OVER (ORDER BY table_name) = Table_Count
            THEN BaseSQL
            ELSE CONCAT(BaseSQL, ' UNION ALL')
       END AS All_Table_Record_count_SQL
FROM Table_List CROSS JOIN Total_Records
ORDER BY table_name;
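Run the generator, copy the All_Table_Record_count_SQL column out, and execute it as a second query. Against a hypothetical database mydb with tables customers and orders, the stitched-together output would look like:
SELECT CURRENT_DATE AS run_date, 'customers' AS table_name, COUNT(*) AS Records FROM "mydb"."customers" UNION ALL
SELECT CURRENT_DATE AS run_date, 'orders' AS table_name, COUNT(*) AS Records FROM "mydb"."orders"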

Django ORM: make a union query with columns not in common between two tables, setting the missing columns to null

Hi, I want to make a query in the Django ORM like this:
Select Col1, Col2, Col3, Col4, Col5 from Table1
Union
Select Col1, Col2, Col3, Null as Col4, Null as Col5 from Table2
As you see, Col4 and Col5 are not in common, but they should return null for Table2.
Table1_qs = Table1.objects.all()
Table2_qs = Table2.objects.all()
Table1_qs.values('Col1', 'Col2','Col3','Col4','Col5').union(Table2_qs.values('Col1', 'Col2','Col3','Null as Col4','Null as Col5'))
How can I make this query in Django?
The solution is made possible by Value and annotate. Here is how.
Let's say Col4 is an IntegerField and Col5 is a CharField:
from django.db.models import Value, IntegerField, CharField

Table1_qs = Table1.objects.all()
Table2_qs = Table2.objects.all()

Table1_qs = Table1_qs.values('Col1', 'Col2', 'Col3', 'Col4', 'Col5')
Table2_qs = Table2_qs.values('Col1', 'Col2', 'Col3').annotate(
    Col4=Value(None, output_field=IntegerField()),
    Col5=Value(None, output_field=CharField()),
)
unioned_query = Table1_qs.union(Table2_qs)
Please note:
1. The column types must be the same in both querysets.
2. They must be in the same order as well.
One problem that arises is with foreign keys: only their id (primary key) is returned when using values() on a queryset. I hope Django adds a way to get them as full objects too.
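For illustration: because values() was used, iterating the union yields dicts, and the padded columns come back as None for rows that originated in Table2 (a minimal sketch):
for row in unioned_query:
    # Rows from Table2 carry None in Col4 and Col5.
    print(row['Col1'], row['Col4'], row['Col5'])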

How to update multiple columns in the same update statement when one column depends on another column's new value in Redshift

I want to update multiple columns in the same update statement, where one column depends on another column's new value.
Example:
Sample data: col1 and col2 are the column names and test_update is the table name.
SELECT * FROM test_update;
col1 col2
col-1 col-2
col-1 col-2
col-1 col-2
update test_update set col1 = 'new', col2=col1||'-new';
SELECT * FROM test_update;
col1 col2
new col-1-new
new col-1-new
new col-1-new
What I need to achieve is that col2 is updated to 'new-new', since the updated value of col1 is 'new'.
I think maybe it's not possible in one SQL statement. If it is possible, how can we do that? If it's not, what is the best way of handling this problem in a data warehouse environment - for example, executing multiple updates, first on col1 and then on col2, or something else?
Hoping my question is clear.
You cannot update the second column based on the result of updating the first column. However, this can be achieved in a single statement by "pre-calculating" the result you want and then updating based on that.
The following update using a join is based on the example provided in the Redshift documentation:
UPDATE test_update
SET col1 = precalc.col1
  , col2 = precalc.col2
FROM (
    SELECT id
         , 'new' AS col1
         , 'new' || '-new' AS col2  -- derive col2 from the *new* col1 value
    FROM test_update
) precalc
WHERE test_update.id = precalc.id;
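Assuming test_update has an id key column (not shown in the sample data above), re-querying the table shows the desired result:
SELECT * FROM test_update;
col1 col2
new new-new
new new-new
new new-new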

How to do an importrange query with a time duration condition (all rows under 1 minute)

I'm trying to make a query (importrange) of this Google Sheets file.
I want to filter my data based on 3 conditions:
Col5='GC' OR
Col5='CL' AND (this is the problem I cannot solve)
in Col4 the time must be under 60 seconds.
I've tried different solutions (time, seconds, timevalue) but none of them works.
I tried this, but it's WITHOUT the last, crucial condition:
=query(IMPORTRANGE("18OOzibH9rmuzNxOPo_EbZ1rhF32qESuvPa4x4pB1BmA/edit#gid=0",
"data!A1:Q"),
"select Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14, Col15, Col16, Col17, WHERE (Col5='GC') OR (Col5='CL')"
)
The result I am expecting is to have only the rows with GC or CL in Col5 and a duration <= 60 seconds.
=QUERY(IMPORTRANGE("18OOzibH9rmuzNxOPo_EbZ1rhF32qESuvPa4x4pB1BmA", "data!A1:Q"),
"where Col5 matches 'CL|GC'
and minute(Col4) < 1", 1)
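Here matches 'CL|GC' expresses the OR condition as a regular expression, and minute(Col4) < 1 keeps rows whose duration has a zero minutes component, i.e. under 60 seconds (this assumes the durations never reach a full hour).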