I have 2 tables in BQ.
Table ID: blog
Table size: 3.07 TB
Table ID: llog
Table size: 259.82 GB
I'm running the query below and it ran for a few hours (it never even finished; I killed it, so I was not able to capture the query plan).
select bl.row.iddi as bl_iddi, count(lg.row.id) as count_location_ping
from
`poc_dataset.blog` bl LEFT OUTER JOIN `poc_dataset.llog` lg
ON
bl.row.iddi = lg.row.iddi
where bl.row.iddi="124623832432"
group by bl.row.iddi
I'm not sure how to optimize this. The blog table has trillions of rows.
Unless some extra details are missing from your question, the query below should give you the expected result:
#standardSQL
SELECT row.iddi AS iddi, COUNT(row.id) AS count_location_ping
FROM `poc_dataset.llog`
WHERE row.iddi = '124623832432'
GROUP BY row.iddi
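Since the WHERE clause pins bl.row.iddi to a single constant, the join adds nothing to the count (assuming one blog row per iddi), so llog can be scanned alone and the 3 TB blog table never needs to be touched. A minimal sketch of that equivalence, using SQLite in place of BigQuery and flat iddi/id columns in place of the nested row.* fields (table contents are made up):

```python
import sqlite3

# Tiny stand-ins for the two tables; contents are invented.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE blog (iddi TEXT);
    CREATE TABLE llog (iddi TEXT, id INTEGER);
    INSERT INTO blog VALUES ('124623832432'), ('999');
    INSERT INTO llog VALUES ('124623832432', 1), ('124623832432', 2), ('999', 3);
""")

# The original join, filtered to a single iddi ...
join_count = con.execute("""
    SELECT COUNT(lg.id)
    FROM blog bl LEFT OUTER JOIN llog lg ON bl.iddi = lg.iddi
    WHERE bl.iddi = '124623832432'
    GROUP BY bl.iddi
""").fetchone()[0]

# ... gives the same count as scanning llog alone.
direct_count = con.execute("""
    SELECT COUNT(id) FROM llog WHERE iddi = '124623832432'
""").fetchone()[0]

print(join_count, direct_count)  # 2 2
```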
I am unable to insert records from one table into another using PySpark SQL in an AWS Glue job. It does not raise any error either.
Table1 name : details_info
spark.sql('select count(*) from details_info').show();
count: 50000
Table2 name : goals_info
spark.sql('select count(*) from goals_info').show();
count: 0
I am trying to insert data from the "details_info" table into the "goals_info" table with the query below:
spark.sql("INSERT INTO goals_info SELECT * FROM details_info")
I expect the count of goals_info to be 50000, but it still shows 0 after executing the SQL statement above:
spark.sql('select count(*) from goals_info').show();
count: 0
The code block executes without throwing any error, but the data is not inserted, as the count still shows 0. Could anybody help me understand what the reason might be?
I even tried the PySpark write.insertInto() method, but the count still shows zero:
view_df = spark.table("details_info")
view_df.write.insertInto("goals_info")
I have a sample table:
id    start_dt     end_dt
100   06/07/2021   30/09/2021
I would like to get the following output
id    start_dt     end_dt
100   06/07/2021   31/07/2021
100   01/08/2021   31/08/2021
100   01/09/2021   30/09/2021
I have tried using GENERATE_SERIES() in Amazon Redshift, but that does not give the required result.
The existing table is quite large so I could use temp tables then join back to another table at a later stage.
I have trawled through other posts, but the proposed solutions don't quite give the desired results, or don't work at all on Amazon Redshift. Any help in solving this would be appreciated.
The traditional method would be:
Create a Calendar table that contains one row per month, with start_date and end_date columns
Join your table to the Calendar table, where table.start_dt <= calendar.end_dt AND table.end_dt >= calendar.start_dt
The two columns would be:
GREATEST(table.start_dt, calendar.start_dt)
LEAST(table.end_dt, calendar.end_dt)
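The calendar-join approach above can be sketched end to end. This is a hedged sketch using SQLite in place of Redshift: dates are ISO strings so they compare correctly, SQLite's two-argument MAX/MIN stand in for Redshift's GREATEST/LEAST, and the calendar table is hand-built for just the three months needed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE my_table (id INTEGER, start_dt TEXT, end_dt TEXT);
    INSERT INTO my_table VALUES (100, '2021-07-06', '2021-09-30');

    -- One row per month, as the answer describes.
    CREATE TABLE calendar (start_dt TEXT, end_dt TEXT);
    INSERT INTO calendar VALUES
        ('2021-07-01', '2021-07-31'),
        ('2021-08-01', '2021-08-31'),
        ('2021-09-01', '2021-09-30');
""")

rows = con.execute("""
    SELECT t.id,
           MAX(t.start_dt, c.start_dt) AS start_dt,  -- GREATEST in Redshift
           MIN(t.end_dt,   c.end_dt)   AS end_dt     -- LEAST in Redshift
    FROM my_table t
    JOIN calendar c
      ON t.start_dt <= c.end_dt AND t.end_dt >= c.start_dt
    ORDER BY start_dt
""").fetchall()

for r in rows:
    print(r)
# (100, '2021-07-06', '2021-07-31')
# (100, '2021-08-01', '2021-08-31')
# (100, '2021-09-01', '2021-09-30')
```

The overlap condition in the ON clause is what splits one long interval into one output row per calendar month it touches.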
I have many IoT devices that send data that I query with Amazon Athena. I created a table to store the data; it contains 2 columns: LocalTime, the time the IoT device captured its status, and ServerTime, the time the data arrived at the server (sometimes the IoT device has no network connection).
I would like to count the "gaps" in blocks of hours (let's say 1 hour) in order to know how delayed the arriving data is. For example:
the result that I would like to get is:
To calculate the result, I want to compute how many hours passed between ServerTime and LocalTime,
so the first entry (1.1.2019 12:15 - 1.1.2019 10:25) falls in the 1-2 hour bucket.
Thanks
If MS SQL Server is your database, you can try the script below to get your desired output:
SELECT
CAST(DATEDIFF(HH,localTime,serverTime)-1 AS VARCHAR) +'-'+
CAST(DATEDIFF(HH,localTime,serverTime) AS VARCHAR) [Hours],
COUNT(*) [Count]
FROM your_table
GROUP BY CAST(DATEDIFF(HH,localTime,serverTime)-1 AS VARCHAR) +'-'+
CAST(DATEDIFF(HH,localTime,serverTime) AS VARCHAR)
Oracle
If you are using an Oracle database, you can use this statement:
select CONCAT(CONCAT(FLOOR(diff_hours), '-'), FLOOR(diff_hours) + 1) as Hours, count(*) as Count
from (select 24 * (to_date(ServerTime, 'YYYY-MM-DD hh24:mi') - to_date(LocalTime, 'YYYY-MM-DD hh24:mi')) diff_hours from T_TIMETABLE)
group by FLOOR(diff_hours)
order by FLOOR(diff_hours);
Note: This will not display the empty intervals.
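The same bucketing can be sketched against SQLite, where julianday() differences give fractional days, so 24 times the difference is the gap in hours; the table name and sample timestamps below are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t_timetable (LocalTime TEXT, ServerTime TEXT);
    INSERT INTO t_timetable VALUES
        ('2019-01-01 10:25', '2019-01-01 12:15'),   -- ~1.8 h late
        ('2019-01-01 09:00', '2019-01-01 09:20'),   -- ~0.3 h late
        ('2019-01-01 08:00', '2019-01-01 10:30');   -- ~2.5 h late
""")

rows = con.execute("""
    SELECT CAST(gap AS TEXT) || '-' || CAST(gap + 1 AS TEXT) AS hours,
           COUNT(*) AS cnt
    FROM (SELECT CAST(24 * (julianday(ServerTime) - julianday(LocalTime))
                      AS INTEGER) AS gap
          FROM t_timetable)
    GROUP BY gap
    ORDER BY gap
""").fetchall()

print(rows)  # [('0-1', 1), ('1-2', 1), ('2-3', 1)]
```

As with the Oracle version, only buckets that actually contain rows appear; empty intervals are not displayed.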
Power BI Desktop - I have two tables with Unique value.
Table one contains - Target for Each region.
e.g. APAC - 50, NA - 100, Europe - 200
Table two contains - Sales achievement for each region.
e.g. NA - 70, Europe - 90
Now the problem is that the APAC region is not listed in the 2nd table. I created a matrix visual showing values from both tables through a relationship, but after making the relationship, the APAC region does not appear in the combined list because it is not in the 2nd table.
How can I fix this?
I tried with "Show Items with No data" but it is also not working.
I want the unique list of regions from tables 1 & 2 in the 3rd table, irrespective of whether a value is assigned to them or not.
How about creating a third table from unique region-values with DAX?
Something like this:
UniquesTable = DISTINCT(UNION(VALUES(Table1[Region]),VALUES(Table2[Region])))
You should connect Table1 and Table2 to UniquesTable in the Relationships tab so that calculations with targets and achievements work as expected.
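Outside Power BI, what DISTINCT(UNION(VALUES(...), VALUES(...))) computes is just the deduplicated union of the two region columns. A tiny plain-Python sketch of that idea, with made-up region values matching the example:

```python
# Regions that appear in each table (values invented for the sketch).
table1_regions = ["APAC", "NA", "Europe"]   # regions with targets
table2_regions = ["NA", "Europe"]           # regions with sales

# Union the two columns and drop duplicates, like DISTINCT(UNION(...)).
uniques = sorted(set(table1_regions) | set(table2_regions))
print(uniques)  # ['APAC', 'Europe', 'NA']
```

APAC survives even though it has no sales row, which is exactly why the matrix built on UniquesTable shows all regions.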
I want to be able to compare two tables within the same SQLite database using a C++ interface, looking for matching records. Here are my two tables:
Table name : temptrigrams
ID TEMPTRIGRAM
---------- ----------
1 The cat ran
2 Compare two tables
3 Alex went home
4 Mark sat down
5 this database blows
6 data with a
7 table disco ninja
++78
Table Name: spamtrigrams
ID TRIGRAM
---------- ----------
1 Sam's nice ham
2 Tuesday was cold
3 Alex stood up
4 Mark passed out
5 this database is
6 date with a
7 disco stew pot
++10000
The first table has two columns and 85 records and the second table has two columns with 10007 records.
I would like to compare the records in the TEMPTRIGRAM column of the first table against the TRIGRAM column of the second table and return the number of matches across the tables. So if ID:1 'The cat ran' appears in 'spamtrigrams', I would like that counted and returned with the total at the end as an integer.
Could somebody please explain the syntax for the query to perform this action?
Thank you.
This is a join query with an aggregation. My guess is that you want the number of matches per trigram:
select t1.temptrigram, count(t2.trigram)
from table1 t1 left outer join
table2 t2
on t1.temptrigram = t2.trigram
group by t1.temptrigram;
If you just want the number of matches:
select count(t2.trigram)
from table1 t1 join
table2 t2
on t1.temptrigram = t2.trigram;
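Both queries can be tried out directly. The question mentions a C++ interface, but the SQL is the same from any binding; this is a hedged Python/sqlite3 sketch against tiny stand-in tables with one overlapping phrase.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE temptrigrams (id INTEGER, temptrigram TEXT);
    CREATE TABLE spamtrigrams (id INTEGER, trigram TEXT);
    INSERT INTO temptrigrams VALUES (1, 'The cat ran'), (2, 'Compare two tables');
    INSERT INTO spamtrigrams VALUES (1, 'The cat ran'), (2, 'Tuesday was cold');
""")

# Per-trigram match counts: the LEFT JOIN keeps trigrams with 0 matches.
per_trigram = con.execute("""
    SELECT t1.temptrigram, COUNT(t2.trigram)
    FROM temptrigrams t1 LEFT OUTER JOIN spamtrigrams t2
      ON t1.temptrigram = t2.trigram
    GROUP BY t1.temptrigram
    ORDER BY t1.temptrigram
""").fetchall()

# The single total number of matches, as an integer.
total = con.execute("""
    SELECT COUNT(t2.trigram)
    FROM temptrigrams t1 JOIN spamtrigrams t2
      ON t1.temptrigram = t2.trigram
""").fetchone()[0]

print(per_trigram)  # [('Compare two tables', 0), ('The cat ran', 1)]
print(total)        # 1
```

Note that the comparison is exact and case-sensitive by default, so 'The Cat Ran' would not match 'The cat ran' unless the column uses NOCASE collation.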