How to create a new table with nested data in BigQuery from other tables? - google-cloud-platform

I found lots of examples of creating nested data in the Google BigQuery manual, but none that do it from other tables.
I want to create a new table (for example, solar_system_moons_nested) with nested data, using a SQL statement that generates the nested data from two existing tables (for example, the planets and moons tables). I want the new table to look as follows:
I created the moon and planet tables as below:
moon table:
planet table:
Is there any way to create a nested table from existing tables? Any help would be appreciated.
Here is how I made the new table:
WITH solar_system_moons_nested AS (
SELECT p.planet,
STRUCT(moon ,Distance_from_Planet__km_,Diameter__km_) AS moons,
from test.planets p inner join test.moons m on m.planet=p.planet
)
select * from solar_system_moons_nested
And here is how it looks:
As you can see, the SELECT did not do what I expected.

If all you want is a nested structure, you can use ARRAY_AGG with a GROUP BY, something like the following:
WITH solar_system_moons_nested AS (
SELECT p.planet,
ARRAY_AGG(STRUCT(moon ,Distance_from_Planet__km_,Diameter__km_)) AS moons,
from test.planets p inner join test.moons m on m.planet=p.planet
GROUP BY 1
)
select * from solar_system_moons_nested
More on ARRAY_AGG here: https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions

Use array_agg to construct an array:
WITH solar_system_moons_nested AS (
SELECT
p.planet,
array_agg(STRUCT(moon ,Distance_from_Planet__km_,Diameter__km_)) AS moons,
from test.planets p inner join test.moons m on m.planet=p.planet
group by p.planet
)
select * from solar_system_moons_nested
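To make the effect of ARRAY_AGG(STRUCT(...)) with GROUP BY more concrete, here is a rough plain-Python sketch of the same reshaping. It does not talk to BigQuery; the sample rows, the shortened field names (distance_km, diameter_km) and the helper nest_moons are all hypothetical and only illustrate the shape of the result: one row per planet, each carrying a list of moon records.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for test.planets and test.moons;
# the values are illustrative only.
planets = [{"planet": "Earth"}, {"planet": "Mars"}]
moons = [
    {"planet": "Earth", "moon": "Moon", "distance_km": 384400, "diameter_km": 3475},
    {"planet": "Mars", "moon": "Phobos", "distance_km": 9376, "diameter_km": 22},
    {"planet": "Mars", "moon": "Deimos", "distance_km": 23460, "diameter_km": 12},
]


def nest_moons(planets, moons):
    """Inner-join moons to planets and collect each planet's moons into a list."""
    known = {p["planet"] for p in planets}
    grouped = defaultdict(list)
    for m in moons:
        if m["planet"] in known:  # inner join on planet
            # One STRUCT-like record per moon, without the join key.
            grouped[m["planet"]].append(
                {k: v for k, v in m.items() if k != "planet"}
            )
    # One output row per planet, like GROUP BY p.planet.
    return [{"planet": name, "moons": rows} for name, rows in grouped.items()]


nested = nest_moons(planets, moons)
for row in nested:
    print(row["planet"], len(row["moons"]))
```

Without the grouping step (the asker's original query), every moon would stay on its own row with a single struct, which matches the flat result described in the question.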

Related

Query for listing Datasets and Number of tables in Bigquery

I'd like to write a query that shows all the datasets in a project and the number of tables in each one. My problem is with the number of tables.
Here is what I'm stuck with:
SELECT
smt.catalog_name as `Project`,
smt.schema_name as `DataSet`,
( SELECT
COUNT(*)
FROM ***DataSet***.INFORMATION_SCHEMA.TABLES
) as `nbTable`,
smt.creation_time,
smt.location
FROM
INFORMATION_SCHEMA.SCHEMATA smt
ORDER BY DataSet
The view INFORMATION_SCHEMA.SCHEMATA lists all the datasets in the project in which the query is executed, and the view INFORMATION_SCHEMA.TABLES lists all the tables in a given dataset.
The catch is that INFORMATION_SCHEMA.TABLES must be qualified with the dataset to return table information, like this: dataset.INFORMATION_SCHEMA.TABLES
So what I need is to replace ***DataSet*** with the value I get from the query itself (smt.schema_name).
I am not sure whether this can be done with a subquery, and I don't really know how to go about it.
I hope I'm clear enough. Thanks in advance if you can help.
You can do this with BigQuery's procedural language (a FOR loop plus EXECUTE IMMEDIATE to build the dataset-qualified query dynamically), as follows:
CREATE TEMP TABLE table_counts (dataset_id STRING, table_count INT64);
FOR record IN
(
SELECT
catalog_name as project_id,
schema_name as dataset_id
FROM `elzagales.INFORMATION_SCHEMA.SCHEMATA`
)
DO
EXECUTE IMMEDIATE
CONCAT("INSERT table_counts (dataset_id, table_count) SELECT table_schema as dataset_id, count(table_name) from ", record.dataset_id,".INFORMATION_SCHEMA.TABLES GROUP BY dataset_id");
END FOR;
SELECT * FROM table_counts;
This will return something like:

Syntax for using RELATED() function under IF Condition in POWER BI

I have two tables:
Table 1:
Product      LOB
BVPN         NS
SD-WAN       IS
QUICK START  NS
BVPN SMALL   OSBU
Table 2:
Product      LOB
BVPN         NS
SD-WAN       IS
QUICK START  NS
BVPN SMALL   NS
I want to create a custom column that changes the value "OSBU" in the LOB column of table1 to "NS", based on the value in the LOB column of table2, and keeps the other values the same. I used the following code, but it's not giving me the desired output. Can anyone tell me what is wrong?
Column =
IF (
'table1'[LOB] = "OSBU",
RELATED ( 'table2'[LOB] ),
'table1'[GOLD_BILLING_PROFILE.Product/Service]
)
The RELATED function only works between tables that have a relationship established. You would have to create a relationship between Table1 and Table2 based on Product, and hopefully it is a one-to-one mapping. The following link should give you the basic details on creating and managing a relationship:
https://learn.microsoft.com/en-us/power-bi/desktop-create-and-manage-relationships
Hope this helps.
Edit:
I don't know why you are using a different column for the FALSE condition. Ideally it should be something like:
Column =
IF (
'table1'[LOB] = "OSBU",
RELATED ( 'table2'[LOB] ),
'table1'[LOB]
)

How to substitute NULL with value in Power BI when joining one to many

In my model I have a table AssignedToUser that doesn't contain any NULL values.
Table TotalCounts contains the number of tasks for each user.
The two tables are joined on UserGUID, and TotalCounts contains a NULL value for UserGUID.
When I put everything in one table, there is a NULL value for AssignedToUser.
How can I replace the NULL value for AssignedToUser with "POOL"?
Under Edit Queries I tried to create an additional column:
if [AssignedToUser] = null then "POOL" else [AssignedToUser]
But that didn't help.
UPDATE:
Thanks Alexis.
I have created the FullAssignedToUsers table, but when I try to create a relationship with TotalCounts on UserGUID, it doesn't accept it.
Data in the new table looks like this:
UPDATE:
The .pbix file can be accessed here:
https://www.dropbox.com/s/95frggpaq6tce7q/User%20Open%20Closed%20Tasks%20Experiment.pbix?dl=0
I believe the problem here is that your join has UserGUID values that are not in your AssignedToUsers table.
To correct this, one possibility is to replace your AssignedToUsers table with one that contains all the UserGUID values from the TotalCounts table.
FullAssignedToUsers =
ADDCOLUMNS(VALUES(TotalCounts[UserGUID]),
"AssignedToUser",
LOOKUPVALUE(AssignedToUsers[AssignedToUser],
AssignedToUsers[UserGUID], TotalCounts[UserGUID]))
This should get you the full outer join. You can then create the custom column as you described in the table and use that column in your visual.
You'll probably want to break the relationships with the original AssignedToUsers table and create relationships with the new one instead.
If you don't want to take that extra step, you can do an ISBLANK inside your new table formula.
FullAssignedToUsers =
ADDCOLUMNS(VALUES(TotalCounts[UserGUID]),
"AssignedToUser",
IF(
ISBLANK(
LOOKUPVALUE(AssignedToUsers[AssignedToUser],
AssignedToUsers[UserGUID], TotalCounts[UserGUID])),
"POOL",
LOOKUPVALUE(AssignedToUsers[AssignedToUser],
AssignedToUsers[UserGUID], TotalCounts[UserGUID])))
Note: This is equivalent to doing a right outer join merge on the AssignedToUsers table in the query editor and then replacing the nulls with "POOL". I'd actually recommend approaching it that way instead.
Another way to approach it is to pull the AssignedToUser column over to the TotalCounts table in a custom column and use that column in your visual instead.
AssignedToUsers =
IF(ISBLANK(RELATED(AssignedToUsers[AssignedToUser])),
"POOL",
RELATED(AssignedToUsers[AssignedToUser]))
This is equivalent to doing a left outer join merge on the TotalCounts table in the query editor, expanding the AssignedToUser column, then replacing nulls with "POOL" in that column.
In DAX, missing values are BLANK(), not null. Try this:
=IF(
ISBLANK(AssignedToUsers[AssignedToUser]),
"Pool",
AssignedToUsers[AssignedToUser]
)

SQLite: SELECT IN by a 100K element list

I have an SQLite table of ~1M rows. Each row has the structure (docId, docBLOB). Each docBLOB is roughly 20 KB.
I have to perform a SELECT by an externally provided list of docIds. Each list may be nearly 100K elements long. How can I do this efficiently?
Maybe there is a way to make a SELECT * FROM docBlobTable WHERE docId IN ( [MEGALIST] ) statement work?
Put all the IDs into a temporary table, then use:
SELECT * FROM docBlobTable WHERE docId IN (SELECT ID FROM TempTable)
or:
SELECT docBlobTable.*
FROM docBlobTable
JOIN TempTable ON docBlobTable.docId = TempTable.ID
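As a minimal sketch of the temporary-table approach, the snippet below uses Python's built-in sqlite3 module. It assumes integer docIds and builds a small in-memory docBlobTable purely for demonstration; the helper name fetch_docs and the sample data are hypothetical and not part of the question.

```python
import sqlite3


def fetch_docs(conn, doc_ids):
    """Return (docId, docBLOB) rows whose docId appears in doc_ids."""
    cur = conn.cursor()
    # Temporary table that lives only for this connection.
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS TempTable (ID INTEGER PRIMARY KEY)")
    cur.execute("DELETE FROM TempTable")
    # Bulk-insert the IDs; OR IGNORE skips duplicates in the input list.
    cur.executemany(
        "INSERT OR IGNORE INTO TempTable (ID) VALUES (?)",
        ((i,) for i in doc_ids),
    )
    cur.execute(
        "SELECT docBlobTable.docId, docBlobTable.docBLOB "
        "FROM docBlobTable JOIN TempTable ON docBlobTable.docId = TempTable.ID"
    )
    return cur.fetchall()


# Small in-memory demo with hypothetical data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docBlobTable (docId INTEGER PRIMARY KEY, docBLOB BLOB)")
conn.executemany(
    "INSERT INTO docBlobTable VALUES (?, ?)",
    [(i, b"doc-%d" % i) for i in range(1000)],
)
rows = fetch_docs(conn, [3, 5, 5, 999, 12345])
print(sorted(r[0] for r in rows))
```

Loading the IDs with executemany avoids building one enormous IN (...) literal, and declaring ID as the primary key of TempTable gives the join an index on that side as well.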

How to integrate a CTE query in Entity Framework 5

I have an SQL query that I have written using CTE. Now, I am moving the repository to use Entity Framework 5.
I am at a loss as to how to integrate (or rewrite) the CTE-based query using Entity Framework 5.
I am using POCO entities with the EF5 and have a bunch of Map classes. There is no EDMX file etc.
I feel like a total noob right now and would appreciate any help pointing me in the right direction.
The CTE query is as follows:
WITH CDE AS
(
    -- Anchor: the starting workspace
    SELECT * FROM collaboration.Workspace AS W WHERE W.Id = @WorkspaceId
    UNION ALL
    -- Recursive step: children of workspaces already in the CTE
    SELECT W.* FROM collaboration.Workspace AS W INNER JOIN CDE ON W.ParentId = CDE.Id AND W.ParentId <> '00000000-0000-0000-0000-000000000000'
)
SELECT
    W.Id AS Id,
    W.Name AS Name,
    W.Description AS Description,
    MAX(WH.ActionedTimeUtc) AS LastUpdatedTimeUtc,
    WH.ActorId AS LastUpdateUserId
FROM
    collaboration.Workspace AS W
INNER JOIN
    collaboration.WorkspaceHistory AS WH ON W.Id = WH.WorkspaceId
INNER JOIN
(
    SELECT TOP 10
        CDE.Id
    FROM
        CDE
    INNER JOIN
        collaboration.WorkspaceHistory AS WH ON WH.WorkspaceId = CDE.Id
    WHERE
        CDE.Id <> @WorkspaceId
    GROUP BY
        CDE.Id,
        CDE.ParentId,
        WH.ActorId,
        WH.Action
    HAVING
        WH.ActorId = @UserId
        AND
        WH.Action <> 4
    ORDER BY
        COUNT(*) DESC
) AS Q ON Q.Id = WH.WorkspaceId
GROUP BY
    W.Id,
    W.Name,
    W.Description,
    WH.ActorId
HAVING
    WH.ActorId = @UserId
You must create a stored procedure for your SQL query (or use the query directly) and execute it through dbContext.Database.SqlQuery. With the code-first approach you don't have any other options. With EDMX you could use a mapped table-valued function, but code-first doesn't have such an option yet.
I have built a stored procedure that takes an array of IDs as an input parameter and returns a data table using a recursive CTE query. Take a look at the code here; it uses EF and the code-first approach.