SQLite: SELECT IN by a 100K element list

I have an SQLite table of ~1M rows. Each row has the structure (docId, docBLOB), and each docBLOB is nearly 20 KB.
I have to SELECT rows by an externally provided list of docIds. Each list may be nearly 100K elements long. How can I do this efficiently?
Is there a way to run a statement like SELECT * FROM docBlobTable WHERE docId IN ([MEGALIST])?

Put all the IDs into a temporary table, then use:
SELECT * FROM docBlobTable WHERE docId IN (SELECT ID FROM TempTable)
or:
SELECT docBlobTable.*
FROM docBlobTable
JOIN TempTable ON docBlobTable.docId = TempTable.ID
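For illustration, here is a minimal sketch of the temporary-table approach using Python's built-in sqlite3 module as a stand-in for the C/C++ API. The function name fetch_docs and the sample data are hypothetical; only docBlobTable, docId, docBLOB, TempTable and ID come from the posts above.

```python
import sqlite3

def fetch_docs(conn, doc_ids):
    """Return {docId: docBLOB} for the requested ids via a temp-table join."""
    cur = conn.cursor()
    # The PRIMARY KEY gives the temp table an index for the join.
    cur.execute("CREATE TEMP TABLE TempTable (ID INTEGER PRIMARY KEY)")
    # INSERT OR IGNORE silently skips duplicate ids in the input list.
    cur.executemany(
        "INSERT OR IGNORE INTO TempTable (ID) VALUES (?)",
        ((i,) for i in doc_ids),
    )
    rows = cur.execute(
        "SELECT docBlobTable.docId, docBlobTable.docBLOB "
        "FROM docBlobTable "
        "JOIN TempTable ON docBlobTable.docId = TempTable.ID"
    ).fetchall()
    cur.execute("DROP TABLE TempTable")
    return dict(rows)

# Hypothetical sample data: 1000 small documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docBlobTable (docId INTEGER PRIMARY KEY, docBLOB BLOB)")
conn.executemany(
    "INSERT INTO docBlobTable VALUES (?, ?)",
    ((i, b"doc-%d" % i) for i in range(1000)),
)

# Every tenth id, plus one id that does not exist in the table.
wanted = list(range(0, 1000, 10)) + [5000]
docs = fetch_docs(conn, wanted)
```

Filling the temporary table inside a single transaction (as executemany does here) is likely to matter more for speed than the choice between IN and JOIN.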

Related

How to create a new table with nested data in big query from another tables?

I found lots of examples of creating nested data in the Google BigQuery manual, but none that build it from other tables.
I want to create a new table (for example solar_system_moons_nested) with nested data, using a SQL statement over two existing tables (for example the planets and moons tables). I want the new table to look as follows:
I created the moon and planet tables as below:
moon table
planet table:
Is there any way to create a nested table from existing tables? Any help would be appreciated.
Here is how I made the new table:
WITH solar_system_moons_nested AS (
SELECT p.planet,
STRUCT(moon ,Distance_from_Planet__km_,Diameter__km_) AS moons,
from test.planets p inner join test.moons m on m.planet=p.planet
)
select * from solar_system_moons_nested
and here is how it looks:
As you can see, the SELECT did not do what I expected.

If all you want is a nested structure, you can use ARRAY_AGG and do something like this:
WITH solar_system_moons_nested AS (
SELECT p.planet,
ARRAY_AGG(STRUCT(moon ,Distance_from_Planet__km_,Diameter__km_)) AS moons,
from test.planets p inner join test.moons m on m.planet=p.planet
GROUP BY 1
)
select * from solar_system_moons_nested
More on ARRAY_AGG here: https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions
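To make the shape of the result concrete, here is a rough Python sketch of what GROUP BY plus ARRAY_AGG(STRUCT(...)) computes: one output row per planet, with that planet's moons collected into a list of records. It is not BigQuery code; the function nest_moons and the sample rows are hypothetical, and the numeric values are placeholders rather than real measurements.

```python
from collections import defaultdict

# Hypothetical rows standing in for the result of the planets/moons join.
# The numbers are placeholders, not real measurements.
joined_rows = [
    {"planet": "Earth", "moon": "Moon", "Distance_from_Planet__km_": 1.0, "Diameter__km_": 1.0},
    {"planet": "Mars", "moon": "Phobos", "Distance_from_Planet__km_": 2.0, "Diameter__km_": 2.0},
    {"planet": "Mars", "moon": "Deimos", "Distance_from_Planet__km_": 3.0, "Diameter__km_": 3.0},
]

def nest_moons(rows):
    """Group rows by planet and collect the remaining fields into a list."""
    grouped = defaultdict(list)
    for row in rows:
        # Everything except the grouping key plays the role of the STRUCT.
        struct = {key: value for key, value in row.items() if key != "planet"}
        grouped[row["planet"]].append(struct)
    return [{"planet": planet, "moons": moons} for planet, moons in grouped.items()]

nested = nest_moons(joined_rows)
```

Without the GROUP BY, as in the question's query, each joined row keeps its own single STRUCT, which is why the planets were repeated instead of nested. Note that ARRAY_AGG does not guarantee element order unless an ORDER BY is given inside the call.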

Use array_agg to construct an array:
WITH solar_system_moons_nested AS (
SELECT
p.planet,
array_agg(STRUCT(moon ,Distance_from_Planet__km_,Diameter__km_)) AS moons,
from test.planets p inner join test.moons m on m.planet=p.planet
group by p.planet
)
select * from solar_system_moons_nested

Sqlite Query to remove duplicates from one column. Removal depends on the second column

Please have a look at the following data example:
This table has multiple columns and no PRIMARY KEY. As the attached image shows, there are a few duplicates in STK_CODE. I want to remove the duplicate rows depending on the (min) column.
According to the image, one stk_code has three different rows. The value in the (min) column differs between these duplicates, and I want to keep the row that has the minimum value in the (min) column.
I am very new to SQLite, and I am using the SQLite C library (-lsqlite3) from C++.
Is there any way to do this?

Your table has an implicit rowid that acts as its primary key.
Use it to select the rowids of the rows that you don't want to delete:
DELETE FROM comparison
WHERE rowid NOT IN (
SELECT rowid
FROM comparison
GROUP BY STK_CODE
HAVING (COUNT(*) = 1 OR MIN(CASE WHEN min > 0 THEN min END))
)
This code uses rowid as a bare column and relies on a documented SQLite feature: when a query uses the MIN() or MAX() aggregate function, the bare columns take their values from the row that contains the minimum or maximum.
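If you would rather not depend on the bare-column behaviour, a more portable variant is a correlated EXISTS: delete every row for which another row with the same STK_CODE has a smaller min (ties broken by rowid). The sketch below runs it through Python's sqlite3 module; the sample rows are hypothetical, and it assumes the min column contains no NULLs (rows with a NULL min would simply be left in place).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "min" is quoted because it is also the name of a built-in function.
conn.execute('CREATE TABLE comparison (STK_CODE TEXT, "min" REAL)')
conn.executemany(
    "INSERT INTO comparison VALUES (?, ?)",
    [("A", 3.0), ("A", 1.5), ("A", 2.0), ("B", 7.0)],  # hypothetical rows
)

# Delete a row when a "better" row exists for the same STK_CODE:
# one with a smaller min, or the same min and a smaller rowid.
conn.execute(
    """
    DELETE FROM comparison
    WHERE EXISTS (
        SELECT 1
        FROM comparison AS other
        WHERE other.STK_CODE = comparison.STK_CODE
          AND (other."min" < comparison."min"
               OR (other."min" = comparison."min"
                   AND other.rowid < comparison.rowid))
    )
    """
)

remaining = conn.execute(
    'SELECT STK_CODE, "min" FROM comparison ORDER BY STK_CODE'
).fetchall()
```

The rowid tie-break ensures exactly one row per STK_CODE survives even when several rows share the same minimum.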

Remove duplicates based on sort

I have a customers table with IDs and some datetime columns. The IDs contain duplicates, and I only want to analyse distinct ID values.
I tried using GROUP BY, but this makes the process very slow.
I can't share the data because it is sensitive.
Any suggestions would be helpful.

I'd suggest using ROW_NUMBER(). This lets you number the rows within each group, ordered by columns you choose, and you can then pick out the first row of each group.
Since you've shared no data, table names or column names, here's an example based on the AdventureWorks database. The technique will be the same: you partition by whatever identifies the group of rows you want to deduplicate (ProductKey below) and order so that the version you want to keep comes first (TotalChildren, BirthDate and CustomerKey in my example).
USE AdventureWorksDW2017;
WITH CustomersOrdered AS
(
SELECT S.ProductKey, C.CustomerKey, C.TotalChildren, C.BirthDate
, ROW_NUMBER() OVER (
PARTITION BY S.ProductKey
ORDER BY C.TotalChildren DESC, C.BirthDate DESC, C.CustomerKey ASC
) AS CustomerSequence
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
ON S.CustomerKey = C.CustomerKey
)
SELECT ProductKey, CustomerKey
FROM CustomersOrdered
WHERE CustomerSequence = 1
ORDER BY ProductKey, CustomerKey;
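The same pattern applied to a customers table might look like the sketch below. It uses Python's sqlite3 module only so that the query can be run as-is (this needs SQLite 3.25 or later for window functions); the table layout, the updated_at column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: an id with duplicates and one datetime column.
conn.execute("CREATE TABLE customers (id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        (1, "2021-01-01"),
        (1, "2021-03-01"),
        (2, "2021-02-01"),
        (2, "2021-01-15"),
        (3, "2021-04-01"),
    ],
)

# Number the rows of each id from newest to oldest, then keep row 1.
latest = conn.execute(
    """
    WITH ordered AS (
        SELECT id,
               updated_at,
               ROW_NUMBER() OVER (
                   PARTITION BY id
                   ORDER BY updated_at DESC
               ) AS seq
        FROM customers
    )
    SELECT id, updated_at
    FROM ordered
    WHERE seq = 1
    ORDER BY id
    """
).fetchall()
```

Here the most recent row per id is kept; changing the ORDER BY inside OVER changes which duplicate survives.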

You can also just sort the table by the date column, then click on the ID column and remove duplicates.

How to substitute NULL with value in Power BI when joining one to many

In my model, the AssignedToUser table doesn't contain any NULL values.
The TotalCounts table contains the number of tasks for each user.
The two tables are joined on UserGUID, and TotalCounts contains a NULL value for UserGUID.
When I drop everything into one table, there is a NULL value for AssignedToUser.
How can I replace the NULL value for AssignedToUser with "POOL"?
Under Edit Queries I tried to create an additional column:
if [AssignedToUser] = null then "POOL" else [AssignedToUser]
But that didn't help.
UPDATE:
Thanks Alexis.
I have created the FullAssignedToUsers table, but when I try to create a relationship with TotalCounts on UserGUID, it doesn't accept it.
Data in the new table looks like this:
UPDATE:
The .pbix file can be accessed here:
https://www.dropbox.com/s/95frggpaq6tce7q/User%20Open%20Closed%20Tasks%20Experiment.pbix?dl=0

I believe the problem here is that your join has UserGUID values that are not in your AssignedToUsers table.
To correct this, one possibility is to replace your AssignedToUsers table with one that contains all the UserGUID values from the TotalCounts table.
FullAssignedToUsers =
ADDCOLUMNS(VALUES(TotalCounts[UserGUID]),
"AssignedToUser",
LOOKUPVALUE(AssignedToUsers[AssignedToUser],
AssignedToUsers[UserGUID], TotalCounts[UserGUID]))
This should get you the full outer join. You can then create the custom column as you described in the table and use that column in your visual.
You'll probably want to break the relationships with the original AssignedToUsers table and create relationships with the new one instead.
If you don't want to take that extra step, you can do an ISBLANK inside your new table formula.
FullAssignedToUsers =
ADDCOLUMNS(VALUES(TotalCounts[UserGUID]),
"AssignedToUser",
IF(
ISBLANK(
LOOKUPVALUE(AssignedToUsers[AssignedToUser],
AssignedToUsers[UserGUID], TotalCounts[UserGUID])),
"POOL",
LOOKUPVALUE(AssignedToUsers[AssignedToUser],
AssignedToUsers[UserGUID], TotalCounts[UserGUID])))
Note: This is equivalent to doing a right outer join merge on the AssignedToUsers table in the query editor and then replacing the nulls with "POOL". I'd actually recommend approaching it that way instead.
Another way to approach it is to pull the AssignedToUser column over to the TotalCounts table in a custom column and use that column in your visual instead.
AssignedToUsers =
IF(ISBLANK(RELATED(AssignedToUsers[AssignedToUser])),
"POOL",
RELATED(AssignedToUsers[AssignedToUser]))
This is equivalent to doing a left outer join merge on the TotalCounts table in the query editor, expanding the AssignedToUser column, then replacing nulls with "POOL" in that column.

In DAX, missing values are BLANK(), not null. Try this:
=IF(
ISBLANK(AssignedToUsers[AssignedToUser]),
"Pool",
AssignedToUsers[AssignedToUser]
)

Remove rows from SQL DB that appear in a Array

I develop with MFC (Visual C++) and Oracle SQL Server.
I have an SQL table with columns for ID, value and time. When the application inserts a new row, an ID, a value and the insertion time are stored.
My goal is to delete the rows whose values were changed during a certain time period, since the data inserted during that period has incorrect values.
Where is the catch? I don't need to delete all the rows that were updated in that time period, only the rows with IDs that appear in a certain CArray.
I could go through each ID in the CArray and execute a delete query for that ID in that time period (whether there is an entry or not), but that is a problem since I can have 150K IDs to iterate over.
Thanks

DELETE FROM table-name WHERE id in (...)

Transform your array into a temp table with one column, and then delete from your destination table where the ID is in (select ID from the temp table).
Here is an example (T-SQL):
declare @RegionID varchar(50)
SET @RegionID = '853,834,16,467,841'
declare @S varchar(20)
if LEN(@RegionID) > 0 SET @RegionID = @RegionID + ','
CREATE TABLE #ARRAY(region_ID VARCHAR(20))
WHILE LEN(@RegionID) > 0 BEGIN
SELECT @S = LTRIM(SUBSTRING(@RegionID, 1, CHARINDEX(',', @RegionID) - 1))
INSERT INTO #ARRAY (region_ID) VALUES (@S)
SELECT @RegionID = SUBSTRING(@RegionID, CHARINDEX(',', @RegionID) + 1, LEN(@RegionID))
END
delete from your_table
where regionID IN (select region_ID from #ARRAY)
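As a rough, runnable illustration of the same idea combined with the time window from the question, the sketch below uses Python's sqlite3 module as a stand-in for the real database. The table readings, its columns, the helper delete_bad_rows and the sample rows are all hypothetical; in the real application the IDs would be bulk-inserted from the CArray with the database's own client API.

```python
import sqlite3

def delete_bad_rows(conn, ids, start, end):
    """Delete rows inserted in [start, end] whose id is in ids."""
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE bad_ids (id INTEGER PRIMARY KEY)")
    # One bulk insert instead of one DELETE statement per id.
    cur.executemany(
        "INSERT OR IGNORE INTO bad_ids (id) VALUES (?)",
        ((i,) for i in ids),
    )
    cur.execute(
        "DELETE FROM readings "
        "WHERE inserted_at BETWEEN ? AND ? "
        "AND id IN (SELECT id FROM bad_ids)",
        (start, end),
    )
    deleted = cur.rowcount  # number of rows removed by the DELETE
    cur.execute("DROP TABLE bad_ids")
    return deleted

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value REAL, inserted_at TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, 10.0, "2021-01-01 10:00"),
        (1, 11.0, "2021-01-01 12:00"),
        (2, 20.0, "2021-01-01 12:30"),
        (3, 30.0, "2021-01-01 12:45"),
        (2, 21.0, "2021-01-02 09:00"),
    ],
)

deleted = delete_bad_rows(conn, [1, 2, 99], "2021-01-01 11:00", "2021-01-01 13:00")
remaining = conn.execute(
    "SELECT id, value FROM readings ORDER BY inserted_at"
).fetchall()
```

Only rows that match both the ID list and the time window are removed, so id 3 survives even though it falls inside the window.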
