Remove rows from SQL DB that appear in an array - C++

I develop with MFC Visual C++ and Oracle SQL Server.
I have an SQL table with columns for ID, value, and time; when the application inserts a new row, it writes an ID, a value, and a timestamp.
My goal is to delete the rows whose values were inserted during a certain time window, since the data inserted during that window has incorrect values.
Where is the catch? I don't need to delete all the rows that were updated in that time period, only the rows whose IDs appear in a certain CArray.
I could go through each ID from the CArray and execute a delete query for that ID in that time period (whether there is an entry or not), but that is a problem since I can have 150K IDs to iterate over.
Thanks

DELETE FROM table-name WHERE id in (...)
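For example, with placeholder table and column names (VALUES_TABLE, ID, INSERT_TIME stand in for your own schema), combining the IN list with the bad time window:

DELETE FROM VALUES_TABLE
WHERE ID IN (853, 834, 16, 467, 841)   -- IDs taken from the CArray
  AND INSERT_TIME BETWEEN :start_time AND :end_time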

Transform your array into a temp table with one column and then delete from your destination table where ID in (select Id from tempTable).
Here is an example:
declare @RegionID varchar(50)
SET @RegionID = '853,834,16,467,841'
declare @S varchar(20)
if LEN(@RegionID) > 0 SET @RegionID = @RegionID + ','
CREATE TABLE #ARRAY(region_ID VARCHAR(20))
WHILE LEN(@RegionID) > 0 BEGIN
    SELECT @S = LTRIM(SUBSTRING(@RegionID, 1, CHARINDEX(',', @RegionID) - 1))
    INSERT INTO #ARRAY (region_ID) VALUES (@S)
    SELECT @RegionID = SUBSTRING(@RegionID, CHARINDEX(',', @RegionID) + 1, LEN(@RegionID))
END
delete from your_table
where regionID IN (select region_ID from #ARRAY)
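On the C++ side, a minimal MFC sketch of the same idea might look like the following. It is only an illustration: it assumes an open CDatabase connection, SQL Server-style temp tables as in the snippet above, and the same placeholder names VALUES_TABLE, ID and INSERT_TIME; adapt the names, batching, and date formatting to your schema.

#include <afxdb.h>     // CDatabase
#include <afxtempl.h>  // CArray

// Stage the CArray IDs into a temp table in batches, then run one
// time-bounded DELETE instead of 150K individual statements.
void DeleteBadRows(CDatabase& db, const CArray<int, int>& ids,
                   const CString& fromTime, const CString& toTime)
{
    db.ExecuteSQL(_T("CREATE TABLE #IDS (ID INT PRIMARY KEY)"));

    const INT_PTR kBatch = 1000;  // rows per INSERT statement
    for (INT_PTR i = 0; i < ids.GetSize(); i += kBatch)
    {
        CString sql = _T("INSERT INTO #IDS (ID) VALUES ");
        for (INT_PTR j = i; j < ids.GetSize() && j < i + kBatch; ++j)
        {
            CString v;
            v.Format(_T("%s(%d)"), (j > i) ? _T(",") : _T(""), ids.GetAt(j));
            sql += v;
        }
        db.ExecuteSQL(sql);  // throws CDBException on failure
    }

    // Single DELETE restricted to the bad time window and the staged IDs.
    CString del;
    del.Format(_T("DELETE FROM VALUES_TABLE ")
               _T("WHERE INSERT_TIME BETWEEN '%s' AND '%s' ")
               _T("AND ID IN (SELECT ID FROM #IDS)"),
               (LPCTSTR)fromTime, (LPCTSTR)toTime);
    db.ExecuteSQL(del);

    db.ExecuteSQL(_T("DROP TABLE #IDS"));
}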


Redshift Pivot Function

I've got a table like the following which I'm trying to pivot in Redshift:
UUID | Key  | Value
-----+------+------
a123 | Key1 | Val1
b123 | Key2 | Val2
c123 | Key3 | Val3
Currently I'm using the following code to pivot it, and it works fine. However, when I replace the IN part with a subquery it throws an error.
select *
from (select UUID ,"Key", value from tbl) PIVOT (max(value) for "key" in (
'Key1',
'Key2',
'Key3'
))
Question: What's the best way to replace the IN part with a subquery that takes the distinct values from the Key column?
What I am trying to achieve:
select *
from (select UUID ,"Key", value from tbl) PIVOT (max(value) for "key" in (
select distinct "keys" from tbl
))
From the Redshift documentation - "The PIVOT IN list values cannot be column references or sub-queries. Each value must be type compatible with the FOR column reference." See: https://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause-pivot-unpivot-examples.html
So I think this will need to be done as a sequence of 2 queries. You likely can do this in a stored procedure if you need it as a single command.
Updated with requested stored procedure with results to a cursor example:
In order to make this supportable by you, I'll add some background info and a description of how this works. First off, a stored procedure cannot produce results straight to your bench. It can either store the results in a (temp) table or in a named cursor. A cursor is just storing the results of a query on the leader node where they wait to be fetched. The lifespan of the cursor is the current transaction, so a commit or rollback will delete the cursor.
Here's what you want to happen as individual SQL statements, but first let's set up the test data:
create table test (UUID varchar(16), Key varchar(16), Value varchar(16));
insert into test values
('a123', 'Key1', 'Val1'),
('b123', 'Key2', 'Val2'),
('c123', 'Key3', 'Val3');
The actions you want to perform are first to create a string for the PIVOT clause IN list like so:
select '\'' || listagg(distinct "key",'\',\'') || '\'' from test;
Then you want to take this string and insert it into your PIVOT query which should look like this:
select *
from (select UUID, "Key", value from test)
PIVOT (max(value) for "key" in ( 'Key1', 'Key2', 'Key3')
);
But doing this in the bench means taking the result of one query and copy/pasting it into a second query, and you want this to happen automatically. Unfortunately, Redshift does not allow sub-queries in the PIVOT statement, for the reason given above.
We can take the result of one query and use it to construct and run another query in a stored procedure. Here's such a stored procedure:
CREATE OR REPLACE procedure pivot_on_all_keys(curs1 INOUT refcursor)
AS
$$
DECLARE
row record;
BEGIN
select into row '\'' || listagg(distinct "key",'\',\'') || '\'' as keys from test;
OPEN curs1 for EXECUTE 'select *
from (select UUID, "Key", value from test)
PIVOT (max(value) for "key" in ( ' || row.keys || ' )
);';
END;
$$ LANGUAGE plpgsql;
What this procedure does is define and populate a "record" (1 row of data) called "row" with the result of the query that produces the IN list. Next it opens a cursor, whose name is provided by the calling command, with the contents of the PIVOT query which uses the IN list from the record "row". Done.
When executed (by running call) this function will produce a cursor on the leader node that contains the result of the PIVOT query. In this stored procedure the name of the cursor to create is passed to the function as a string.
call pivot_on_all_keys('mycursor');
All that needs to be done at this point is to "fetch" the data from the named cursor. This is done with the FETCH command.
fetch all from mycursor;
I prototyped this on a single-node Redshift cluster, and "FETCH ALL" is not supported in this configuration, so I had to use "FETCH 1000". So if you are also on a single-node cluster you will need to use:
fetch 1000 from mycursor;
The last point to note is that the cursor "mycursor" now exists, and if you try to rerun the stored procedure it will fail. You could pass a different name to the procedure (making another cursor), or you could end the transaction (END, COMMIT, or ROLLBACK), or you could close the cursor using CLOSE. Once the cursor is destroyed you can use the same name for a new cursor. If you want this to be repeatable you could run this batch of commands:
call pivot_on_all_keys('mycursor'); fetch all from mycursor; close mycursor;
Remember that the cursor has a lifespan of the current transaction, so any action that ends the transaction will destroy the cursor. If you have AUTOCOMMIT enabled in your bench, this will insert COMMITs, destroying the cursor (you can run the CALL and FETCH in a batch to prevent this in many benches). Also, some commands perform an implicit COMMIT and will destroy the cursor as well (like TRUNCATE).
For these reasons, and depending on what else you need to do around the PIVOT query, you may want to have the stored procedure write to a temp table instead of a cursor. Then the temp table can be queried for the results. A temp table has a lifespan of the session so is a little stickier but is a little less efficient as a table needs to be created, the result of the PIVOT query needs to be written to the compute nodes, and then the results have to be sent to the leader node to produce the desired output. Just need to pick the right tool for the job.
===================================
To populate a table within a stored procedure you can just execute the commands. The whole thing will look like:
CREATE OR REPLACE procedure pivot_on_all_keys()
AS
$$
DECLARE
row record;
BEGIN
select into row '\'' || listagg(distinct "key",'\',\'') || '\'' as keys from test;
EXECUTE 'drop table if exists test_stage;';
EXECUTE 'create table test_stage AS select *
from (select UUID, "Key", value from test)
PIVOT (max(value) for "key" in ( ' || row.keys || ' )
);';
END;
$$ LANGUAGE plpgsql;
call pivot_on_all_keys();
select * from test_stage;
If you want this new table to have keys for optimizing downstream queries, you will want to create the table in one statement and then insert into it, but this is the quick path.
A little off-topic, but I wonder why Amazon couldn't introduce a simpler syntax for pivot. IMO, if GROUP BY is replaced by PIVOT BY, it can give enough hint to the interpreter to transform rows into columns. For example:
SELECT partname, avg(price) as avg_price FROM Part GROUP BY partname;
can be written as:
SELECT partname, avg(price) as avg_price FROM Part PIVOT BY partname;
Even multi-level pivoting can also be handled in the same syntax.
SELECT year, partname, avg(price) as avg_price FROM Part PIVOT BY year, partname;

How to exclude null values using REGEXP_SUBSTR

The following statement retrieves the value of the sub-tag msg_id from the MISC column if the sub-tag contains a value like %PACS%.
SELECT REGEXP_SUBSTR(MISC, '(^|\s|;)msg_id = (.*?)\s*(;|$)',1,1,NULL,2) AS TRANS_REF FROM MISC_HEADER
WHERE MISC LIKE '%PACS%';
I notice the query returns records with a null value (without msg_id) as well. Is there any way to exclude those null records within the REGEXP_SUBSTR syntax itself, without adding any WHERE clause?
Sample data of MISC:
channel=atm ; phone=0123 ; msg_id=PACS00812 ; ustrd=U123
channel=pos; phone=9922; ustrd=U156
The second record has no msg_id, so it needs to be excluded.
This method does not use REGEXP so may not be suitable for you.
However, it does satisfy your requirement.
This takes your embedded list, breaks it out into a row for each component of an ID (I've assumed you do have something that uniquely identifies each record).
It then only returns the original row where one of the rows for the ID has 'PACS' in it.
WITH thedata
AS (SELECT 1 AS theid
, 'channel=atm ; phone=0123 ; msg_id=PACS00812 ; ustrd=U123'
AS msg_id
FROM DUAL
UNION ALL
SELECT 2, 'channel=pos; phone=9922; ustrd=U156' FROM DUAL)
, mylist
AS (SELECT theid, COLUMN_VALUE AS msg_component
FROM thedata
, XMLTABLE(('"' || REPLACE(msg_id, ';', '","') || '"')))
SELECT *
FROM thedata td
WHERE EXISTS
(SELECT 1
FROM mylist m
WHERE m.theid = td.theid
AND m.msg_component LIKE '%PACS%')
The thedata sub-query is simply there to generate a couple of records and pretend to be your table. You could remove it and substitute your actual table name.
There are other ways to break up an embedded list including ones that use REGEXP, I just find the XMLTABLE method 'cleaner'.
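For completeness, here is a sketch of one such REGEXP-based split on the same sample data, using the common CONNECT BY LEVEL idiom (it assumes Oracle 11g+ for REGEXP_COUNT; the PRIOR clauses keep the row generation scoped to each record):

WITH thedata
AS (SELECT 1 AS theid
         , 'channel=atm ; phone=0123 ; msg_id=PACS00812 ; ustrd=U123' AS msg_id
      FROM DUAL
     UNION ALL
    SELECT 2, 'channel=pos; phone=9922; ustrd=U156' FROM DUAL)
, mylist
AS (SELECT theid
         , REGEXP_SUBSTR(msg_id, '[^;]+', 1, LEVEL) AS msg_component
      FROM thedata
   CONNECT BY LEVEL <= REGEXP_COUNT(msg_id, ';') + 1
       AND PRIOR theid = theid
       AND PRIOR SYS_GUID() IS NOT NULL)
SELECT *
FROM thedata td
WHERE EXISTS
      (SELECT 1
         FROM mylist m
        WHERE m.theid = td.theid
          AND m.msg_component LIKE '%PACS%')

As with the XMLTABLE version, only the record whose msg_id component contains 'PACS' survives.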

SQLite: SELECT IN by a 100K element list

I have an SQLite table of ~1M rows. Each row has a structure of (docId, docBLOB). Each docBlob is nearly 20Kb.
I have to perform SELECT by an externally provided list of docIDs. Each list may be nearly 100K elements long. How can I do it more efficiently?
Maybe there is a way to make a SELECT * FROM docBlobTable WHERE docId IN ( [MEGALIST] ) statement?
Put all the IDs into a temporary table, then use:
SELECT * FROM docBlobTable WHERE docId IN (SELECT ID FROM TempTable)
or:
SELECT docBlobTable.*
FROM docBlobTable
JOIN TempTable ON docBlobTable.docId = TempTable.ID
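Since the ID list comes from application code, a minimal C++ sketch of loading it into the temporary table with the SQLite C API might look like this (an illustration only: it assumes an open sqlite3* handle and omits most error handling):

#include <sqlite3.h>
#include <vector>

// Load the externally provided IDs into a per-connection temp table,
// so the SELECT can use it instead of a 100K-element literal IN list.
bool load_ids(sqlite3* db, const std::vector<sqlite3_int64>& ids)
{
    if (sqlite3_exec(db, "CREATE TEMP TABLE TempTable(ID INTEGER PRIMARY KEY);",
                     nullptr, nullptr, nullptr) != SQLITE_OK)
        return false;

    // One transaction + one prepared statement keeps the inserts fast.
    sqlite3_exec(db, "BEGIN;", nullptr, nullptr, nullptr);
    sqlite3_stmt* ins = nullptr;
    sqlite3_prepare_v2(db, "INSERT INTO TempTable(ID) VALUES (?);", -1, &ins, nullptr);
    for (sqlite3_int64 id : ids) {
        sqlite3_bind_int64(ins, 1, id);
        sqlite3_step(ins);
        sqlite3_reset(ins);
    }
    sqlite3_finalize(ins);
    sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr);
    return true;
}

After that, prepare and step the SELECT from above (WHERE docId IN (SELECT ID FROM TempTable)) as usual.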

SQL column with multiple values (query implementation in a cpp file)

I am using this link.
I have connected my cpp file in Eclipse to my database with 3 tables (two simple tables, Person and Item, and a third one, PersonsItems, that connects them). In the third table I use one simple primary key and then two foreign keys, like this:
CREATE TABLE PersonsItems(PersonsItemsId int not null auto_increment primary key,
Person_Id int not null,
Item_id int not null,
constraint fk_Person_id foreign key (Person_Id) references Person(PersonId),
constraint fk_Item_id foreign key (Item_id) references Items(ItemId));
So, with embedded SQL in C, I want a Person to have multiple items.
My code:
mysql_query(connection, \
"INSERT INTO PersonsItems(PersonsItemsId, Person_Id, Item_id) VALUES (1,1,5), (1,1,8);");
printf("%ld PersonsItems Row(s) Updated!\n", (long) mysql_affected_rows(connection));
//SELECT newly inserted record.
mysql_query(connection, \
"SELECT Order_id FROM PersonsItems");
//Resource struct with rows of returned data.
resource = mysql_use_result(connection);
// Fetch multiple results
while((result = mysql_fetch_row(resource))) {
printf("%s %s\n",result[0], result[1]);
}
My result is
-1 PersonsItems Row(s) Updated!
5
but with VALUES (1,1,5), (1,1,8);
I would like that to be
-1 PersonsItems Row(s) Updated!
5 8
Can someone tell me why this is not happening?
Kind regards.
I suspect this is because your first insert is failing with the following error:
Duplicate entry '1' for key 'PRIMARY'
This is because you are trying to insert 1 twice into PersonsItemsId, which is the primary key and so has to be unique (it is also auto_increment, so there is no need to specify a value at all).
This is why rows affected is -1, and why in this line:
printf("%s %s\n",result[0], result[1]);
you are only seeing 5 because the first statement failed after the values (1,1,5) had already been inserted, so there is still one row of data in the table.
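One way to confirm this is to check the return value of mysql_query and print mysql_error (a sketch using the same MySQL C API calls as in the question):

if (mysql_query(connection,
        "INSERT INTO PersonsItems(PersonsItemsId, Person_Id, Item_id) "
        "VALUES (1,1,5), (1,1,8);") != 0) {
    // On the duplicate key this prints something like:
    // "Duplicate entry '1' for key 'PRIMARY'"
    fprintf(stderr, "INSERT failed: %s\n", mysql_error(connection));
}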
I think to get the behaviour you are expecting you need to use the ON DUPLICATE KEY UPDATE syntax:
INSERT INTO PersonsItems(PersonsItemsId, Person_Id, order_id)
VALUES (1,1,5), (1,1,8)
ON DUPLICATE KEY UPDATE Person_id = VALUES(person_Id), Order_ID = VALUES(Order_ID);
Example on SQL Fiddle
Or do not specify the value for personsItemsID and let auto_increment do its thing:
INSERT INTO PersonsItems( Person_Id, order_id)
VALUES (1,5), (1,8);
Example on SQL Fiddle
I think you have a typo or mistake in your two queries.
You are inserting "PersonsItemsId, Person_Id, Item_id"
INSERT INTO PersonsItems(PersonsItemsId, Person_Id, Item_id) VALUES (1,1,5), (1,1,8)
and then your select statement selects "Order_id".
SELECT Order_id FROM PersonsItems
In order to achieve 5, 8 as you request, your second query needs to be:
SELECT Item_id FROM PersonsItems
Edit to add:
Your primary key is autoincrement so you don't need to pass it to your insert statement (in fact it will error as you pass 1 twice).
You only need to insert your other columns:
INSERT INTO PersonsItems(Person_Id, Item_id) VALUES (1,5), (1,8)

Doctrine join query to get all records satisfying count greater than 1

I tried with a normal SQL query:
SELECT activity_shares.id FROM `activity_shares`
INNER JOIN (SELECT `activity_id` FROM `activity_shares`
GROUP BY `activity_id`
HAVING COUNT(`activity_id`) > 1 ) dup ON activity_shares.activity_id = dup.activity_id
which gives me record IDs, say 10 and 11.
But when I tried the same query with the Doctrine query builder:
$qb3=$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','MyBundleDataBundle:ActivityShare c')
->innerJoin('c.activity', 'ca')
// ->andWhere('ca.id = c.activity')
->groupBy('ca.id')
->having('count(ca.id)>1');
Edited:
$query3=$qb3->getQuery();
$query3->getResult();
Generated SQL is:
SELECT a0_.id AS id0 FROM activity_shares a0_
INNER JOIN activities a1_ ON a0_.activity_id = a1_.id
GROUP BY a1_.id HAVING count(a1_.id) > 1
It gives only 1 record, namely 10. I want to get both. I don't see where I went wrong. Any idea?
My table structure is:
ActivityShare
+----+----------+-------+-----
| Id | activity | Share | etc...
+----+----------+-------+-----
| 1  | 1        | 1     |
+----+----------+-------+-----
| 2  | 1        | 2     |
+----+----------+-------+-----
The activity column is a foreign key to the Activity table.
I want to get Ids 1 and 2.
Simplified SQL
First of all, let me simplify that query so it gives the same result:
SELECT id FROM `activity_shares`
GROUP BY `id`
HAVING COUNT(`activity_id`) > 1
Doctrine QueryBuilder
If you store the id of the activity in the table like your SQL suggests:
You can use the simplified SQL to build a query:
$results = $this->getEntityManager()->createQueryBuilder()
    ->add('select', 'c.id')
    ->add('from', 'MyBundleDataBundle:ActivityShare c')
    ->groupBy('c.id')
    ->having('count(c.activity) > 1')
    ->getQuery()
    ->getResult();
If you are using association tables (Doctrine logic),
here you will have to use a join, but the count may be tricky.
Solution 1
Use the associative table like an entity (as I see it, you only need the id).
Let's say the table name is activityshare_activity;
it will have two fields, activity_id and activityshare_id. If you find a way to add a new column id to that table and make it auto-increment + primary, the rest is easy:
the new entity being called ActivityShareActivity:
$results = $this->getEntityManager()->createQueryBuilder()
    ->add('select', 'c.activityshare_id')
    ->add('from', 'MyBundleDataBundle:ActivityShareActivity c')
    ->groupBy('c.activityshare_id')
    ->having('count(c.activity_id) > 1')
    ->getQuery()
    ->getResult();
The steps to add the new identification column and make it compatible with Doctrine (you need to do this once):
Add the column (INT, NOT NULL); don't make it auto-increment yet:
ALTER TABLE tableName ADD id INT NOT NULL
Populate the column using a PHP loop (a simple for loop will do).
Modify the column to be auto-increment:
ALTER TABLE tableName MODIFY id INT NOT NULL AUTO_INCREMENT
Solution 2
The correction to your query:
$result = $this->getEntityManager()->createQueryBuilder()
    ->select('c.id')
    ->from('MyBundleDataBundle:ActivityShare', 'c')
    ->innerJoin('c.activity', 'ca')
    ->groupBy('c.id') // note: it's c.id, not ca.id
    ->having('count(ca.id) > 1')
    ->getQuery()
    ->getResult();
I posted this one last because I am not 100% sure of the output of HAVING + COUNT, but it should work just fine :)
Thanks for your answers. I finally managed to get the answer.
My Doctrine query is:
$subquery=$this->getEntityManager()->createQueryBuilder('as')
->add('select','a.id')
->add('from','MyBundleDataBundle:ActivityShare as')
->innerJoin('as.activity', 'a')
->groupBy('a.id')
->having('count(a.id)>1');
$query=$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','ChowzterDataBundle:ActivityShare c')
->innerJoin('c.activity', 'ca');
$query->andWhere($query->expr()->in('ca.id', $subquery->getDql()))
;
$result = $query->getQuery();
print_r($result->getResult());
And the SQL looks like:
SELECT a0_.id AS id0 FROM activity_shares a0_ INNER JOIN activities a1_ ON a0_.activity_id = a1_.id WHERE a1_.id IN (SELECT a2_.id FROM activity_shares a3_ INNER JOIN activities a2_ ON a3_.activity_id = a2_.id GROUP BY a2_.id HAVING count(a2_.id) > 1)