What is the difference between volatile table and multiset volatile table?

I am looking at some SAS/Teradata code and am confused by the code below. It creates both a volatile table and a multiset volatile table. What is the difference between the two? Also, why does it specify WITH DATA PRIMARY INDEX? And for the second table, why does it collect statistics?
PROC SQL;
CONNECT TO TERADATA (AUTHDOMAIN=IDWPRD SERVER=IDWPRD MODE=TERADATA);
EXECUTE(
  CREATE VOLATILE TABLE REQ1_1_CODE_INS AS (
    SELECT
      ACCT_REF_NB,
      CAST(NON_MNTR_TXN_PST_TS AS DATE) AS ADJ_DT,
      SRC_DATA_DT,
      NON_MNTR_TXN_SEQ_NB,
      SRC_CRE_USER_ID,
      PROC_TRAN_CD,
      PROC_TRCK_ID,
      MAX(CASE WHEN NON_MNTR_TXN_SBTP_CD = '0009' THEN TRIM(NEW_NON_MNTR_TXN_DTL_TX) ELSE NULL END) AS CARD_NB
    FROM DWHMGR.PST_NON_MNTR_TXN
    WHERE NON_MNTR_TXN_TP_CD = '255'
      AND CAST(NON_MNTR_TXN_PST_TS AS DATE) >= '2016-03-13'
      AND CAST(NON_MNTR_TXN_PST_TS AS DATE) <= '2017-11-09'
    GROUP BY 1,2,3,4,5,6,7
    HAVING TXN_DT <= ADD_MONTHS(ADJ_DT, -24)
      OR UPPER(MRCH_NM) LIKE '%CHECK TO%'
      OR UPPER(MRCH_NM) LIKE '%BALANCE TRANSFER%'
  ) WITH DATA PRIMARY INDEX(ACCT_REF_NB) ON COMMIT PRESERVE ROWS;
) BY TERADATA;

CREATE TABLE UNIX.REQ1_1_CODE_INS AS
  SELECT * FROM CONNECTION TO TERADATA (SELECT * FROM REQ1_1_CODE_INS);

/* REFERENCE TABLE */
EXECUTE(
  CREATE MULTISET VOLATILE TABLE _ACCTS_00 AS (
    SELECT DISTINCT ACCT_REF_NB FROM REQ1_1_CODE_INS
  ) WITH DATA PRIMARY INDEX(ACCT_REF_NB) ON COMMIT PRESERVE ROWS;
) BY TERADATA;

EXECUTE( COLLECT STATISTICS ON _ACCTS_00 PRIMARY INDEX(ACCT_REF_NB); ) BY TERADATA;

A volatile table is like a WORK table in SAS: it exists only for the duration of the session that created it.
Teradata has two kinds of tables: SET tables and MULTISET tables. A SET table does not allow row-level duplicates, whereas a MULTISET table does. If neither keyword appears in the CREATE TABLE statement, the default in Teradata session mode (which this code uses via MODE=TERADATA) is SET; in ANSI mode the default is MULTISET.
Every Teradata table also needs a primary index, declared here as WITH DATA PRIMARY INDEX(column name). WITH DATA populates the new table from the SELECT; the other option, WITH NO DATA, creates only the table structure.
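A minimal sketch of the two options (hypothetical table and column names):
CREATE VOLATILE TABLE t_copy AS (SELECT acct_id FROM src_table)
WITH DATA PRIMARY INDEX(acct_id) ON COMMIT PRESERVE ROWS;     /* populated from the SELECT */
CREATE VOLATILE TABLE t_shape AS (SELECT acct_id FROM src_table)
WITH NO DATA PRIMARY INDEX(acct_id) ON COMMIT PRESERVE ROWS;  /* structure only, zero rows */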
COLLECT STATISTICS is a big topic in itself, but basically it gathers demographic data (row counts, distinct values, skew) about the primary index, which in turn helps the optimizer plan future queries that depend on that index.
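To see the SET vs. MULTISET difference in action, here is a minimal sketch (hypothetical table names, run directly in a Teradata session):
CREATE SET VOLATILE TABLE t_set (a INTEGER) PRIMARY INDEX(a) ON COMMIT PRESERVE ROWS;
CREATE MULTISET VOLATILE TABLE t_multi (a INTEGER) PRIMARY INDEX(a) ON COMMIT PRESERVE ROWS;
INSERT INTO t_set VALUES (1);
INSERT INTO t_set VALUES (1);   /* fails with a duplicate row error (an INSERT...SELECT would silently discard it) */
INSERT INTO t_multi VALUES (1);
INSERT INTO t_multi VALUES (1); /* accepted: MULTISET keeps both rows */
SELECT COUNT(*) FROM t_set;     /* 1 */
SELECT COUNT(*) FROM t_multi;   /* 2 */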

Related

Using cte to swap two columns of a table

I want to swap the 2nd and 3rd columns of a table using a CTE.
I'm working with the query below, which keeps throwing an error:
no such column: cte.comm1
Table [SalComm], columns: ID, Sal, Comm
with CTE as
(
SELECT ID as id1, sal as sal1, comm as comm1 from SalComm
) UPDATE SalComm SET sal=cte.comm1, comm=cte.sal1 where ID= cte.id1*
Could you please suggest the right query?
This answer assumes you are using SQL Server, or some other database that supports directly updating common table expressions. I don't see the point of the aliases inside your CTE at all. If you want to swap column values, just use the column names directly:
WITH cte AS (
SELECT ID, sal, comm
FROM SalComm
)
UPDATE cte
SET sal = comm, comm = sal;
-- no WHERE clause needed, if you really want to cover the entire table
That being said, you could just as easily do the above update on the original table. Updatable CTEs are more useful when they generate some complex derived results which you intend to use as part of a later update. That does not appear to be the case here.
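Incidentally, the error message ("no such column: cte.comm1") looks like it comes from SQLite, which does not support updatable CTEs at all. There, a plain UPDATE does the swap on its own, because every expression on the right-hand side reads the row's pre-update values:
UPDATE SalComm
SET sal = comm,
    comm = sal;  -- both assignments see the original values, so this is a true swap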

Remove duplicates based on sort

I have a customers table with IDs and some datetime columns, but the IDs have duplicates and I just want to analyse distinct ID values.
I tried using GROUP BY, but that makes the process very slow.
Due to data sensitivity, I can't share the data.
Any suggestions would be helpful.
I'd suggest using ROW_NUMBER(). This lets you rank the rows by chosen columns, and you can then pick out the first result.
Since you've shared no data or table and column names, here's an example based on the AdventureWorks database. The technique will be the same: partition by whatever identifies the group of rows you want to deduplicate (ProductKey below) and order in a way that puts the version you want to keep first (TotalChildren, BirthDate and CustomerKey in my example).
USE AdventureWorksDW2017;
WITH CustomersOrdered AS
(
SELECT S.ProductKey, C.CustomerKey, C.TotalChildren, C.BirthDate
, ROW_NUMBER() OVER (
PARTITION BY S.ProductKey
ORDER BY C.TotalChildren DESC, C.BirthDate DESC, C.CustomerKey ASC
) AS CustomerSequence
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
ON S.CustomerKey = C.CustomerKey
)
SELECT ProductKey, CustomerKey
FROM CustomersOrdered
WHERE CustomerSequence = 1
ORDER BY ProductKey, CustomerKey;
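If the goal is to physically remove the duplicates rather than just select one row per group, SQL Server also lets you delete through a CTE, provided the CTE reads a single table. A minimal sketch against a hypothetical Customers(ID, CreatedAt) table:
WITH Ranked AS
(
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedAt DESC) AS rn
    FROM dbo.Customers
)
DELETE FROM Ranked
WHERE rn > 1;  -- keeps the newest row per ID and deletes the rest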
You can also just sort the rows by the date column and then select the ID column and delete the duplicates by hand...

DAX Query Union Multiple Tables & Return Distinct

I have two tables (CompletedJobs & ScriptDetails) and, using DAX, I want to return distinct Names that appear in CompletedJobs but do not appear in ScriptDetails.
Here is my SQL query, which works and returns values.
Select Distinct CJ.[Name]
From CompletedJobs CJ
Left Join ScriptDetails SD
ON CJ.[Name]=SD.ActivityName
Where SD.ActivityName IS NULL
I started by creating the following DAX query, but it gives me the following error message:
"A table of multiple values was supplied where a single value was expected"
AdhocJobs = DISTINCT(UNION(SELECTCOLUMNS(CompletedJobs,"Name",CompletedJobs[Name]),SELECTCOLUMNS(ScriptDetails,"Name",ScriptDetails[ActivityName])))
How do I create a DAX query that would replicate the SQL query?
Rather than recreate your SQL, there is DAX that already addresses your specific use case. The EXCEPT function returns a table where rows from the LEFT SIDE table do not appear in the RIGHT SIDE table.
EVALUATE
DISTINCT (
    EXCEPT (
        SUMMARIZE ( CompletedJobs, CompletedJobs[Name] ),
        SUMMARIZE ( ScriptDetails, ScriptDetails[ActivityName] )
    )
)
In this case I used SUMMARIZE to reduce each table down to one column, and then wrapped them with EXCEPT to take only the Names from Completed Jobs that aren't ActivityNames in ScriptDetails.
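If you want this as a calculated table (your AdhocJobs attempt) rather than an EVALUATE query, the same shape should work; a sketch, keeping your SELECTCOLUMNS renames:
AdhocJobs =
EXCEPT (
    DISTINCT ( SELECTCOLUMNS ( CompletedJobs, "Name", CompletedJobs[Name] ) ),
    DISTINCT ( SELECTCOLUMNS ( ScriptDetails, "Name", ScriptDetails[ActivityName] ) )
)
The error you saw usually means a table expression was entered where a scalar was expected, e.g. as a measure or calculated column instead of a calculated table (Modeling > New Table in Power BI).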
Hope it helps.

How to update redshift column: simple text replacement

I have a large target table with columns (id, value). I want to update value='old' to value='new'.
The simplest way would be to UPDATE target SET value='new' WHERE value='old';
However, an update like this internally deletes rows and writes new ones, which is possibly not recommended. So I tried to do a merge column update:
-- staging
CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage (SELECT id, value FROM target WHERE value = 'old');
UPDATE stage SET value = 'new' WHERE value = 'old'; -- ??? how do you update value?

-- merge
BEGIN TRANSACTION;
UPDATE target
SET value = stage.value FROM stage
WHERE target.id = stage.id AND target.distkey = stage.distkey; -- collocated join?
END TRANSACTION;

DROP TABLE stage;
This can't be the best way of creating the table stage: I have to do all these UPDATE delete/writes when I update this way. Is there a way to do it in the INSERT?
Is it necessary to force the collocated join when I use CREATE TABLE LIKE?
Are you updating all the rows in the table?
If yes, you can use CTAS (CREATE TABLE AS), which is the recommended method.
Assuming your table looks like this:
table1
id, col1, col2, value
You can use the following SQL to create a new table:
CREATE TABLE tmp_table AS
SELECT id, col1, col2, 'new_value' AS value
FROM table1;
After you verify the data in tmp_table:
DROP TABLE table1;
ALTER TABLE tmp_table RENAME TO table1;
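If other sessions may be reading table1 at the same time, the swap can be wrapped in a transaction (Redshift DDL is transactional), so readers never see the table missing; a sketch:
BEGIN;
DROP TABLE table1;
ALTER TABLE tmp_table RENAME TO table1;
COMMIT;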
If you are not updating all the rows, you can use a filter to do the CTAS and then insert the rest of the rows into the new table (let me know if you need more info if this is the case):
CREATE TABLE tmp_table AS
SELECT id, col1, col2, 'new_value' AS value
FROM table1
WHERE value = 'old';

INSERT INTO tmp_table SELECT * FROM table1 WHERE value != 'old';
The next step would be to DROP table1 and rename tmp_table to table1, as above.
Update: based on your comment, you can do the following; let me know if this solves your case.
This method basically creates a new table to replace your existing table.
I have reused some of your code:
CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage SELECT id, 'new' FROM target WHERE value = 'old';
The INSERT above writes the rows to be updated with the 'new' value, so there is no need to run an UPDATE after this.
Then bring over the unchanged rows:
INSERT INTO stage SELECT id, value FROM target WHERE value != 'old';
At this point your original target table is intact, and the stage table holds both sets of rows: the updated rows with the 'new' value and the rows you did not want to change.
To replace target with stage:
DROP TABLE target;
or, to keep it for further verification:
ALTER TABLE target RENAME TO target_old;
ALTER TABLE stage RENAME TO target;
From a Redshift developer:
This case doesn't require an upsert, or update+insert, and it is fine to just run the update:
UPDATE target SET value='new' WHERE value='old';
Another way would be to INSERT the rows you need and DELETE the other rows, but that's unnecessarily complicated.
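On the delete/re-insert concern: Redshift does implement UPDATE as a delete plus insert internally, so after a large update you can reclaim the dead space and refresh planner statistics with the standard maintenance commands; a sketch:
UPDATE target SET value = 'new' WHERE value = 'old';
VACUUM target;   -- reclaim the space left by the deleted row versions
ANALYZE target;  -- refresh table statistics for the query planner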

SQLite: SELECT IN by a 100K element list

I have an SQLite table of ~1M rows. Each row has a structure of (docId, docBLOB). Each docBlob is nearly 20Kb.
I have to perform SELECT by an externally provided list of docIDs. Each list may be nearly 100K elements long. How can I do it more efficiently?
Maybe there is a way to make a SELECT * FROM docBlobTable WHERE docId IN ( [MEGALIST] ) statement?
Put all the IDs into a temporary table, then use:
SELECT * FROM docBlobTable WHERE docId IN (SELECT ID FROM TempTable)
or:
SELECT docBlobTable.*
FROM docBlobTable
JOIN TempTable ON docBlobTable.docId = TempTable.ID
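A minimal sketch of the temp-table setup (hypothetical names, assuming integer IDs). Declaring ID as INTEGER PRIMARY KEY gives the IN subquery or join an index to probe, and wrapping the ~100K inserts in a single transaction keeps the load fast:
CREATE TEMP TABLE TempTable (ID INTEGER PRIMARY KEY);

BEGIN;
-- repeat with a prepared statement, binding each externally supplied docId
INSERT INTO TempTable (ID) VALUES (?);
COMMIT;

SELECT docBlobTable.*
FROM docBlobTable
JOIN TempTable ON docBlobTable.docId = TempTable.ID;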