Update duplicate values of the column of a table in oracle - oracle19c

I have a table in which there are duplicate records.
I want this to be like below
Please help me on this.

I am not sure what you are trying to achieve, but I think you could use a SQL window (analytic) function. It lets you partition rows into groups, for instance to show duplicates in a defined order (and potentially eliminate or update them). For instance, take a look at the following example, where I show the duplicates of the first two dates in your data set, ordered by the values of the third date. I then de-duplicate the data set, keeping only the first value of date3 for each pair of date1, date2.
SQL> connect scott/tiger#tiger
Connected.
SQL> set echo on
SQL> alter session set nls_date_format='DD-MON-RR';
Session altered.
SQL> drop table test;
Table TEST dropped.
SQL> create table test (
2 col1 number,
3 d1 date,
4 d2 date,
5 d3 date);
Table TEST created.
SQL> insert into test values (1,'01-mar-21','03-sep-21','21-oct-21');
1 row inserted.
SQL> insert into test values (1,'01-mar-21','21-oct-21','21-oct-21');
1 row inserted.
SQL> insert into test values (1,'01-mar-21','21-oct-21','21-oct-21');
1 row inserted.
SQL> insert into test values (1,'01-mar-21','21-oct-21','22-oct-21');
1 row inserted.
SQL> insert into test values (1,'01-mar-21','22-oct-21','22-oct-21');
1 row inserted.
SQL> commit;
Commit complete.
SQL> -- show duplicates on d1, d2 order by d3
SQL> select row_number()
2 over (partition by d1, d2 order by d3) rn, t.d1, t.d2, t.d3
3 from test t;
   RN D1           D2           D3
_____ ____________ ____________ ____________
    1 01-MAR-21    03-SEP-21    21-OCT-21
    1 01-MAR-21    21-OCT-21    21-OCT-21
    2 01-MAR-21    21-OCT-21    21-OCT-21
    3 01-MAR-21    21-OCT-21    22-OCT-21
    1 01-MAR-21    22-OCT-21    22-OCT-21
Notice that rn is incremented for each row within a given (d1, d2) pair, in d3 order, and restarts at 1 for the next pair.
SQL> -- eliminate the duplicates on d1, d2 (keep first one)
SQL> select x.* from (select row_number() over (partition by d1, d2 order by d3) rn, t.d1, t.d2, t.d3
2 from test t )x where x.rn=1;
   RN D1           D2           D3
_____ ____________ ____________ ____________
    1 01-MAR-21    03-SEP-21    21-OCT-21
    1 01-MAR-21    21-OCT-21    21-OCT-21
    1 01-MAR-21    22-OCT-21    22-OCT-21
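For readers without an Oracle instance at hand, the same keep-the-first-row-per-(d1, d2) idea can be tried with Python's built-in sqlite3 module. This is only a sketch under assumptions: it needs SQLite 3.25 or later (for window functions), it stores the dates as ISO text instead of Oracle DATE values, and the table and column names simply mirror the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (col1 INTEGER, d1 TEXT, d2 TEXT, d3 TEXT)")
rows = [
    (1, "2021-03-01", "2021-09-03", "2021-10-21"),
    (1, "2021-03-01", "2021-10-21", "2021-10-21"),
    (1, "2021-03-01", "2021-10-21", "2021-10-21"),
    (1, "2021-03-01", "2021-10-21", "2021-10-22"),
    (1, "2021-03-01", "2021-10-22", "2021-10-22"),
]
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)

# Number the rows inside each (d1, d2) group in d3 order, then keep only rn = 1.
deduped = conn.execute(
    """
    SELECT d1, d2, d3
    FROM (
        SELECT d1, d2, d3,
               ROW_NUMBER() OVER (PARTITION BY d1, d2 ORDER BY d3) AS rn
        FROM test
    )
    WHERE rn = 1
    ORDER BY d2
    """
).fetchall()

for row in deduped:
    print(row)
```

The five input rows collapse to three, one per distinct (d1, d2) pair, matching the SQL*Plus output above.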

Why doesn't my query work in Oracle APEX?

This is my query but when I run this in oracle apex it gives me the following error:
delete from
(select ename,e1.store_id,e1.sal as highest_sal
from
employees e1 inner join
(select store_id,max(sal) as sal
from employees
group by store_id
) e2
on e1.store_id=e2.store_id
and e1.sal=e2.sal
order by store_id) s
where rowid not in
(select min(rowid) from s
group by highest_sal);
The output is:
ORA-00942: table or view does not exist
ORA-06512: at "SYS.WWV_DBMS_SQL_APEX_210200", line 673
ORA-06512: at "SYS.DBMS_SYS_SQL", line 1658
ORA-06512: at "SYS.WWV_DBMS_SQL_APEX_210200", line 659
ORA-06512: at "APEX_210200.WWV_FLOW_DYNAMIC_EXEC", line 1829
4. (select store_id,max(sal) as sal
5. from employees
6. group by store_id
7. ) e2
8. on e1.store_id=e2.store_id
When I run the code in parentheses (the subquery aliased as s) on its own, it runs without any problems, but when it is placed inside this statement, it gives the error above.
Update: My goal is to first group the data by store_id and get the maximum sal in each group, then join that back to the main table where sal and store_id match, and display the employee name; the resulting table is called s. Then I want to remove the duplicate rows (those with the same sal) from it. To do this, I group by highest_sal, select the smallest rowid in each group, and delete the rows whose rowid is not in that subquery, so that only non-duplicates remain. (This is a trick for removing duplicate rows.)
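The ORA-00942 error arises because s is only an alias for the inline view; it cannot be referenced as a table in the FROM clause of another subquery (select min(rowid) from s).
You appear to want to delete all rows with the highest sal for each store_id grouping except for the row in each group with the lowest ROWID.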
You can do that with analytic functions. Either:
DELETE FROM employees
WHERE ROWID IN (
SELECT ROWID
FROM (
SELECT RANK() OVER (PARTITION BY store_id ORDER BY sal DESC) AS rnk,
ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY sal DESC, ROWID ASC)
AS rn
FROM employees
)
WHERE rnk = 1
AND rn > 1
);
or:
DELETE FROM employees
WHERE ROWID IN (
SELECT ROWID
FROM (
SELECT sal,
MAX(sal) OVER (PARTITION BY store_id) AS max_sal,
MIN(ROWID) KEEP (DENSE_RANK LAST ORDER BY sal)
OVER (PARTITION BY store_id) AS min_rid_for_max_sal
FROM employees
)
WHERE sal = max_sal
AND ROWID != min_rid_for_max_sal
);
Or, from Oracle 12, with row limiting clauses in a correlated sub-query:
DELETE FROM employees e
WHERE ROWID IN (
SELECT ROWID
FROM (
SELECT sal
FROM employees x
WHERE e.store_id = x.store_id
ORDER BY sal DESC
FETCH FIRST ROW WITH TIES
)
ORDER BY ROWID
OFFSET 1 ROW FETCH NEXT 100 PERCENT ROWS ONLY
);
Which, for the sample data:
CREATE TABLE employees (ename, store_id, sal) AS
SELECT 'A', 1, 1 FROM DUAL UNION ALL
SELECT 'B', 1, 2 FROM DUAL UNION ALL
SELECT 'C', 1, 3 FROM DUAL UNION ALL
SELECT 'D', 2, 1 FROM DUAL UNION ALL
SELECT 'E', 2, 2 FROM DUAL UNION ALL
SELECT 'F', 2, 2 FROM DUAL;
All three statements delete the F row.
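The RANK / ROW_NUMBER variant can be tried outside Oracle with Python's built-in sqlite3 module. This is a sketch under assumptions: it needs SQLite 3.25 or later for window functions, it relies on SQLite's integer rowid (not Oracle's ROWID pseudo-column), and it exposes the row identifier through an explicit alias (rid, a name chosen here) because SQLite does not pass rowid through a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (ename TEXT, store_id INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("A", 1, 1), ("B", 1, 2), ("C", 1, 3), ("D", 2, 1), ("E", 2, 2), ("F", 2, 2)],
)

# rnk = 1 marks every row tied for the top salary in its store;
# rn > 1 then spares the first of those rows (lowest rowid) and selects the rest.
conn.execute(
    """
    DELETE FROM employees
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   RANK() OVER (PARTITION BY store_id ORDER BY sal DESC) AS rnk,
                   ROW_NUMBER() OVER (
                       PARTITION BY store_id ORDER BY sal DESC, rowid ASC
                   ) AS rn
            FROM employees
        )
        WHERE rnk = 1 AND rn > 1
    )
    """
)

remaining = [r[0] for r in conn.execute("SELECT ename FROM employees ORDER BY ename")]
print(remaining)
```

With the sample data, E and F tie for the highest sal in store 2; E was inserted first, so only F is removed.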

Power BI pivot column with no aggregation errors

I have a simple table that I want to pivot by the 'COLUMN_NAME' column:
When I pivot and aggregate by count it works fine:
When I try to pivot without aggregation, it gives this error:
Expression.Error: There were too many elements in the enumeration to
complete the operation. Details:
[List]
Here is what I expected to happen:
Thanks in advance.
You need to pivot against a specific column; otherwise the Power BI engine can't determine which values belong together in the same output row.
Your input needs to be in a format similar to this:
RecordID | COLUMN_NAME        | COLUMN_VALUE
--------------------------------------------
1        | PRODUCT_SUB_FAMILY | MYPRODUCT
1        | MFG_STEP_NAME      | FT1
1        | QTY_IN             | 678
1        | QTY_OUT            | 480
1        | AGG_YIELD          | 0.70796
2        | PRODUCT_SUB_FAMILY | MYPRODUCT
2        | MFG_STEP_NAME      | SLT1
2        | QTY_IN             | 66
2        | QTY_OUT            | 0
2        | AGG_YIELD          | 0
And then when you pivot, you select the RecordID as the column you pivot against.
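To see why a record key matters, here is a small Python sketch of the logic (it is not Power Query M, and the function name pivot_without_aggregation is hypothetical). Without aggregation, each (RecordID, COLUMN_NAME) pair may hold exactly one value; if the key is missing, every value for a given name lands in the same cell, which is roughly the situation the error message complains about.

```python
from collections import defaultdict

# Long-format rows: (RecordID, COLUMN_NAME, COLUMN_VALUE), as in the table above.
long_rows = [
    (1, "PRODUCT_SUB_FAMILY", "MYPRODUCT"),
    (1, "MFG_STEP_NAME", "FT1"),
    (1, "QTY_IN", "678"),
    (1, "QTY_OUT", "480"),
    (1, "AGG_YIELD", "0.70796"),
    (2, "PRODUCT_SUB_FAMILY", "MYPRODUCT"),
    (2, "MFG_STEP_NAME", "SLT1"),
    (2, "QTY_IN", "66"),
    (2, "QTY_OUT", "0"),
    (2, "AGG_YIELD", "0"),
]


def pivot_without_aggregation(rows):
    """Turn long rows into one dict per record; refuse ambiguous cells."""
    wide = defaultdict(dict)
    for record_id, name, value in rows:
        if name in wide[record_id]:
            # Two values for the same cell: a pivot without aggregation
            # has no rule for choosing between them.
            raise ValueError(f"more than one value for {name!r} in record {record_id}")
        wide[record_id][name] = value
    return dict(wide)


wide = pivot_without_aggregation(long_rows)
for record_id, columns in wide.items():
    print(record_id, columns)
```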

Proc sql - Group by aggregate function from subquery in main query

I have two data sets containing millions of rows. Table1 contains two different ID numbers, ID1 and ID2. It also contains a variable (y1) indicating which group a certain ID belongs to.
The second table (Table2) contains two variables from the first table and an additional one.
I want to join the two tables together but before the join, I want table1 to only contain information grouped by ID1 and also for it to give me information which group an ID belongs to.
I could do this in two Proc Sql stages where I first create a table on table1 where I group by ID1 and then create another step where I merge it onto table2. However this is rather inefficient as my tables contain so many rows and I would therefore like to do it in one run. Hence I have instead created a subquery that does what I want. My problem is that I get the error that I can't group by the variable "WhichGroup" from my subquery as it stems from an aggregate function. I'm wondering if there is some good workaround to what I want to achieve?
Many thanks in advance!
Example code:
data table1;
input ID1 $ ID2 $ x1 2. y1 $;
datalines;
1 p1 10 Group1
1 p2 20 Group2
2 p3 50 Group1
;
run;
data table2;
input ID1 $ x1 x2;
datalines;
1 10 500
1 20 600
2 50 700
;
run;
Proc sql;
Create table Test
as select
t1.WhichGroup
,sum(t1.Sum_x1) as Sum_x1
,sum(t2.x2) as Sum_x2
from (select
a.ID1
,case when max(case when a.y1 = 'Group1' then 1 else 0 end) = 0 then 'Group2'
when max(case when a.y1 = 'Group2' then 1 else 0 end) = 0 then 'Group1'
else 'Both' end as WhichGroup
,Sum(a.x1) as Sum_x1
from work.table1 as a
group by 1
) as t1
left join
work.table2 as t2
on t1.ID1 = t2.ID1
Group by 1;
Quit;
- Answering my own question -
I am not sure why this is happening, but I have encountered a very interesting phenomenon and potentially a bug in SAS.
It appears that the query fails because SAS does not understand the outer GROUP BY clause when it is given as a column position (group by 1) rather than as an explicit variable name. Potentially SAS gets lost in the column order?
Has anyone else encountered this in SAS before?
The query works when the outer clause names the variable (Group by WhichGroup):
Proc sql;
Create table Test
as select
t1.WhichGroup
,sum(t1.Sum_x1) as sum_x1
,sum(t2.x2) as Sum_x2
from (select
a.ID1
,case when max(case when a.y1 = 'Group1' then 1 else 0 end) = 0 then 'Group2'
when max(case when a.y1 = 'Group2' then 1 else 0 end) = 0 then 'Group1'
else 'Both' end as WhichGroup
,Sum(a.x1) as Sum_x1
from work.table1 as a
group by 1
) as t1
left join
work.table2 as t2
on t1.ID1 = t2.ID1
Group by WhichGroup;
Quit;
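To sanity-check what the query is meant to return, independently of how PROC SQL resolves GROUP BY 1, the same logic can be run in Python's built-in sqlite3 module. This is a sketch only: SQLite is not SAS, so it says nothing about the ordinal behaviour described above, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID1 TEXT, ID2 TEXT, x1 INTEGER, y1 TEXT)")
conn.execute("CREATE TABLE table2 (ID1 TEXT, x1 INTEGER, x2 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [("1", "p1", 10, "Group1"), ("1", "p2", 20, "Group2"), ("2", "p3", 50, "Group1")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [("1", 10, 500), ("1", 20, 600), ("2", 50, 700)],
)

# Inner query: one row per ID1 with its group label; outer query: totals per label.
result = conn.execute(
    """
    SELECT t1.WhichGroup,
           SUM(t1.Sum_x1) AS Sum_x1,
           SUM(t2.x2)     AS Sum_x2
    FROM (
        SELECT a.ID1,
               CASE WHEN MAX(CASE WHEN a.y1 = 'Group1' THEN 1 ELSE 0 END) = 0 THEN 'Group2'
                    WHEN MAX(CASE WHEN a.y1 = 'Group2' THEN 1 ELSE 0 END) = 0 THEN 'Group1'
                    ELSE 'Both' END AS WhichGroup,
               SUM(a.x1) AS Sum_x1
        FROM table1 AS a
        GROUP BY a.ID1
    ) AS t1
    LEFT JOIN table2 AS t2 ON t1.ID1 = t2.ID1
    GROUP BY t1.WhichGroup
    ORDER BY t1.WhichGroup
    """
).fetchall()

print(result)
```

Note that the join repeats the ID1 = 1 summary row once per matching table2 row, so Sum_x1 for 'Both' comes out as 60 rather than 30 in this sketch.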

Redshift. Convert comma delimited values into rows

I am wondering how to convert comma-delimited values into rows in Redshift. I am afraid that my own solution isn't optimal. Please advise. I have a table in which one of the columns holds comma-separated values. For example:
I have:
user_id|user_name|user_action
-----------------------------
1 | Shone | start,stop,cancell...
I would like to see
user_id|user_name|parsed_action
-------------------------------
1 | Shone | start
1 | Shone | stop
1 | Shone | cancell
....
A slight improvement over the existing answer is to use a second "numbers" table that enumerates all of the possible list lengths and then use a cross join to make the query more compact.
Redshift does not have a straightforward method for creating a numbers table that I am aware of, but we can use a bit of a hack from https://www.periscope.io/blog/generate-series-in-redshift-and-mysql.html to create one using row numbers.
Specifically, if we assume the number of rows in cmd_logs is larger than the maximum number of commas in the user_action column, we can create a numbers table by counting rows. To start, let's assume there are at most 99 commas in the user_action column:
select
(row_number() over (order by true))::int as n
into numbers
from cmd_logs
limit 100;
If we want to get fancy, we can compute the number of commas from the cmd_logs table to create a more precise set of rows in numbers:
select
n::int
into numbers
from
(select
row_number() over (order by true) as n
from cmd_logs)
cross join
(select
max(regexp_count(user_action, '[,]')) as max_num
from cmd_logs)
where
n <= max_num + 1;
Once there is a numbers table, we can do:
select
user_id,
user_name,
split_part(user_action,',',n) as parsed_action
from
cmd_logs
cross join
numbers
where
split_part(user_action,',',n) is not null
and split_part(user_action,',',n) != '';
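The cross-join idea can be illustrated in plain Python. This is a sketch, not Redshift code: the split_part helper below is a hypothetical stand-in that mimics the 1-based indexing and the empty string returned for a missing part, and the sample rows are made up for illustration.

```python
def split_part(text, delimiter, n):
    """Mimic SPLIT_PART: 1-based index, empty string when the part is missing."""
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


# Stand-ins for the cmd_logs table and the numbers table (n = 1..100).
cmd_logs = [(1, "Shone", "start,stop,cancel"), (2, "Gregory", "find")]
numbers = range(1, 101)

# Cross join every row with every n, then keep only the non-empty parts.
parsed = [
    (user_id, user_name, split_part(user_action, ",", n))
    for user_id, user_name, user_action in cmd_logs
    for n in numbers
    if split_part(user_action, ",", n) != ""
]

for row in parsed:
    print(row)
```

Each row is paired with all 100 numbers, but only the positions that actually exist in the list survive the filter, which is why the numbers table must be at least as long as the longest list.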
Another idea is to transform your CSV string into a JSON array first and then extract its elements, along the following lines:
... '["' || replace( user_action, ',', '", "' ) || '"]' AS replaced
... JSON_EXTRACT_ARRAY_ELEMENT_TEXT(replaced, numbers.n - 1) AS parsed_action
Where "numbers" is the table from the first answer; its column n starts at 1, while JSON_EXTRACT_ARRAY_ELEMENT_TEXT uses a zero-based index, hence n - 1. The advantage of this approach is the ability to use built-in JSON functionality.
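The string surgery behind this idea can be checked in Python. This is a sketch that assumes the delimiter being replaced is the comma and that no value contains a double quote or backslash (either would produce invalid JSON); the helper name csv_to_json_array is hypothetical.

```python
import json


def csv_to_json_array(csv_text):
    # Same transformation as the SQL expression: wrap the string in [" ... "]
    # and turn every comma into the JSON element separator ", ".
    return '["' + csv_text.replace(",", '", "') + '"]'


replaced = csv_to_json_array("start,stop,cancel")
actions = json.loads(replaced)

print(replaced)
# Zero-based access, as with a JSON array element lookup.
print(actions[0], actions[2])
```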
If you know that there are not many actions in your user_action column, you can use a recursive CTE with UNION ALL and thereby avoid the auxiliary numbers table.
However, it requires you to know the number of actions for each user, so either adjust the initial table or create a view or a temporary table for it.
Data preparation
Assuming you have something like this as a table:
create temporary table actions
(
user_id varchar,
user_name varchar,
user_action varchar
);
I'll insert some values in it:
insert into actions
values (1, 'Shone', 'start,stop,cancel'),
(2, 'Gregory', 'find,diagnose,taunt'),
(3, 'Robot', 'kill,destroy');
Here's an additional table with temporary count
create temporary table actions_with_counts
(
id varchar,
name varchar,
num_actions integer,
actions varchar
);
insert into actions_with_counts (
select user_id,
user_name,
regexp_count(user_action, ',') + 1 as num_actions,
user_action
from actions
);
This would be our "input table" and it looks just as you expected
select * from actions_with_counts;
id | name    | num_actions | actions
------------------------------------------------
2  | Gregory | 3           | find,diagnose,taunt
3  | Robot   | 2           | kill,destroy
1  | Shone   | 3           | start,stop,cancel
Again, you can adjust the initial table instead and skip adding the counts as a separate table.
Sub-query to flatten the actions
Here's the unnesting query:
with recursive tmp (user_id, user_name, idx, user_action) as
(
select id,
name,
1 as idx,
split_part(actions, ',', 1) as user_action
from actions_with_counts
union all
select user_id,
user_name,
idx + 1 as idx,
split_part(actions, ',', idx + 1)
from actions_with_counts
join tmp on actions_with_counts.id = tmp.user_id
where idx < num_actions
)
select user_id, user_name, user_action as parsed_action
from tmp
order by user_id;
This will create a new row for each action, and the output would look like this:
user_id | user_name | parsed_action
-----------------------------------
1       | Shone     | start
1       | Shone     | stop
1       | Shone     | cancel
2       | Gregory   | find
2       | Gregory   | diagnose
2       | Gregory   | taunt
3       | Robot     | kill
3       | Robot     | destroy
Here are two ways to achieve this.
In my example, I'm assuming that I am accepting a comma separated list of values. My values look like schema.table.column.
The first involves using a recursive CTE.
drop table if exists #dep_tbl;
create table #dep_tbl as
select 'schema.foobar.insert_ts,schema.baz.load_ts' as dep
;
with recursive tmp (level, dep_split, to_split) as
(
select 1 as level
, split_part(dep, ',', 1) as dep_split
, regexp_count(dep, ',') as to_split
from #dep_tbl
union all
select tmp.level + 1 as level
, split_part(a.dep, ',', tmp.level + 1) as dep_split_u
, tmp.to_split
from #dep_tbl a
inner join tmp on tmp.dep_split is not null
and tmp.level <= tmp.to_split
)
select dep_split from tmp;
the above yields:
|dep_split|
|schema.foobar.insert_ts|
|schema.baz.load_ts|
The second involves a stored procedure.
CREATE OR REPLACE PROCEDURE so_test(dependencies_csv varchar(max))
LANGUAGE plpgsql
AS $$
DECLARE
dependencies_csv_vals varchar(max);
BEGIN
drop table if exists #dep_holder;
create table #dep_holder
(
avoid varchar(60000)
);
IF dependencies_csv is not null THEN
dependencies_csv_vals:='('||replace(quote_literal(regexp_replace(dependencies_csv,'\\s','')),',', '\'),(\'') ||')';
execute 'insert into #dep_holder values '||dependencies_csv_vals||';';
END IF;
END;
$$
;
call so_test('schema.foobar.insert_ts,schema.baz.load_ts');
select
*
from
#dep_holder;
the above yields:
|dep_split|
|schema.foobar.insert_ts|
|schema.baz.load_ts|
in conclusion
If you only care about one single column in your input (the X-delimited values), then I think the stored procedure is easier/faster.
However, if you have other columns you care about and want to keep them alongside the comma-separated column now transformed to rows, or if you want to keep the argument (the original list of delimited values), I think the recursive CTE is the way to go. In that case, you can just add those other columns to the columns selected in the recursive query.
You can get the expected result with the following query. I'm using "UNION ALL" to convert a column to row.
select user_id, user_name, split_part(user_action,',',1) as parsed_action from cmd_logs
union all
select user_id, user_name, split_part(user_action,',',2) as parsed_action from cmd_logs
union all
select user_id, user_name, split_part(user_action,',',3) as parsed_action from cmd_logs
Here's my equally-terrible answer.
I have a users table, and then an events table with a column that is just a comma-delimited string of users at said event. eg
event_id | user_ids
1 | 5,18,25,99,105
In this case, I used the LIKE and wildcard functions to build a new table that represents each event-user edge.
SELECT e.event_id, u.id as user_id
FROM events e
LEFT JOIN users u ON e.user_ids like '%' || u.id || '%'
It's not pretty, but I throw it in a WITH clause so that I don't have to run it more than once per query. I'll likely just build an ETL to create that table every night anyway.
Also, this only works if you have a second table that does have one row per unique possibility. If not, you could do LISTAGG to get a single cell with all your values, export that to a CSV and reupload that as a table to help.
Like I said: a terrible, no-good solution.
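One concrete reason the LIKE join is fragile: a bare substring match also hits ids that merely appear inside longer ids. The Python sketch below illustrates this with the sample string; the delimiter-wrapped variant is a common workaround offered here as an assumption, not something from the answer above, and both function names are hypothetical.

```python
def naive_match(user_ids, user_id):
    # Equivalent of: user_ids LIKE '%' || id || '%'
    return str(user_id) in user_ids


def delimited_match(user_ids, user_id):
    # Equivalent of: ',' || user_ids || ',' LIKE '%,' || id || ',%'
    return f",{user_id}," in f",{user_ids},"


user_ids = "5,18,25,99,105"
candidates = [1, 5, 8, 10, 18, 25, 99, 105]

naive = [u for u in candidates if naive_match(user_ids, u)]
strict = [u for u in candidates if delimited_match(user_ids, u)]

print(naive)   # includes 1, 8 and 10, which are only substrings of 18 and 105
print(strict)  # only the ids actually listed
```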
Late to the party but I got something working (albeit very slow though)
with nums as (select n::int n
from
(select
row_number() over (order by true) as n
from table_with_enough_rows_to_cover_range)
cross join
(select
max(json_array_length(json_column)) as max_num
from table_with_json_column )
where
n <= max_num + 1)
select *, json_extract_array_element_text(json_column,nums.n-1) parsed_json
from nums, table_with_json_column
where json_extract_array_element_text(json_column,nums.n-1) != ''
and nums.n <= json_array_length(json_column)
Thanks to answer by Bob Baxley for inspiration
A small improvement on the answer above (https://stackoverflow.com/a/31998832/1265306):
generate the numbers table with the following SQL, taken from
https://discourse.looker.com/t/generating-a-numbers-table-in-mysql-and-redshift/482
SELECT
p0.n
+ p1.n*2
+ p2.n * POWER(2,2)
+ p3.n * POWER(2,3)
+ p4.n * POWER(2,4)
+ p5.n * POWER(2,5)
+ p6.n * POWER(2,6)
+ p7.n * POWER(2,7)
as number
INTO numbers
FROM
(SELECT 0 as n UNION SELECT 1) p0,
(SELECT 0 as n UNION SELECT 1) p1,
(SELECT 0 as n UNION SELECT 1) p2,
(SELECT 0 as n UNION SELECT 1) p3,
(SELECT 0 as n UNION SELECT 1) p4,
(SELECT 0 as n UNION SELECT 1) p5,
(SELECT 0 as n UNION SELECT 1) p6,
(SELECT 0 as n UNION SELECT 1) p7
ORDER BY 1
LIMIT 100
The ORDER BY is there only in case you want to paste the query without the INTO clause and see the results.
Create a stored procedure that parses the string dynamically and populates a temp table, then select from the temp table.
Here is the code:
CREATE OR REPLACE PROCEDURE public.sp_string_split( "string" character varying )
AS $$
DECLARE
cnt INTEGER := 1;
no_of_parts INTEGER := (select REGEXP_COUNT ( string , ',' ));
sql VARCHAR(MAX) := '';
item character varying := '';
BEGIN
-- Create table
sql := 'CREATE TEMPORARY TABLE IF NOT EXISTS split_table (part VARCHAR(255)) ';
RAISE NOTICE 'executing sql %', sql ;
EXECUTE sql;
<<simple_loop_exit_continue>>
LOOP
item = (select split_part("string",',',cnt));
RAISE NOTICE 'item %', item ;
sql := 'INSERT INTO split_table SELECT '''||item||''' ';
EXECUTE sql;
cnt = cnt + 1;
EXIT simple_loop_exit_continue WHEN (cnt >= no_of_parts + 2);
END LOOP;
END ;
$$ LANGUAGE plpgsql;
Usage example:-
call public.sp_string_split('john,smith,jones');
select *
from split_table
You can try the COPY command to load your file into a Redshift table:
copy table_name from 's3://mybucket/myfolder/my.csv' CREDENTIALS 'aws_access_key_id=my_aws_acc_key;aws_secret_access_key=my_aws_sec_key' delimiter ','
The delimiter ',' option sets the field separator.
For more details on the COPY command options, see:
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

Merging data sets in sas

Suppose I have a dataset A:
ID Geogkey
1 A
1 B
1 C
2 W
2 R
2 S
and another dataset B:
ID Temp Date
1 95 1
1 100 2
1 105 3
2 10 1
How do I merge these two datasets so I get three records each for geogkeys with id=1 and one record each for geogkeys where id =2?
Assuming you want the Cartesian product of the rows within each ID (a many-to-many join), you are best off doing that in SQL, if the result is not too big:
proc sql;
create table C as
select * from A,B
where A.ID=B.ID
;
quit;
The select * will generate a warning that the ID variables are overwriting; if that's a concern, explicitly spell out your select (select A.ID, A.Geogkey, B.Temp, B.date).
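To preview what the join produces, here is a plain-Python sketch of the same many-to-many match on ID (the list names A, B and C mirror the datasets above; this is an illustration, not SAS).

```python
# Dataset A: (ID, Geogkey); dataset B: (ID, Temp, Date).
A = [(1, "A"), (1, "B"), (1, "C"), (2, "W"), (2, "R"), (2, "S")]
B = [(1, 95, 1), (1, 100, 2), (1, 105, 3), (2, 10, 1)]

# Every A row is paired with every B row that shares its ID.
C = [
    (a_id, geogkey, temp, date)
    for a_id, geogkey in A
    for b_id, temp, date in B
    if a_id == b_id
]

for row in C:
    print(row)
```

Each geogkey with ID 1 gets three records (one per Temp/Date row) and each geogkey with ID 2 gets one, for twelve rows in total.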