Can you update a table using a partition within the where clause? - sql-update

I have a query:
SELECT *, ORDER = MAX(ORDER) OVER (PARTITION BY ID)
FROM MY_TABLE
Where MY_TABLE is currently
ID  ORDER  AGE  RECENT
12  34     50   TRUE
99  41     17   TRUE
12  34     24   TRUE
99  42     12   TRUE
12  33     15   TRUE
12  33     38   TRUE
I want the table to be updated so that it matches the result of the query:
ID  ORDER  AGE  RECENT
12  34     50   TRUE
99  41     17   FALSE
12  34     24   TRUE
99  42     12   TRUE
12  33     15   FALSE
12  33     38   FALSE
Is there a way to do this with an UPDATE statement?
I tried:
UPDATE MY_TABLE
SET RECENT = FALSE
WHERE ORDER <> MAX(ORDER) OVER (PARTITION BY ID);
But I am not sure how to incorporate the PARTITION BY clause into the update.

Try this:
update my_table
set recent = false
where (id, order) not in (select id, max(order) from my_table group by 1);
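
As a quick check of the idea, here is a minimal sketch that runs an equivalent update against an in-memory SQLite database from Python. SQLite is only a stand-in for the asker's database, RECENT is stored as 1/0, and a correlated subquery replaces the (id, order) not in (...) form; all of these are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE my_table (id INTEGER, "order" INTEGER, age INTEGER, recent INTEGER)'
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, 1)",
    [(12, 34, 50), (99, 41, 17), (12, 34, 24), (99, 42, 12), (12, 33, 15), (12, 33, 38)],
)

# Clear the flag on every row whose "order" is below the maximum for its id.
conn.execute(
    """
    UPDATE my_table
    SET recent = 0
    WHERE "order" <> (
        SELECT MAX(t2."order") FROM my_table AS t2 WHERE t2.id = my_table.id
    )
    """
)

rows = conn.execute(
    'SELECT id, "order", age, recent FROM my_table ORDER BY rowid'
).fetchall()
for row in rows:
    print(row)
```

Rows that share the maximum order for their id (the two rows with id 12 and order 34) both keep their flag, which matches the desired output in the question.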

As suggested by @Tim Biegeleisen, running this as an update may not be the best idea. Instead of updating the table, you could create a view on top of it that always shows the latest information. This removes the risk of the RECENT column going stale between new rows being added and the update process running.
CREATE TABLE my_tbl (id NUMBER, "ORDER" NUMBER, age NUMBER);
INSERT INTO my_tbl VALUES
(12, 34, 50),
(99, 41, 17),
(12, 34, 24),
(99, 42, 12),
(12, 33, 15),
(12, 33, 38);
CREATE OR REPLACE VIEW my_view AS
SELECT
*,
"ORDER" = MAX("ORDER") OVER (PARTITION BY ID) AS RECENT
FROM
my_tbl
;
This gives the output:
ID  ORDER  AGE  RECENT
12  34     50   TRUE
99  41     17   FALSE
12  34     24   TRUE
99  42     12   TRUE
12  33     15   FALSE
12  33     38   FALSE
Adding another row:
INSERT INTO my_tbl VALUES (12, 35, 51);
Gives:
ID  ORDER  AGE  RECENT
12  34     50   FALSE
99  41     17   FALSE
12  34     24   FALSE
99  42     12   TRUE
12  33     15   FALSE
12  33     38   FALSE
12  35     51   TRUE
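
The view behaviour can be reproduced locally with a small sketch. It assumes Python's bundled SQLite (3.25 or later, for window functions) as a stand-in for the real database; the helper name recent_flags is hypothetical.

```python
import sqlite3

# SQLite stand-in (an assumption); window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE my_tbl (id INTEGER, "order" INTEGER, age INTEGER)')
conn.executemany(
    "INSERT INTO my_tbl VALUES (?, ?, ?)",
    [(12, 34, 50), (99, 41, 17), (12, 34, 24), (99, 42, 12), (12, 33, 15), (12, 33, 38)],
)
conn.execute(
    """
    CREATE VIEW my_view AS
    SELECT id, "order", age,
           "order" = MAX("order") OVER (PARTITION BY id) AS recent
    FROM my_tbl
    """
)


def recent_flags(connection):
    """Return {(id, order, age): recent} as computed by the view (hypothetical helper)."""
    query = 'SELECT id, "order", age, recent FROM my_view'
    return {(i, o, a): bool(r) for i, o, a, r in connection.execute(query)}


before = recent_flags(conn)
conn.execute("INSERT INTO my_tbl VALUES (12, 35, 51)")
after = recent_flags(conn)

# The same row flips from recent to not recent without any UPDATE being run.
print(before[(12, 34, 50)], after[(12, 34, 50)])
```

Because the flag is computed at query time, there is no window in which it can disagree with the underlying rows.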

Related

Sqoop --boundary-query

I have the data set shown below. Can someone help me import it into HDFS using a Sqoop boundary query, using the column (id), which contains duplicate keys?
mysql> select id,name,age from employee;
id name age
1 A 30
2 B 35
3 C 40
4 D 23
5 E 26
1 A 24
2 B 16
3 C 78
4 G 66
3 H 56
4 A 63
20 C 58
13 F 47
2 A 49
3 B 60

plotting a vertical line exactly in the middle and at specific value of X in SAS

For the x, y data below, how can I plot a vertical line exactly in the middle of the plot? (Please note that X should not be sorted; the plot should stay as it is.)
Also, how can I plot a vertical line at x=5 (a distance along X from X=0) when X in the data below is taken as 0, 1, 2, 3, and so forth?
data sample;
infile cards truncover expandtabs;
input X Y;
cards;
29 21
18 23
28 24
16 26
3 27
18 29
2 33
3 37
26 39
2 42
25 47
9 54
13 57
17 58
29 60
5 63
23 66
4 69
3 72
17 73
7 73
12 72
8 69
20 66
12 63
8 60
28 58
3 57
18 54
11 47
21 42
8 39
1 37
16 29
3 27
17 22
3 19
6 17
19 14
18 10
;
run;
I tried:
proc sort data=sample ;
by x;
run;
proc sgplot data=sample;
needle x=x y=y;
run;
data Trapezoidal;
set sample end=last;
dif_x=dif(x);
mean_y=mean(lag(y),y);
integral + (dif_x*mean_y);
if last then putlog 'area under curve is ' integral;
run;
Vertical lines are plotted with refline. Determine what the 'middle' is however you wish (using proc means or proc sql or similar), get it into a variable in your dataset or a macro variable, and use refline in proc sgplot to produce the line.
Same applies for your specific-values-of-x (except you don't actually need to do anything there to produce the value to plot at). Add refline x=5; or similar to your plots to get them.
You could also use band plots if you're trying to highlight certain areas.

Creating statistical data from a table

I have a table with 20 columns of measurements. I would like to 'convert' the table into a table with 20 rows and columns of Avg, Min, Max, StdDev, Count types of information. There is another question like this, but it was for the 'R' language. Other question here.
I could do the following for each column (processing the results with C++):
Select Count(Case When [avgZ_l1] <= 0.15 and avgZ_l1 > 0 then 1 end) as countValue1,
Count(case when [avgZ_l1] <= 0.16 and avgZ_l1 > 0.15 then 1 end) as countValue2,
Count(case when [avgZ_l1] <= 0.18 and avgZ_l1 > 0.16 then 1 end) as countValue3,
Count(case when [avgZ_l1] <= 0.28 and avgZ_l1 > 0.18 then 1 end) as countValue4,
Avg(avgwall_l1) as avg1, Min(avgwall_l1) as min1, Max(avgZ_l1) as max1,
STDEV(avgZ_l1) as stddev1, count(*) as totalCount from myProject.dbo.table1
But I do not want to process the 50,000 records 20 times (once for each column). I thought there would be a way to 'pivot' the table onto its side and process the data at the same time. I have seen examples of 'Pivot', but they all seem to pivot on an integer-type field such as a month number or device ID. Once the table is converted I could then fetch each row with C++. Maybe this is really just a set of 'Insert into ... select ... from' statements.
Would the fastest (execution time) approach be to simply create a really long select statement that returns all the information I want for all the columns?
We might end up with 500,000 rows. I am using C++ and SQL 2014.
Any thoughts or comments are welcome. I just don't want to have my naive code used as a shining example of how NOT to do something... ;)...
If your table looks the same as the code that you sent in R, then the following query should work for you. It selects the data that you requested and pivots it at the same time.
create table #temp(ID int identity(1,1), columnName nvarchar(50));
insert into #temp
SELECT COLUMN_NAME as columnName
FROM myProject.INFORMATION_SCHEMA.COLUMNS -- change myProject to the name of your database, unless myProject is your database
WHERE TABLE_NAME = N'table1'; -- change table1 to the table you're looking at, unless table1 is your table
declare @TableName nvarchar(50) = 'table1'; -- change table1 to your table again
declare @loop int = 1;
declare @query nvarchar(max) = '';
declare @columnName nvarchar(50);
declare @endQuery nvarchar(max) = '';
while (@loop <= (select count(*) from #temp))
begin
    set @columnName = (select columnName from #temp where ID = @loop);
    set @query = 'select t.columnName, avg(['+@columnName+']) as Avg ,min(['+@columnName+']) as min ,max(['+@columnName+'])as max ,stdev(['+@columnName+']) as STDEV,count(*) as totalCount from '+@TableName+' join #temp t on t.columnName = '''+@columnName+''' group by t.columnName';
    set @loop += 1;
    set @endQuery += 'union all('+ @query + ')';
end;
set @endQuery = stuff(@endQuery,1,9,''); -- strip the leading 'union all'
Execute(@endQuery);
drop table #temp;
It creates a #temp table which stores each column heading next to an ID. It then uses the ID to loop through the columns, generating one query per column that selects the statistics you want, and unions the queries together. This works for any number of columns, so if you add or remove columns it should still give the correct result.
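
The query-generation step can be sketched outside T-SQL as well. The snippet below is a minimal illustration, not the answer's implementation: it builds one UNION ALL query from a list of column names and runs it against an in-memory SQLite table holding the first three sample rows. The function name build_stats_query and the output column names are hypothetical, and STDEV is left out because SQLite has no built-in equivalent.

```python
import sqlite3


def build_stats_query(table, columns):
    """Build one UNION ALL query that yields a summary row per column.

    Identifiers are interpolated directly, so they must come from a
    trusted source such as the database's own column metadata.
    """
    parts = [
        f"SELECT '{col}' AS column_name, AVG([{col}]) AS avg_value, "
        f"MIN([{col}]) AS min_value, MAX([{col}]) AS max_value, "
        f"COUNT(*) AS total_count FROM [{table}]"
        for col in columns
    ]
    return "\nUNION ALL\n".join(parts)


# SQLite stand-in with the first three rows of the sample input (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (age INTEGER, height_DL INTEGER, weight_alog1 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(19, 180, 70), (19, 167, 69), (21, 178, 80)],
)

# Read the column names from the database, as the answer does with INFORMATION_SCHEMA.
columns = [row[1] for row in conn.execute("PRAGMA table_info(table1)")]
query = build_stats_query("table1", columns)
stats = {row[0]: row[1:] for row in conn.execute(query)}
print(stats)
```

Each generated SELECT still scans the table once, so this shapes the result into one row per column but does not by itself reduce the number of passes over the data.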
With this input:
age height_seca1 height_chad1 height_DL weight_alog1
1 19 1800 1797 180 70
2 19 1682 1670 167 69
3 21 1765 1765 178 80
4 21 1829 1833 181 74
5 21 1706 1705 170 103
6 18 1607 1606 160 76
7 19 1578 1576 156 50
8 19 1577 1575 156 61
9 21 1666 1665 166 52
10 17 1710 1716 172 65
11 28 1616 1619 161 66
12 22 1648 1644 165 58
13 19 1569 1570 155 55
14 19 1779 1777 177 55
15 18 1773 1772 179 70
16 18 1816 1809 181 81
17 19 1766 1765 178 77
18 19 1745 1741 174 76
19 18 1716 1714 170 71
20 21 1785 1783 179 64
21 19 1850 1854 185 71
22 31 1875 1880 188 95
23 26 1877 1877 186 106
24 19 1836 1837 185 100
25 18 1825 1823 182 85
26 19 1755 1754 174 79
27 26 1658 1658 165 69
28 20 1816 1818 183 84
29 18 1755 1755 175 67
It will produce this output:
avg min max stdev totalcount
age 20 17 31 3.3 29
height_seca1 1737 1569 1877 91.9 29
height_chad1 1736 1570 1880 92.7 29
height_DL 173 155 188 9.7 29
weight_alog1 73 50 106 14.5 29
Hope this helps and works for you. :)

select specific rows by row number in sas

I am new to SAS. I have SAS data like the following (it does not contain an Obs column):
Obs ID Name Score1 Score2 Score3
1 101 90 95 98
2 203 78 77 75
3 223 88 67 75
4 280 68 87 75
.
.
.
.
100 468 78 77 75
I want the rows numbered 2, 6, 8, 10 and 34. The output should look like:
Obs ID Name Score1 Score2 Score3
1 203 78 77 75
2 227 88 67 75
3 280 68 87 75
.
.
.
Thanks in advance.
The other answer is ok for small tables, but if you are working with a very large table it's inefficient as it reads every row in the table to see whether it has the right row number. Here's a more direct approach:
data example;
do i = 2, 6, 8, 10;
set sashelp.class point = i;
output;
end;
stop;
run;
This picks out just the rows you actually want and doesn't read all the others.
You can read every row in a data step and output only the rows whose number you want, using a condition like this:
data test;
set LIB.TABLE;
if _N_ in (2, 6, 8, 10, 34) then output;
run;
Here _N_ is the data step's iteration counter, which in this case corresponds to the row number.

c source code to remove subset transactions from text file

I have a file containing data as follows
10 20 30 40 70
20 30 70
30 40 10 20
29 70
80 90 20 30 40
40 45 65 10 20 80
45 65 20
I want to remove all subset transactions from this file.
The output file should look as follows:
10 20 30 40 70
29 70
80 90 20 30 40
40 45 65 10 20 80
Where records like
20 30 70
30 40 10 20
45 65 20
are removed because they are subsets of other records.
The algorithm could look like this:
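sets = []  # item sets of the lines read so far
with open("data.txt") as f:
    for line in f:
        # Parse the line into a set of integers.
        current_set = {int(item) for item in line.split()}
        # Print the line only if it is not a subset of an earlier line.
        print_it = True
        for s in sets:
            if current_set.issubset(s):
                print_it = False
                break
        if print_it:
            print(line, end="")
        sets.append(current_set)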
Incidentally, this is also a Python program :) I also believe that a more efficient algorithm could be written.
Your next step: rewrite this to C/C++. Good luck :)
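
One thing to watch when porting: a single forward pass only catches a record whose superset appears earlier in the file. A minimal sketch of an order-independent variant follows, assuming the records fit in memory and that of several identical records only the first should be kept (both assumptions, as is the function name remove_subsets).

```python
def remove_subsets(lines):
    """Return the lines whose item set is not contained in another line's set.

    Of several identical records, only the first is kept (an assumption).
    """
    parsed = [(line, frozenset(line.split())) for line in lines if line.strip()]
    kept = []
    for i, (line, items) in enumerate(parsed):
        dominated = False
        for j, (_, other) in enumerate(parsed):
            if i == j:
                continue
            # Proper subset of another record, or a repeat of an earlier one.
            if items < other or (items == other and j < i):
                dominated = True
                break
        if not dominated:
            kept.append(line)
    return kept


sample = [
    "10 20 30 40 70",
    "20 30 70",
    "30 40 10 20",
    "29 70",
    "80 90 20 30 40",
    "40 45 65 10 20 80",
    "45 65 20",
]
result = remove_subsets(sample)
for line in result:
    print(line)
```

This compares every pair of records, so it is quadratic in the number of lines; that is fine for small files but is not the more efficient algorithm hinted at above.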