I have two tables emp and dept and I want to update the salary in emp table to increase by 10000 when the department name is "Software Engineer" - sql-update

I have two tables, emp and dept, and I want to increase the salary in the emp table by 10000 when the department name is "Software Engineer". The emp table does not have the department name.
I have tried this query:
update emp
set salary = salary + 10000
where exists (select d.depatment_name, e.salary
from emp e
join department d on e.dep_id = d.department_id
where dep_name = 'Software Engineer');
select * from emp;
But it's updating the salary for all rows.

The EXISTS subquery is not correlated with the emp row being updated. It declares its own emp e, so it returns rows whenever any employee belongs to a "Software Engineer" department, and the condition is then true for every row of emp. Drop the inner emp and compare the department against the outer row instead. Also, dep_name needs the "d." table alias.
update emp
set salary = salary + 10000
where exists (select 1
              from department d
              where d.department_id = emp.dep_id
                and d.dep_name = 'Software Engineer');
select * from emp;
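One way to check how a correlated EXISTS behaves is to run it against a small in-memory database. The sketch below uses Python's sqlite3 module; the table layout, the emp_id column, and the sample rows are assumptions made up for illustration, and SQLite's dialect may differ in details from the database used in the question.

```python
import sqlite3

# Hypothetical schema and rows, only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (department_id INTEGER PRIMARY KEY, dep_name TEXT);
    CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dep_id INTEGER, salary INTEGER);
    INSERT INTO department VALUES (1, 'Software Engineer'), (2, 'Sales');
    INSERT INTO emp VALUES (10, 1, 50000), (11, 2, 40000), (12, 1, 60000);
""")

# The subquery refers to the outer emp row (emp.dep_id), so EXISTS is
# evaluated separately for each row instead of once for the whole table.
conn.execute("""
    UPDATE emp
    SET salary = salary + 10000
    WHERE EXISTS (SELECT 1
                  FROM department d
                  WHERE d.department_id = emp.dep_id
                    AND d.dep_name = 'Software Engineer')
""")

salaries = dict(conn.execute("SELECT emp_id, salary FROM emp"))
print(salaries)
```

Only the two employees whose dep_id points at the "Software Engineer" department get the raise; the Sales employee keeps the original salary.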

BigQuery compare all the columns (100+) from two rows in a single table

I have an input table as below:
id  col1  col2  time
01  abc   001   12:00
01  def   002   12:10
Required output table:
id  col1  col2  time   diff_field
01  abc   001   12:00  null
01  def   002   12:10  col1,col2
I need to compare the two rows, find all the columns whose values differ, and keep those column names in a new column diff_field.
I need an optimized solution for this, as my table has more than 100 columns (all of the columns need to be compared).
You might consider the approach below:
WITH sample_table AS (
  SELECT '01' id, 'abc' col1, '001' col2, '12:00' time UNION ALL
  SELECT '01' id, 'def' col1, '002' col2, '12:10' time UNION ALL
  SELECT '01' id, 'def' col1, '002' col2, '12:20' time UNION ALL
  SELECT '01' id, 'ddf' col1, '002' col2, '12:30' time
)
SELECT * EXCEPT(curr, prev),
       (SELECT STRING_AGG('col' || offset)
          FROM UNNEST(SPLIT(curr)) c WITH offset
          JOIN UNNEST(SPLIT(prev)) p WITH offset USING (offset)
         WHERE c <> p AND offset < ARRAY_LENGTH(SPLIT(curr)) - 1
       ) diff_field
  FROM (
    SELECT *, FORMAT('%t', t) AS curr, LAG(FORMAT('%t', t)) OVER w AS prev
      FROM sample_table t
    WINDOW w AS (PARTITION BY id ORDER BY time)
  );
The approach below has no dependency on the actual column names or on any naming convention, other than id and time.
create temp function extract_keys(input string) returns array<string> language js as """
  return Object.keys(JSON.parse(input));
""";
create temp function extract_values(input string) returns array<string> language js as """
  return Object.values(JSON.parse(input));
""";
select t.*,
  ( select string_agg(col)
    from unnest(extract_keys(cur)) as col with offset
    join unnest(extract_values(cur)) as cur_val with offset using(offset)
    join unnest(extract_values(prev)) as prev_val with offset using(offset)
    where cur_val != prev_val and col != 'time'
  ) as diff_field
from (
  select t, to_json_string(t) cur, to_json_string(ifnull(lag(t) over(win), t)) prev
  from your_table t
  window win as (partition by id order by time)
)
If applied to the sample data in your question (or rather the extended version of it that I borrowed from Jaytiger's answer), the output is [query result not preserved].
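The comparison logic itself (previous row versus current row, collect the names of the columns that changed) can be sketched outside BigQuery. The following is a minimal Python illustration, not a translation of either query; the function name add_diff_field and the dict-per-row representation are assumptions chosen for the example.

```python
from itertools import groupby
from operator import itemgetter

def add_diff_field(rows, key="id", order="time"):
    """Return copies of rows with a diff_field naming the columns whose
    value differs from the previous row of the same key (sorted by order)."""
    out = []
    ordered = sorted(rows, key=itemgetter(key, order))
    for _, group in groupby(ordered, key=itemgetter(key)):
        prev = None
        for row in group:
            if prev is None:
                diff = None  # first row of the group has nothing to compare to
            else:
                changed = [c for c in row
                           if c not in (key, order) and row[c] != prev[c]]
                diff = ",".join(changed) if changed else None
            out.append({**row, "diff_field": diff})
            prev = row
    return out

rows = [
    {"id": "01", "col1": "abc", "col2": "001", "time": "12:00"},
    {"id": "01", "col1": "def", "col2": "002", "time": "12:10"},
]
result = add_diff_field(rows)
for r in result:
    print(r)
```

Because the column names come from the row itself, nothing here depends on how many columns there are or what they are called, which is the same property the JSON-based query relies on.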

Choosing the row with the maximum character length in sas

I have the following dataset:
dataseta:
No.  Name1  Name2                       Sales  Inv  Comp
1    TC     Tribal Council Inc          100    100  0
2    TC     Tribal Council Limited INC  20     25   65
desired output:
datasetb:
No.  Name1  Name2                       Sales  Inv  Comp
1    TC     Tribal Council Limited Inc  120    125  0
Basically, I need to choose the row with the maximum length of characters for the column name2.
I tried the following, but it didn't work:
proc sql;
  create table datasetb as
  select no, name1, name2, sum(sales), sum(inv), min(comp)
  from dataseta
  group by 1, 2, 3
  having length(name2) = max(length(name2));
quit;
If I use the following code, it only partially resolves it, and I get duplicate rows:
proc sql;
  create table datasetb as
  select no, name1, max(length(name2)), sum(sales), sum(inv), min(comp)
  from dataseta
  group by 1, 2
  having length(name2) = max(length(name2));
quit;
You appear to be joining the results of two separate aggregate computations.
Presuming that no is unique, so that it can serve as a tie-breaker criterion, and that the first (per no) longest name2 is to be joined with the sales, inv, comp totals over name1.
The query will have a lot going on:
1. First longest name2 within name1. Nested subqueries are needed to determine the longest name2, then select the first one, according to no, if there is more than one.
2. Totals over name1. The totals will be a subquery that is joined to in order to deliver the desired result set.
Example (SQL)
data have;
  length no 8 name1 $6 name2 $35 sales inv comp 8;
  input
    no name1& name2& sales inv comp; datalines;
1  TC  Tribal Council Inc  100 100 0  * name1=TC group
2  TC  Tribal Council Limited INC  20 25 65
3  TC  Tribal council co  0 0 0
4  TC  The Tribal council Assoctn  10 10 10
7  LS  Longshore association  10 10 0  * name1=LS group
8  LS  The Longshore Group, LLC  2 4 8
9  LS  The Longshore Group, llc  15 15 6
run;
proc sql;
  create table want as
  select
    first_longest_name2.no,
    first_longest_name2.name1,
    first_longest_name2.name2,
    name1_totals.sales,
    name1_totals.inv,
    name1_totals.comp
  from
    (
      select no, name1, name2
      from
        ( select no, name1, name2
          from have
          group by name1
          having length(name2) = max(length(name2))
        ) longest_name2s
      group by name1
      having no = min(no)
    ) as first_longest_name2
  left join
    (
      select
        name1,
        sum(sales) as sales,
        sum(inv) as inv,
        sum(comp) as comp
      from have
      group by name1
    ) as name1_totals
  on first_longest_name2.name1 = name1_totals.name1
  ;
quit;
Example (DATA Step)
Processing the data in a serial manner, when name1 groups are contiguous rows, can be accomplished using a DOW loop technique -- that is a loop with a SET statement within it.
data want2;
  do until (last.name1);
    set have;
    by name1 notsorted;
    if length(name2) > longest then do;
      longest = length(name2);
      no_at_longest = no;
      name2_at_longest = name2;
    end;
    sales_sum = sum(sales_sum,sales);
    inv_sum = sum(inv_sum,inv);
    comp_sum = sum(comp_sum,comp);
  end;
  drop name2 no sales inv comp longest;
  rename
    no_at_longest = no
    name2_at_longest = name2
    sales_sum = sales
    inv_sum = inv
    comp_sum = comp
  ;
run;
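For readers who want to trace the logic without SAS, here is a rough single-pass Python sketch of the same idea: per name1, keep the longest name2 (lowest no wins a tie) while summing sales, inv and comp. The function name longest_name_with_totals and the list-of-dicts input are assumptions for illustration; unlike the DOW loop, it does not require the name1 groups to be contiguous.

```python
def longest_name_with_totals(rows):
    """Per name1: keep the longest name2 (lowest `no` on ties) and
    sum sales, inv and comp over all rows of that name1."""
    result = {}
    for r in rows:
        g = result.setdefault(
            r["name1"],
            {"no": None, "name1": r["name1"], "name2": "",
             "sales": 0, "inv": 0, "comp": 0},
        )
        longer = len(r["name2"]) > len(g["name2"])
        tie = (g["no"] is not None
               and len(r["name2"]) == len(g["name2"])
               and r["no"] < g["no"])
        if g["no"] is None or longer or tie:
            g["no"], g["name2"] = r["no"], r["name2"]
        for col in ("sales", "inv", "comp"):
            g[col] += r[col]
    return list(result.values())

have = [
    {"no": 1, "name1": "TC", "name2": "Tribal Council Inc", "sales": 100, "inv": 100, "comp": 0},
    {"no": 2, "name1": "TC", "name2": "Tribal Council Limited INC", "sales": 20, "inv": 25, "comp": 65},
    {"no": 3, "name1": "TC", "name2": "Tribal council co", "sales": 0, "inv": 0, "comp": 0},
    {"no": 4, "name1": "TC", "name2": "The Tribal council Assoctn", "sales": 10, "inv": 10, "comp": 10},
]
want = longest_name_with_totals(have)
print(want)
```

In this sample, rows 2 and 4 have name2 values of the same length, so the tie-break on no decides which one is kept.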

SELECT MAX PARTITION TABLE

I have a table partitioned on date(transaction_time), and I have a problem with a SELECT MAX.
I'm trying to get the row with the highest timestamp when I get more than one row in the result for one ID.
Example of data:
1. ID = 1 , Transaction_time = "2018-12-10 12:00:00"
2. ID = 1 , Transaction_time = "2018-12-09 12:00:00"
3. ID = 2 , Transaction_time = "2018-12-10 12:00:00"
4. ID = 2 , Transaction_time = "2018-12-09 12:00:00"
Result that I want:
1. ID = 1 , Transaction_time = "2018-12-10 12:00:00"
2. ID = 2 , Transaction_time = "2018-12-10 12:00:00"
This is my query
SELECT ID, TRANSACTION_TIME FROM `table1` AS T1
WHERE TRANSACTION_TIME = (SELECT MAX(TRANSACTION_TIME)
FROM `table1` AS T2
WHERE T2.ID = T1.ID )
The error I receive:
Error: Cannot query over table 'table1' without a filter over
column(s) 'TRANSACTION_TIME' that can be used for partition
elimination
It looks like BigQuery does not like the correlated subquery in the WHERE clause. I don't know how to fix your current approach, but you might be able to just use ROW_NUMBER here:
SELECT t.ID, t.TRANSACTION_TIME
FROM
(
  SELECT ID, TRANSACTION_TIME,
         ROW_NUMBER() OVER (PARTITION BY ID ORDER BY TRANSACTION_TIME DESC) rn
  FROM table1
) t
WHERE rn = 1;
This can also be done this way:
SELECT id, MAX(transaction_time) FROM `table1` GROUP BY id;
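The GROUP BY form can be tried on the sample rows with an in-memory SQLite database. This is only a sketch of the aggregation; SQLite has no partitioned tables, so it says nothing about the partition-filter error, and storing the timestamps as ISO-formatted text is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, transaction_time TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        (1, "2018-12-10 12:00:00"),
        (1, "2018-12-09 12:00:00"),
        (2, "2018-12-10 12:00:00"),
        (2, "2018-12-09 12:00:00"),
    ],
)

# ISO-formatted timestamps sort correctly as text, so MAX picks the latest.
latest = conn.execute(
    "SELECT id, MAX(transaction_time) FROM table1 GROUP BY id ORDER BY id"
).fetchall()
print(latest)
```

This returns one row per id, but only the grouped and aggregated columns; if other columns of the latest row are needed, the ROW_NUMBER form is the more natural fit.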

Django aggregate/annotate with additional join

I have two model classes: Class A which represents table_a and Class B which represents table_b. Is there a way to aggregate or annotate Class A to include the value of a field in Class B other than the column that is aggregated on?
The SQL query that would accomplish what I need is as follows:
SELECT
a.*,
b2.col2
FROM table_a AS a
LEFT JOIN (SELECT
max(col1) AS m,
a_id
FROM table_b
GROUP BY a_id) AS b1 ON b1.a_id = a.id
LEFT JOIN table_b AS b2 ON b1.a_id = b2.a_id AND b2.col1 = b1.m
But the aggregate function only returns the data that would be retrieved with this:
SELECT
a.*,
b1.m
FROM table_a AS a
LEFT JOIN (SELECT
max(col1) AS m,
a_id
FROM table_b
GROUP BY a_id) AS b1 ON b1.a_id = a.id

Aggregating Over Actual Year in SAS

Let's suppose we have the following table ("Purchases"):
Date Units_Sold Brand Year
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
And the same logic continues until the end of 2014, for different brands.
What I want to do is calculate the number of Units_Sold for every Brand, in each year. However, I don't want to do it for the calendar year, but for the actual year.
So an example of what I don't want:
proc sql;
create table Dont_Want as
select Year, Brand, sum(Units_Sold) as Unit_per_Year
from Purchases
group by Year, Brand;
quit;
The above logic is ok if we know that e.g. Brand "A" exists throughout the whole 2010. But if Brand "A" appeared on 18/03/2010 for the first time, and exists until now, then a comparison of Years 2010 and 2011 would not be good enough as for 2010 we are "lacking" 3 months.
So what I want to do is calculate:
for A: the sum from 18/03/2010 until 17/03/2011, then from 18/03/2011 until 17/03/2012, etc.
for B: the sum from 12/07/2010 until 11/07/2011, etc.
and so on for all Brands.
Is there a smart way of doing this?
Step 1: Make sure your dataset is sorted or indexed by Brand and Date
proc sort data=have;
  by brand date;
run;
Step 2: Calculate the start/end dates for each product
The idea behind the code below:
1. We know that the first occurrence of the brand in the sorted dataset is the day on which the brand was introduced. We'll call this Product_Year_Start.
2. The intnx function can be used to increment that date by one year; subtracting 1 day from the result gives the last day of the product year. Let's call this date Product_Year_End.
3. Since we now know the product's year-end date, we know that if the date on any given row exceeds the product's year-end date, we have started the next product year. We'll just take the calculated Product_Year_End and Product_Year_Start for that brand and bump them up by one year.
This is all achieved using by-group processing and the retain statement.
data Comparison_Dates;
  set have;
  by brand date;
  retain Product_Year_Start Product_Year_End;
  if(first.brand) then do;
    Product_Year_Start = date;
    Product_Year_End = intnx('year', date, 1, 'S') - 1;
  end;
  /* Loop rather than a single IF, in case a brand has no sales for more than a year */
  do while(Date > Product_Year_End);
    Product_Year_Start = intnx('year', Product_Year_Start, 1, 'S');
    Product_Year_End = intnx('year', Product_Year_End, 1, 'S');
  end;
  format Product_Year_Start Product_Year_End date9.;
run;
Step 3: Using the original SQL code, group instead by the new product start/end dates
proc sql;
  create table want as
  select catt(year(Product_Year_Start), '-', year(Product_Year_End) ) as Product_Year
       , Brand
       , sum(Units_Sold) as Unit_per_Year
  from Comparison_Dates
  group by Brand, calculated Product_Year
  order by Brand, calculated Product_Year;
quit;
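The three steps can also be sketched in Python to make the bucketing rule explicit: each brand's product year starts on the anniversary of its first sale. This is an illustration under assumptions, not a port of the SAS code; the helper names add_years and units_per_product_year are made up, and 29 February is mapped to 28 February in non-leap years.

```python
from collections import defaultdict
from datetime import date

def add_years(d, years):
    """Same calendar day `years` later; 29 Feb falls back to 28 Feb."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def units_per_product_year(purchases):
    """purchases: list of (date, units_sold, brand) tuples.
    Returns {(brand, product_year_start): total_units}."""
    start = {}
    for d, _, brand in purchases:
        start[brand] = min(d, start.get(brand, d))
    totals = defaultdict(int)
    for d, units, brand in purchases:
        n = 0
        # Advance to the product year that contains d.
        while d >= add_years(start[brand], n + 1):
            n += 1
        totals[(brand, add_years(start[brand], n))] += units
    return dict(totals)

purchases = [
    (date(2010, 3, 18), 5, "A"), (date(2010, 4, 12), 2, "A"),
    (date(2010, 5, 22), 1, "A"), (date(2010, 5, 25), 7, "A"),
    (date(2011, 8, 11), 5, "A"), (date(2010, 7, 12), 2, "B"),
    (date(2010, 10, 22), 1, "B"), (date(2011, 5, 5), 7, "B"),
]
totals = units_per_product_year(purchases)
print(totals)
```

Brand A's first four sales land in the year starting 18/03/2010 and the August 2011 sale in the year starting 18/03/2011; all of brand B's sales fall before its first anniversary on 12/07/2011.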
The following code does what you ask in a literal sense: for the earliest date of each brand, it starts aggregating units_sold; when it hits the 365-day mark, it resets the count and starts another cycle.
data have;
informat date ddmmyy10.;
input date units_sold brand $ year;
format date date9.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;
proc sort data=have;
by brand date;
run;
data want;
  do until (last.brand);
    set have;
    by brand date;
    if first.brand then
      do;
        Sales_Over_365=0;
        _end=intnx('day',date,365);
      end;
    if date <= _end then
      Sales_Over_365+units_sold;
    else
      do;
        output;
        Sales_Over_365=units_sold;
        _end=intnx('day',date,365);
      end;
  end;
  output;
  drop _end;
run;
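The reset-on-overflow behaviour of this loop is easy to miss, so here is a small Python sketch of the same control flow. It is an illustration only; the function name rolling_365_totals is made up, and the input is assumed to be already sorted by brand and date, as the SAS step requires.

```python
from datetime import date, timedelta

def rolling_365_totals(sales):
    """sales: list of (brand, date, units) sorted by brand, then date.
    Emits (brand, total) each time a 365-day window closes."""
    out = []
    current_brand, total, end = None, 0, None
    for brand, d, units in sales:
        if brand != current_brand:
            if current_brand is not None:
                out.append((current_brand, total))  # close previous brand
            current_brand, total = brand, 0
            end = d + timedelta(days=365)
        if d <= end:
            total += units
        else:
            # Window exceeded: emit it and start a new one at this sale.
            out.append((brand, total))
            total = units
            end = d + timedelta(days=365)
    if current_brand is not None:
        out.append((current_brand, total))
    return out

sales = [
    ("A", date(2010, 3, 18), 5), ("A", date(2010, 4, 12), 2),
    ("A", date(2010, 5, 22), 1), ("A", date(2010, 5, 25), 7),
    ("A", date(2011, 8, 11), 5), ("B", date(2010, 7, 12), 2),
    ("B", date(2010, 10, 22), 1), ("B", date(2011, 5, 5), 7),
]
windows = rolling_365_totals(sales)
print(windows)
```

Note that a new window starts at the first sale that falls outside the previous one, not on the anniversary of the brand's first sale, so the windows can drift relative to the anniversary-based approach.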
You need to have a start date for each brand. For now we can use the first sale date, but that might not be what you want. Then you can classify each sales date into which year it is for that brand.
Let's start by creating a dataset from your sample data. The YEAR variable is not needed.
data have ;
input Date Units_Sold Brand $ Year ;
informat date ddmmyy10.;
format date yymmdd10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;;;;
Now we can get the answer you want with an SQL query.
proc sql ;
create table want as
select brand
, start_date
, 1+floor((date - start_date)/365) as sales_year
, intnx('year',start_date,calculated sales_year -1,'same')
as start_sales_year format=yymmdd10.
, sum(units_sold) as total_units_sold
from
( select brand
, min(date) as start_date format=yymmdd10.
, date
, units_sold
from have
group by 1
)
group by 1,2,3,4
;
quit;
This will produce this result:
Brand  start_date  sales_year  start_sales_year  total_units_sold
A      2010-03-18  1           2010-03-18        15
A      2010-03-18  2           2011-03-18        5
B      2010-07-12  1           2010-07-12        10
There is no straightforward way of doing it. You can do something like this.
To test the code, I saved your table into a text file.
Then I created a class called Sale.
public class Sale
{
    public DateTime Date { get; set; }
    public int UnitsSold { get; set; }
    public string Brand { get; set; }
    public int Year { get; set; }
}
Then I populated a List<Sale> using the saved text file.
var lines = File.ReadAllLines(@"C:\Users\kosala\Documents\data.text");
var validLines = lines.Where(l => !l.Contains("Date")).ToList(); // remove the header line
List<Sale> sales = validLines.Select(l => new Sale()
{
    Date = DateTime.Parse(l.Substring(0,10)),
    UnitsSold = int.Parse(l.Substring(26,5)),
    Brand = l.Substring(46,1),
    Year = int.Parse(l.Substring(56,4)),
}).ToList();
//All the above code is for testing purposes. The actual code starts from here.
var totalUnitsSold = sales.OrderBy(s => s.Date).GroupBy(s => s.Brand);
foreach (var soldUnit in totalUnitsSold)
{
    DateTime? minDate = null;
    DateTime? maxDate = null;
    int total = 0;
    string brand = "";
    foreach (var sale in soldUnit)
    {
        brand = sale.Brand;
        if (minDate == null)
        {
            minDate = sale.Date;
        }
        if ((sale.Date - minDate).Value.Days <= 365)
        {
            maxDate = sale.Date;
            total += sale.UnitsSold;
        }
        else
        {
            break;
        }
    }
    Console.WriteLine("Brand : {0} UnitsSold Between {1} - {2} is {3}",brand, minDate.Value, maxDate.Value, total);
}