Update a table with an aggregate sum from another - sql-update

I have tables A and B
A
ts ta
1 0,00
2 0,00
3 0,00
4 0,00
B
ts ta
1 10,00
1 5,00
1 6,00
2 3,00
2 5,00
2 10,00
3 5,00
And I want to update table A to get this result:
A
ts ta
1 21,00
2 18,00
3 5,00
So far I've tried with this query:
update A
set A.ta = C.sta
from (SELECT SUM(B.ta) sta
FROM B INNER JOIN A ON B.ts = A.ts
GROUP BY B.ts) C
and get this unwanted result:
ts ta
1 21,00
2 5,00
3 21,00

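-- Correlated subquery: for each row of A, sum the B.ta values with the same ts.
-- Rows of A with no match in B (ts = 4 here) are set to NULL.
UPDATE A
SET ta = (SELECT SUM(B.ta) FROM B WHERE B.ts = A.ts)
FROM A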
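To see the correlated subquery at work on the sample data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption made only so the example is self-contained (the question does not name its database), and the trailing FROM A is dropped because this dialect does not need it.

```python
import sqlite3

# In-memory database holding the sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ts INTEGER, ta REAL);
    CREATE TABLE B (ts INTEGER, ta REAL);
    INSERT INTO A VALUES (1, 0), (2, 0), (3, 0), (4, 0);
    INSERT INTO B VALUES (1, 10), (1, 5), (1, 6), (2, 3), (2, 5), (2, 10), (3, 5);
""")

# The subquery is evaluated once per row of A, so each row gets its own sum.
conn.execute("UPDATE A SET ta = (SELECT SUM(B.ta) FROM B WHERE B.ts = A.ts)")

rows = conn.execute("SELECT ts, ta FROM A ORDER BY ts").fetchall()
print(rows)
```

The row with ts = 4 has no match in B, so its ta becomes NULL (None in Python). Adding a WHERE EXISTS condition on the UPDATE would leave such rows unchanged instead.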

Related

Left join PROC SQL using threshold date

I am hoping you can help me! Please help!!!!
I am in SAS using PROC SQL and I have datasets A and B with different measurements (relating to patient's health) as follows:
Dataset A
ID Date measurement_a
1 20JUN2013 52.3
1 12JUL2013 65.6
1 28NOV2014 37.4
1 02DEC2014 61.3
1 22SEP2015 40.5
1 15OCT2015 60.5
2 03JUN2011 46.5
2 19JUL2011 54.1
2 29OCT2012 53.6
...
Dataset B
ID Date measurement_b
1 21MAR2007 43
1 13JUL2007 45
1 07APR2009 47
1 14MAY2009 46
1 16FEB2012 42
1 27AUG2012 53
1 12DEC2012 58
1 20JUN2013 56
1 10DEC2013 53
1 23MAY2014 49
1 17SEP2014 44
1 23SEP2015 40
2 16DEC2011 58
2 22AUG2012 54
2 20FEB2013 56
2 29MAY2013 53
...
What I am looking for: if the date in Dataset B is within 6 months of the date in Dataset A, a new variable called "time" should be added, numbered 1, 2, 3, etc., with one row per measurement_a only (in other words, I do not need to retain values of measurement_b if they do not match a date in Dataset A). Here is an example of what I mean:
Desired result/dataset:
ID Time measurement_a measurement_b
1 1 52.3 56 (Dataset B Date = 20JUN2013 - Matched exactly)
1 2 65.6 53 (Dataset B date = 10DEC2013 - Within six months of 12JUL2013 [Dataset A Date])
1 3 37.4 44 (Dataset B date = 17SEP2014 - Within six months of 28NOV2014 [Dataset A Date])
1 4 61.3 . (because 17SEP2014 [Dataset B] is closest to 28NOV2014 [Dataset A])
1 5 40.5 40 (because 23SEP2015 [Dataset B] is closest to 22SEP2015 [Dataset A])
1 6 60.5 . (No date in Dataset B that is within 6 months of Date in Dataset A [15OCT2015])
2 1 46.5 . (See below)
2 2 54.1 58 (because 03JUL2011 [Dataset B] is closest to 19JUL2011 [Dataset A])
2 3 53.6 54 (Dataset B date = 22AUG2012 - Within 6 months of Dataset A date = 29OCT2012)
...
I have joined on ID, but the dates are proving difficult. I think the difference in months could go in the "where" statement, as in the following code:
PROC SQL;
CREATE TABLE join_test as
SELECT * FROM data_a as a
LEFT_JOIN data_b as b
ON a.id = b.id
WHERE days(a.Date - b.Date) <= 180 ;
QUIT;
But this does not do the trick.
Can someone please help me?
I really appreciate it. Thanks in advance.
In the join criteria add the use of the SAS function INTCK to compute the number of month intervals between the two date values. Proc SQL does not have a way to introduce a serial count value, so you will have to add that in a subsequent step. A LEFT JOIN will create a result set with every id/date in table A.
Example:
The columns a.date, b_date and c_months_apart were added to show how the join works. You can safely remove them from the select.
proc sql;
  create table stage1 as
  select
    a.id
  , a.date
  , a.measurement_a
  , b.measurement_b
  , b.date as b_date
  , intck('month', a.date, b.date, 'C') as c_months_apart
  from
    a left join b
    on a.id = b.id
    and intck('month', a.date, b.date, 'C') between 0 and 6
  order by a.id, a.date, b.date
  ;
quit;
data want;
  set stage1;
  by id;
  /* restart the counter at 1 for each id, otherwise increment it */
  if first.id then time=1; else time+1;
run;
Output (want)
ID Date measurement_a measurement_b b_date c_months_apart time
1 20JUN2013 52.3 56 20JUN2013 0 1
1 20JUN2013 52.3 53 10DEC2013 5 2
1 12JUL2013 65.6 53 10DEC2013 4 3
1 28NOV2014 37.4 . . . 4
1 02DEC2014 61.3 . . . 5
1 22SEP2015 40.5 40 23SEP2015 0 6
1 15OCT2015 60.5 . . . 7
2 03JUN2011 46.5 58 16DEC2011 6 1
2 19JUL2011 54.1 58 16DEC2011 4 2
2 29OCT2012 53.6 56 20FEB2013 3 3
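For comparison, the same join-then-number approach can be sketched in pandas. This is an approximation under stated assumptions: the condition "intck('month', a.date, b.date, 'C') between 0 and 6" is modelled as "b_date falls on or after a.date and before a.date plus 7 calendar months", which may differ from SAS at month-end edge cases. Only the rows shown in the question are used; the frame and column names mirror the SAS code.

```python
import pandas as pd

a = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime([
        "2013-06-20", "2013-07-12", "2014-11-28", "2014-12-02",
        "2015-09-22", "2015-10-15", "2011-06-03", "2011-07-19", "2012-10-29",
    ]),
    "measurement_a": [52.3, 65.6, 37.4, 61.3, 40.5, 60.5, 46.5, 54.1, 53.6],
})
b = pd.DataFrame({
    "id": [1] * 12 + [2] * 4,
    "b_date": pd.to_datetime([
        "2007-03-21", "2007-07-13", "2009-04-07", "2009-05-14",
        "2012-02-16", "2012-08-27", "2012-12-12", "2013-06-20",
        "2013-12-10", "2014-05-23", "2014-09-17", "2015-09-23",
        "2011-12-16", "2012-08-22", "2013-02-20", "2013-05-29",
    ]),
    "measurement_b": [43, 45, 47, 46, 42, 53, 58, 56, 53, 49, 44, 40,
                      58, 54, 56, 53],
})

# All a/b pairs per id, then keep pairs where b_date is 0 to 6 whole months after a.date.
pairs = a.merge(b, on="id")
window_end = pairs["date"] + pd.DateOffset(months=7)
matched = pairs[(pairs["b_date"] >= pairs["date"]) & (pairs["b_date"] < window_end)]

# Left join back to a so every id/date in a survives, matched or not.
stage1 = (
    a.merge(matched[["id", "date", "b_date", "measurement_b"]],
            on=["id", "date"], how="left")
     .sort_values(["id", "date", "b_date"])
     .reset_index(drop=True)
)

# Serial count within each id, like the first.id logic in the data step.
stage1["time"] = stage1.groupby("id").cumcount() + 1
print(stage1)
```

As in the SAS output, a row of a that matches two rows of b appears twice, so time counts result rows rather than distinct dates in a.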

Return all values using LOOKUPVALUE, not just matches

I have two tables with related fields. I am trying to return the enrollment# from table A into a column in table B.
table A
Serial# Enrollment#
A 1
B 2
C 3
D 4
E 5
table B
Serial# Enrollment#
A 1
B 20
C 3
D 4
E 50
I want this calculated column in table B
Serial# Enrollment# tableAEnrollment#
A 1 1
B 20 2
C 3 3
D 4 4
E 50 5
however this is what I am getting:
Serial# Enrollment# tableAEnrollment#
A 1 1
B 20
C 3 3
D 4 4
E 50
my function is:
tableAEnrollment# = LOOKUPVALUE(A[Enrollment #], A[Serial #], B[Serial #])
It's only bringing back values where the enrollment numbers match. What am I doing wrong?
Thanks in advance!

SAS - Split single column into two based on value of an ID column

I have data which is as follows.
data have;
input group replicate $ sex $ count;
datalines;
1 A F 3
1 A M 2
1 B F 4
1 B M 2
1 C F 4
1 C M 5
2 A F 5
2 A M 4
2 B F 6
2 B M 3
2 C F 2
2 C M 2
3 A F 5
3 A M 1
3 B F 3
3 B M 4
3 C F 3
3 C M 1
;
run;
I want to break the count column into two separate columns based on gender.
Obs group replicate count_female count_male
1 1 A 3 2
2 1 B 4 2
3 1 C 4 5
4 2 A 5 4
5 2 B 6 3
6 2 C 2 2
7 3 A 5 1
8 3 B 3 4
9 3 C 3 1
This can be done by first creating a separate data set for each level of sex and then performing a merge.
data just_female;
set have;
where sex = 'F';
rename count = count_female;
run;
data just_male;
set have;
where sex = 'M';
rename count = count_male;
run;
data want;
merge
just_female
just_male
;
by
group
replicate
;
keep
group
replicate
count_female
count_male
;
run;
Is there a less verbose way to do this that doesn't require sorting or explicitly dropping/keeping variables?
You can do this using proc transpose but you will need to sort the data. I believe this is what you're looking for though.
proc sort data=have;
by group replicate;
run;
The data is sorted so now you have your by-group for transposing.
proc transpose data=have out=want(drop=_name_) prefix=count_;
by group replicate;
id sex;
var count;
run;
proc print data=want;
run;
Then you get:
Obs group replicate count_F count_M
1 1 A 3 2
2 1 B 4 2
3 1 C 4 5
4 2 A 5 4
5 2 B 6 3
6 2 C 2 2
7 3 A 5 1
8 3 B 3 4
9 3 C 3 1
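For readers working outside SAS, the same long-to-wide reshape can be sketched in pandas. This is an illustration only, not part of the original answer; the frame names have and want mirror the SAS data sets, and no explicit sort is needed here because pivot builds the result from the index keys.

```python
import pandas as pd

have = pd.DataFrame({
    "group": [1] * 6 + [2] * 6 + [3] * 6,
    "replicate": ["A", "A", "B", "B", "C", "C"] * 3,
    "sex": ["F", "M"] * 9,
    "count": [3, 2, 4, 2, 4, 5, 5, 4, 6, 3, 2, 2, 5, 1, 3, 4, 3, 1],
})

# One row per group/replicate, one column per level of sex.
want = (
    have.pivot(index=["group", "replicate"], columns="sex", values="count")
        .add_prefix("count_")
        .reset_index()
)
want.columns.name = None  # drop the leftover "sex" label on the column index
print(want)
```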

Putting same income for same groupID

In my data, income was asked of only one person in each household.
householdID memberID income
1 1 4
2 2 .
1 2 .
2 3 .
2 1 3
But obviously, I need to fill in the missing values like this:
householdID memberID income
1 1 4
2 2 3
1 2 4
2 3 3
2 1 3
How can I do this in Stata?
This is an elementary application of by:
bysort householdID (income) : replace income = income[1] if missing(income)
See this FAQ for related material.
A more circumspect approach would check that at most one non-missing value has been supplied for each household:
bysort householdID (income) : gen OK = missing(income) | (income == income[1])
list if !OK
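The same two steps (check that each household has at most one distinct non-missing income, then spread it to the other members) can be sketched in pandas for comparison. This is an illustration under the assumption that the data look like the sample above; the names n_values and conflicts are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "householdID": [1, 2, 1, 2, 2],
    "memberID": [1, 2, 2, 3, 1],
    "income": [4, None, None, None, 3],
})

# Circumspect check: households reporting more than one distinct income.
n_values = df.groupby("householdID")["income"].nunique()
conflicts = n_values[n_values > 1]

# "first" returns the first non-missing income within each household.
household_income = df.groupby("householdID")["income"].transform("first")
df["income"] = df["income"].fillna(household_income)
print(df)
```

If conflicts is not empty, those households need to be resolved by hand before filling.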

Pandas: Combining and summing rows based on values from other rows

In a pandas data frame, I'd like to combine all 'other' rows from col_2 into one row for each value of col_1, assigning col_3 the sum of all the corresponding values.
EDIT - Clarification: In total, I have about 20 columns (where the values in those columns are unique for each col_1), and there are about 80,000 other fields; however, only three columns affect my question.
Current dataframe df:
col_1 col_2 col_3
1 a 30
1 b 25
1 other 1
1 other 5
2 a 321
2 b 1
2 other 45
2 other 52
2 other 17
2 other 8
Desired result:
col_1 col_2 col_3
1 a 30
1 b 25
1 other 6
2 a 321
2 b 1
2 other 122
How can I do this in Pandas?
You can groupby on col_1 and col_2 and call sum and then reset_index:
In [188]:
df.groupby(['col_1','col_2']).sum().reset_index()
Out[188]:
col_1 col_2 col_3
0 1 a 30
1 1 b 25
2 1 other 6
3 2 a 321
4 2 b 1
5 2 other 122
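A self-contained version of the above, as a sketch: it rebuilds the sample frame and selects col_3 explicitly, which is an assumption about what the asker wants when the real frame has more columns than the three shown.

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": [1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "col_2": ["a", "b", "other", "other", "a", "b",
              "other", "other", "other", "other"],
    "col_3": [30, 25, 1, 5, 321, 1, 45, 52, 17, 8],
})

# as_index=False keeps col_1 and col_2 as ordinary columns, so no reset_index is needed.
result = df.groupby(["col_1", "col_2"], as_index=False)["col_3"].sum()
print(result)
```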