Teradata : Complex case statement - if-statement

I have an emp_id with different statuses. Based on those statuses, I need to retrieve the values. For example:
Employee Table
Emp_id   Status1      Status2     desc      judgement
A1       Original     Original    fail      pass
A1       Duplicate    Duplicate   fail      fail
B1       Original     Original    correct   pass
B1       CORRECTION   Original    correct   correct
Expected output:
Emp_id   Status1      Status2     FinalAnswer
A1       Original     Original    Pass
A1       Duplicate    Duplicate   fail
B1       CORRECTION   Original    correct
There are 2 points here:
1) For A1, when status1 and status2 are Original and Duplicate, I need to see both records, with the value coming from the 'judgment' column.
2) For B1, when status1 and status2 are Original and CORRECTION, I need to see only one record, with the value coming from the 'desc' column.
I did write a CASE but it is not working as expected.
select
CASE ( when status1=trim('Original') and status2=trim('Original')
Or status1=trim('Duplicate') and status2=trim('Duplicate') ) then Judgment
from employee;
I am seeing the output below from my CASE statement, which is not completely correct:
Emp_id   Status1      Status2     FinalAnswer
A1       Original     Original    Pass
A1       Duplicate    Duplicate   fail
When I try to modify the CASE statement to include the B1 details, I get duplicates for the "Original" status from A1.
Please help me.
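One possible reading of the two rules is: if an employee has a CORRECTION row, keep only that row and take the value from desc; otherwise keep every row and take the value from judgement. The sketch below tests that reading with Python's built-in sqlite3 module rather than Teradata, so the table definition, the quoted "desc" column name and the final_answer alias are assumptions made for illustration, and the SQL may need adjusting for Teradata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "desc" is a reserved word, so it is quoted here (an assumption for this demo).
conn.execute(
    'CREATE TABLE employee '
    '(emp_id TEXT, status1 TEXT, status2 TEXT, "desc" TEXT, judgement TEXT)'
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        ("A1", "Original", "Original", "fail", "pass"),
        ("A1", "Duplicate", "Duplicate", "fail", "fail"),
        ("B1", "Original", "Original", "correct", "pass"),
        ("B1", "CORRECTION", "Original", "correct", "correct"),
    ],
)

QUERY = """
    SELECT e.emp_id, e.status1, e.status2,
           CASE WHEN e.status1 = 'CORRECTION' THEN e."desc"
                ELSE e.judgement
           END AS final_answer
    FROM employee AS e
    -- keep the CORRECTION row itself, or any row of an employee
    -- that has no CORRECTION row at all
    WHERE e.status1 = 'CORRECTION'
       OR NOT EXISTS (SELECT 1
                      FROM employee AS c
                      WHERE c.emp_id = e.emp_id
                        AND c.status1 = 'CORRECTION')
    ORDER BY e.emp_id, e.rowid
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The filtering happens in the WHERE clause, not in the CASE expression: CASE only chooses which column to show, so it cannot by itself drop the extra B1 row.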

retaining the variable label in reshape

I would like to retain the variable labels after reshaping the dataset from long to wide. I have a problem inputting the dataset (I keyed the data in Excel, then imported it).
clear
set obs 7
input id a1 a2 a3
"s001" "John" 23 "Primary"
"s002" "Mary" 32 "Secondary"
"s002" "Anna" 23 "Tertiary"
"s003" "Joseph" 34 "Secondary"
"s003" "Oganyo" 23 "Primary"
"s004" "Manyoya" 34 "Tertiary"
"s005" "Makbuti" 45 "Primary"
end
*======= Label the variables
label var a1 "partners name"
label var a2 "partners age"
label var a3 "partners education"
foreach variable of varlist a*{
local varlabel : variable label `variable'
di "`varlabel'"
bys id: gen index = _n
renvars a* , postfix(_)
reshape wide a*, i(id) j(index)
label var `variable'* "`varlabel'"
}
The code here is very confused and just won't work well without major surgery.
At the outset your input statement fails to declare string variables when needed.
The set obs 7 is incompatible with an end to the input.
A big problem is that you are looping over a bunch of commands including reshape, but there is just one reshape to carry out.
Your code for saving variable labels needs to save them separately.
renvars is from the Stata Journal (2005), and doesn't seem needed here anyway. It should still work, but as of Stata 12 (2011!) it is essentially superseded by extensions to rename.
Here's my best guess at the example you should have given and the code you need.
clear
input str4 id str7 a1 a2 str9 a3
"s001" "John" 23 "Primary"
"s002" "Mary" 32 "Secondary"
"s002" "Anna" 23 "Tertiary"
"s003" "Joseph" 34 "Secondary"
"s003" "Oganyo" 23 "Primary"
"s004" "Manyoya" 34 "Tertiary"
"s005" "Makbuti" 45 "Primary"
end
label var a1 "partner's name"
label var a2 "partner's age"
label var a3 "partner's education"
local j = 0
foreach variable of varlist a* {
    local ++j
    local varlabel`j' : variable label `variable'
    di "`varlabel`j''"
}
bys id: gen index = _n
reshape wide a*, i(id) j(index)
forval j = 1/3 {
    foreach v of var a`j'* {
        label var `v' "`varlabel`j''"
    }
}
list
describe
Here's the output of the last two commands.
. list
     +------------------------------------------------------------+
     |   id       a11   a21         a31      a12   a22        a32 |
     |------------------------------------------------------------|
  1. | s001      John    23     Primary              .            |
  2. | s002      Mary    32   Secondary     Anna    23   Tertiary |
  3. | s003    Joseph    34   Secondary   Oganyo    23    Primary |
  4. | s004   Manyoya    34    Tertiary              .            |
  5. | s005   Makbuti    45     Primary              .            |
     +------------------------------------------------------------+
.
. describe
Contains data
Observations: 5
Variables: 7
------------------------------------------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
------------------------------------------------------------------------------------------------------------------
id              str4    %9s
a11             str7    %9s                   partner's name
a21             float   %9.0g                 partner's age
a31             str9    %9s                   partner's education
a12             str7    %9s                   partner's name
a22             float   %9.0g                 partner's age
a32             str9    %9s                   partner's education
------------------------------------------------------------------------------------------------------------------
Sorted by: id
Note: reshape wide makes most things more difficult in Stata.
Note: Please use dataex to create reproducible data examples in Stata.
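For readers who think more easily outside Stata, here is a rough Python sketch of the same idea: store one label per stub before reshaping, then copy each stub's label onto every wide variable built from it. It uses the example data and labels from above; the names labels, long_rows, wide and wide_labels are made up for this illustration and are not part of any Stata feature.

```python
# Labels saved separately, one per original (long) variable.
labels = {
    "a1": "partner's name",
    "a2": "partner's age",
    "a3": "partner's education",
}

# Long data: (id, a1, a2, a3), as in the example above.
long_rows = [
    ("s001", "John", 23, "Primary"),
    ("s002", "Mary", 32, "Secondary"),
    ("s002", "Anna", 23, "Tertiary"),
    ("s003", "Joseph", 34, "Secondary"),
    ("s003", "Oganyo", 23, "Primary"),
    ("s004", "Manyoya", 34, "Tertiary"),
    ("s005", "Makbuti", 45, "Primary"),
]

wide = {}         # id -> {wide variable name -> value}
wide_labels = {}  # wide variable name -> label copied from its stub

for id_, *values in long_rows:
    record = wide.setdefault(id_, {})
    # Index of this row within its id (1, 2, ...), like bys id: gen index = _n
    index = len(record) // len(labels) + 1
    for stub, value in zip(labels, values):
        name = f"{stub}{index}"           # e.g. a1 + 2 -> a12
        record[name] = value
        wide_labels[name] = labels[stub]  # reapply the saved label

print(wide["s002"])
print(wide_labels)
```

The point is the same as in the Stata code: the labels are collected once, before the reshape, and applied in a separate loop afterwards.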

Stata output table: Conditional symbols in estout

For simplicity, let's assume the following script to create a simple regression table:
sysuse auto
eststo clear
qui regress price weight mpg
esttab using "table.rtf", cells(t) mtitles onecell nogap ///
stats(N, labels("Observations")) label ///
compress replace
eststo clear
Output:
                        (1)
                          .
                          t
Weight (lbs.)      2.723238
Mileage (mpg)     -.5746808
Constant            .541018
Observations             74
Question:
Would it be possible to mark every t-value above 0.5 or below -0.5 with an asterisk (that is, every t-value whose absolute value is greater than 0.5)?
Please note: In the specific application case, I can't work with given p-values, and need a custom solution that works with thresholds of t.
Desired outcome:
                        (1)
                          .
                          t
Weight (lbs.)     2.723238*
Mileage (mpg)    -.5746808*
Constant           .541018*
Observations             74
Crossposting can be found here:
Thank you for your help!
You cannot do this directly with estout but the following works all the same:
sysuse auto, clear
regress price weight mpg
quietly esttab, mtitles onecell nogap stats(N, labels("Observations")) label ///
compress replace star staraux
matrix A = r(coefs)
matrix A = A[1...,2]
svmat A
generate A2 = "*" if abs(A1) >= 0.5
generate A4 = string(A1) + A2
local names : rownames A
generate A3 = ""
forvalues i = 1 / `: word count `names'' {
    replace A3 = `"`: word `i' of `names''"' in `i'
}
list A3 A4 if !missing(A3)
     +---------------------+
     |     A3           A4 |
     |---------------------|
  1. | weight    2.723238* |
  2. |    mpg   -.5746808* |
  3. |  _cons     .541018* |
     +---------------------+
preserve
keep if !missing(A3)
export delimited A3 A4 using table.txt, delimiter(" ") novarnames
restore
You will have to do some more gymnastics to get the variable labels etc.
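The thresholding step itself is small, and a language-neutral sketch may make it easier to see what the generate lines above do. This Python version uses the three t-values from the example; the function name add_stars and the default threshold are assumptions for illustration. It follows the answer in using >= for the comparison, whereas the question asks for strictly greater than.

```python
def add_stars(t_values, threshold=0.5):
    """Return {name: formatted t}, with '*' appended when |t| >= threshold."""
    starred = {}
    for name, t in t_values.items():
        mark = "*" if abs(t) >= threshold else ""
        starred[name] = f"{t}{mark}"
    return starred


t_values = {"weight": 2.723238, "mpg": -0.5746808, "_cons": 0.541018}
table = add_stars(t_values)
for name, cell in table.items():
    print(f"{name:<8}{cell:>14}")
```

Changing the threshold or the comparison operator is then a one-line change, which is the part that cannot be done with the p-value based stars in estout.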

SAS: displaying data in a particular way

(I find it hard to give a good descriptive title, so I'll just ask by means of an example.)
I have a data set like this:
|ID | A1 A2 A3 | B1 B2 B3 | C1 C2 C3 |
+---+----------+----------+----------+
| 1 | a aa aaa| b bb bbb| c cc ccc|
| 2 | (... some values, etc ...)
What I want to do is, given an "ID", make a table output with the values A1,A2,etc for that ID, something like this:
| | A's | B's | C's |
+---+-----+-----+-----+
| 1 | a | b | c |
| 2 | aa | bb | cc |
| 3 | aaa | bbb | ccc |
So, to recap: I want to pick a row, and output a table with certain variables displayed in columns. I've tried to wrap my mind around how proc tabulate works, but haven't managed to wrangle it into giving me what I want; it may be I'm barking up the wrong tree. Is there a way to do this?
I don't need this to return a data table, just some screen output.
You can reshape the data by creating a transposing view that operates on the three arrays in parallel. Proc REPORT or PRINT can then be used to generate the presentation output.
Sample Data
data have;
  do id = 1 to 10;
    array a a1-a3;
    array b b1-b3;
    array c c1-c3;
    do i = 1 to dim(a);
      a(i) = 10 ** i + id;
      b(i) = 2 * 10 ** i + id;
      c(i) = 3 * 10 ** i + id;
    end;
    output;
    keep id a: b: c:;
  end;
run;
Transposing view
data have_v / view = have_v;
  set have;
  array as a1-a3;
  array bs b1-b3;
  array cs c1-c3;
  do seq = 1 to dim(as);
    a = as(seq);
    b = bs(seq);
    c = cs(seq);
    output;
  end;
  keep id seq a b c;
run;
Output with where clause. BY statement used to show id value in output.
proc report data=have_v;
  by id;
  where id = 3;
  column id seq a b c;
  define id / display noprint;
run;
You could use VIEWTABLE and issue a WHERE command if you don't want to produce output.
If each row encompasses an arbitrary number of 'arrays' (say a to z) of arbitrary but equal length (say 1 to 15), you would want to write a macro that performs some meta-data examination of the data set in question. The examination would attempt to discover the array 'names' and the number of elements in each. In this example, it would need to discover and output 15 rows by 26 columns for a given id.
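To make the meta-data idea concrete without writing the SAS macro, here is a small Python sketch of the discovery step: it groups column names by their alphabetic prefix, treats the trailing number as the element index, and emits one output row per index. The function name transpose_row and the naming pattern (letters followed by digits) are assumptions for illustration, and the sketch assumes every group has the same length.

```python
import re
from collections import defaultdict


def transpose_row(row):
    """Turn {'A1': .., 'A2': .., 'B1': ..} into one dict per element index."""
    groups = defaultdict(dict)  # prefix -> {index -> value}
    for name, value in row.items():
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
        if match:  # columns such as ID have no trailing number and are skipped
            groups[match.group(1)][int(match.group(2))] = value
    length = max(len(members) for members in groups.values())
    return [
        {"seq": i, **{prefix: members[i] for prefix, members in groups.items()}}
        for i in range(1, length + 1)
    ]


row = {
    "ID": 1,
    "A1": "a", "A2": "aa", "A3": "aaa",
    "B1": "b", "B2": "bb", "B3": "bbb",
    "C1": "c", "C2": "cc", "C3": "ccc",
}
for line in transpose_row(row):
    print(line)
```

A SAS macro would do the equivalent by reading the variable names from the data set's meta-data and generating the array statements from them.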
Sounds like something that an old-style data _null_ report could produce.
data _null_;
  set have;
  where id = 1;
  array a a1-a3;
  array b b1-b3;
  array c c1-c3;
  file print;
  /* header: @n moves to column n, / starts a new line */
  put @10 'A' @20 'B' @30 'C'
    / @10 8*'-' @20 8*'-' @30 8*'-'
  ;
  do i = 1 to dim(a);
    put i 8. @10 a(i) @20 b(i) @30 c(i);
  end;
run;
Results
         A         B         C
         --------  --------  --------
       1 a         b         c
       2 aa        bb        cc
       3 aaa       bbb       ccc

Match and Get the Value from another table (Vlookup)

I have 2 tables connected to each other through col A. I want to match the column C with Col A and get the value of Col B.
For Example,
Table 1
ColA   ColB   ColC
a      a1     b
a      b1     c
c      c1     a
Table2
ColA   ColB
a      a1
b      b1
c      c1
Now, I have already created relationships between Table2 and Table1 for my other calculations connecting both the tables with the colA.
Now, I am trying to match ColC from Table1 with ColA of Table2 and return the value of ColB from Table2 as MatchedOutput.
Expected output
Table1
ColA   ColB   ColC   MatchedOutput
a      a1     b      b1
a      b1     c      c1
c      c1     a      a1
The DAX function for this is LOOKUPVALUE.
MatchedOutput = LOOKUPVALUE(Table2[ColB],Table2[ColA],Table1[ColC])
This looks for the value in Table2[ColB] where Table2[ColA] matches Table1[ColC].
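If it helps to see the lookup outside DAX, the same per-row logic can be sketched in Python with a dictionary keyed on Table2's ColA. This only illustrates what the formula computes for the sample data above; it is not how the DAX engine evaluates it, and the variable names are made up for the example.

```python
# Table2 as a lookup: ColA -> ColB
table2 = {"a": "a1", "b": "b1", "c": "c1"}

table1 = [
    {"ColA": "a", "ColB": "a1", "ColC": "b"},
    {"ColA": "a", "ColB": "b1", "ColC": "c"},
    {"ColA": "c", "ColB": "c1", "ColC": "a"},
]

for row in table1:
    # Search Table2's ColA for this row's ColC and take the matching ColB.
    # dict.get returns None when nothing matches.
    row["MatchedOutput"] = table2.get(row["ColC"])

matched = [row["MatchedOutput"] for row in table1]
print(matched)
```

Note that the lookup goes through ColC, so it does not depend on the existing relationship between the two tables on ColA.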

SQL update and join three tables based on rows in one table and not another

I have a bit of a complicated sql query I need to do, and I'm a bit stuck. I'm using SQLite if that changes anything.
I have the following table structure:
Table G
---------
G_id (primary key) | Other cols ...
====================================
21
22
23
24
25
26
27 (no g_to_s_map)
28
.
Table S
---------
S_id (primary key) | S_num | Other cols.....
====================================
1 1101
2 1102
3 1103
4 1104
5 1105
6 1106
7 1107 (no g_to_s_map, no s_to_t_map)
8 1108 (no g_to_s_map, there IS an s_to_t_map)
9 1109 (there is an g_to_s_map, but no s_to_t map)
.
Table T
---------
T_id (primary key) | Other cols...
==================================
1
2
Then I also have two mapping tables:
Table G_to_S_Map (1:1 mapping, unique values of both g_id and s_id)
----------
G_id (foreign key ref g)| S_id (foreign key ref s)
===================================================
21 1
22 2
23 3
24 4
25 5
26 6
28 9
.
Table S_to_T_Map (many:1 mapping, many unique s_id to a t_id)
----------
S_id (foreign key ref s) | T_id (foreign key ref t)
===================================================
1 1
2 1
3 1
4 2
5 2
6 2
8 2
Given only a T_id and a G_id, I need to be able to update the G_to_S_Map with the first S_id corresponding to the specified T_id (in the S_to_T_Map) that is NOT in the G_to_S_Map.
The first thing I was thinking of was just getting any S_id's that corresponded to the T_id in the S_to_T_Map:
SELECT S_id FROM S_to_T_Map where T_id = GIVEN_T_ID;
Then presumably I would join those values somehow with the G_to_S_Map using a left/right join maybe, and then look for the first value which doesn't exist on one of the sides? Then I'd need to do an insert into the G_to_S_Map based on that S_id and the GIVEN_G_ID value or something.
Any suggestions on how to go about this? Thanks!
Edit: Added some dummy data:
I believe this should work:
INSERT INTO G_To_S_Map (G_id, S_id)
(SELECT :inputGId, a.S_id
FROM S_To_T_Map as a
LEFT JOIN G_To_S_Map as b
ON b.S_id = a.S_id
AND b.G_id = :inputGId
WHERE a.T_id = :inputTId
AND b.G_id IS NULL
ORDER BY a.S_id
LIMIT 1);
EDIT:
If you're wanting to do the order by a different table, use this version:
INSERT INTO G_To_S_Map (G_id, S_id)
(SELECT :inputGId, a.S_id
FROM S_To_T_Map as a
JOIN S as b
ON b.S_id = a.S_id
LEFT JOIN G_To_S_Map as c
ON c.S_id = a.S_id
AND c.G_id = :inputGId
WHERE a.T_id = :inputTId
AND c.G_id IS NULL
ORDER BY b.S_num
LIMIT 1);
(As an aside, I really hope your tables aren't actually named like this, because that's a terrible thing to do. The use of Map, especially, should probably be avoided)
EDIT:
Here's some example test data. Have I missed something? Did I conceptualize the relationships incorrectly?
S_To_T_Map
================
S_ID T_ID
1 1
2 1
3 1
1 2
1 3
3 3
G_To_S_Map
==================
G_ID S_ID
1 1
3 1
2 1
3 2
2 3
3 3
Resulting joined data:
(CTEs used to generate cross-join test data)
Results:
=============================
G_TEST T_TEST S_ID
1 1 3
2 1 2
1 3 3
EDIT:
Ah, okay, now I get the problem. My issue was that I was assuming there was some sort of many-one relationship between S and G. As this is not the case, use this amended statement:
INSERT INTO G_To_S_Map (G_id, S_id)
(SELECT :inputGId, a.S_id
FROM S_To_T_Map as a
JOIN S as b
ON b.S_id = a.S_id
LEFT JOIN G_To_S_Map as c
ON c.S_id = a.S_id
OR c.G_id = :inputGId
WHERE a.T_id = :inputTId
AND c.G_id IS NULL
ORDER BY b.S_num
LIMIT 1);
Specifically, the line checking G_To_S_Map for a row containing the G_Id needed to be switched from using an AND to an OR. The key requirement, which had not been specified previously, was that both G_Id and S_Id are unique in G_To_S_Map.
This statement will not insert a line if either the provided G_Id has been mapped previously, or if all S_Ids mapped to the given T_Id have been mapped.
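Here is a small check of that behaviour against the dummy data from the question, run through Python's built-in sqlite3 module. The table definitions (including the UNIQUE constraints), the helper name map_first_free and the parameter names :g and :t are assumptions for this sketch. The SELECT is written without the surrounding parentheses, since I am not certain SQLite accepts a parenthesized SELECT in an INSERT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (S_id INTEGER PRIMARY KEY, S_num INTEGER);
    CREATE TABLE G_to_S_Map (G_id INTEGER UNIQUE, S_id INTEGER UNIQUE);
    CREATE TABLE S_to_T_Map (S_id INTEGER, T_id INTEGER);
""")
conn.executemany("INSERT INTO S VALUES (?, ?)",
                 [(i, 1100 + i) for i in range(1, 10)])
conn.executemany("INSERT INTO G_to_S_Map VALUES (?, ?)",
                 [(21, 1), (22, 2), (23, 3), (24, 4), (25, 5), (26, 6), (28, 9)])
conn.executemany("INSERT INTO S_to_T_Map VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2), (8, 2)])

INSERT_SQL = """
    INSERT INTO G_to_S_Map (G_id, S_id)
    SELECT :g, a.S_id
    FROM S_to_T_Map AS a
    JOIN S AS b ON b.S_id = a.S_id
    LEFT JOIN G_to_S_Map AS c ON c.S_id = a.S_id OR c.G_id = :g
    WHERE a.T_id = :t AND c.G_id IS NULL
    ORDER BY b.S_num
    LIMIT 1
"""


def map_first_free(g_id, t_id):
    """Run the insert, then return the S_id now mapped to g_id (or None)."""
    conn.execute(INSERT_SQL, {"g": g_id, "t": t_id})
    row = conn.execute(
        "SELECT S_id FROM G_to_S_Map WHERE G_id = ?", (g_id,)
    ).fetchone()
    return row[0] if row else None


no_free_s = map_first_free(27, 1)   # S 1, 2, 3 are all mapped already
new_mapping = map_first_free(27, 2) # S 8 is the only unmapped S for T 2
unchanged = map_first_free(21, 2)   # G 21 is already mapped to S 1
print(no_free_s, new_mapping, unchanged)
```

With this data, G 27 gets no row for T 1, gets S 8 for T 2, and the existing mapping of G 21 is left alone.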
Hmm, the following seems to work nicely, although I haven't combined an "insert" with it yet.
Select s.S_ID from S as s
inner join(
Select st.S_ID from s_to_t_map as st
where st.T_ID=???? AND not exists
(Select * from g_to_s_Map as gs where gs.S_ID = st.S_ID)
) rslt on s.S_ID=rslt.S_ID ORDER BY s.s_Num ASC limit 1;