What is this program doing exactly? (SAS)

I was confused by the following SAS code. The SAS data set named WORK.SALARY contains 10 observations for each department, and is currently ordered by Department. The following SAS program is submitted:
data WORK.TOTAL;
set WORK.SALARY(keep=Department MonthlyWageRate);
by Department;
if First.Department=1 then Payroll=0;
Payroll+(MonthlyWageRate*12);
if Last.Department=1;
run;
So, what exactly is First.Department and Last.Department? Many thanks for your time and attention.

Your data step calculates the total PAYROLL for each DEPARTMENT.
The FIRST. and LAST. variables are created automatically when you use a BY statement. They are true (1) when the current observation is the first (or last) observation in its BY group; see "How the DATA Step Identifies BY Groups" in the SAS documentation.
The sum statement (syntax: var+expression;) means that the value of PAYROLL is retained (carried over) from one observation to the next instead of being reset to missing.
The IF/THEN statement initializes the value to zero when a new group starts.
The subsetting IF statement ensures that only the final observation for each department is output.
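For readers who want to trace the logic outside SAS, here is a minimal Python sketch of the same steps. The function name and the sample rows are made up for illustration, and the sketch returns only the department and its total rather than every variable SAS would keep.

```python
def total_payroll(rows):
    """rows: list of (department, monthly_wage_rate), already sorted by department."""
    totals = []
    payroll = 0  # retained across rows, like the sum-statement variable
    for i, (dept, monthly_rate) in enumerate(rows):
        first = i == 0 or rows[i - 1][0] != dept             # plays the role of First.Department
        last = i == len(rows) - 1 or rows[i + 1][0] != dept  # plays the role of Last.Department
        if first:
            payroll = 0                     # reset at the start of each BY group
        payroll += monthly_rate * 12        # accumulate the annual wage
        if last:
            totals.append((dept, payroll))  # like the subsetting IF: one row per group
    return totals


# Hypothetical sample data, three employees per department.
salary = [(1, 100), (1, 200), (1, 300), (2, 1000), (2, 2000), (2, 3000)]
print(total_payroll(salary))  # [(1, 7200), (2, 72000)]
```

The reset, the accumulation and the final filter correspond one-to-one to the three statements between BY and RUN in the DATA step.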

As explained, it is calculating the payroll for each department.
First.Department is set to 1 when the first record for a department is read, and Last.Department is set to 1 when the last record for that department is read; otherwise both are 0.
So if you have:
Department  Wage
1           100
1           200
1           300
2           1000
2           2000
2           3000
With the first. and last. variables assigned, it will look like this:
Department  Wage  first.department  last.department
1           100   1                 0
1           200   0                 0
1           300   0                 1
2           1000  1                 0
2           2000  0                 0
2           3000  0                 1
Now you can follow your logic as to what happens when first.department = 1.
By the way, the statement if Last.Department=1; has no THEN clause because it is a subsetting IF: only the last observation of each department is kept in the output.
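As a rough illustration of how such flags can be derived, here is a small Python sketch. It is not SAS, and the function name is invented. Each key is compared with its neighbours in an already-sorted sequence.

```python
def by_group_flags(keys):
    """Return (key, first, last) for each key in an already-sorted sequence."""
    flagged = []
    for i, key in enumerate(keys):
        first = int(i == 0 or keys[i - 1] != key)             # 1 on the first row of a group
        last = int(i == len(keys) - 1 or keys[i + 1] != key)  # 1 on the last row of a group
        flagged.append((key, first, last))
    return flagged


for row in by_group_flags([1, 1, 1, 2, 2, 2]):
    print(*row)
```

A group with a single row gets both flags set to 1, which is also what SAS does.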

Generate new variable in one dataset using observation from another dataset in SAS

I have two datasets: one with one observation and two variables, and another with 10 observations and four variables.
Dataset 1
Final  Result
X      Fail
Dataset 2
A   B  C  D     Output
1   1  2  Pass
2   1  2  Pass
3   1  2  Pass
4   1  2  Fail
5   1  2  Pass
6   1  2  Fail
7   1  2  Pass
8   1  2  Fail
9   1  2  Pass
10  1  2  Pass
I would like to generate a fifth variable (Output) in the second dataset depending on the value of the second variable (Result) in the first dataset.
If Result in the first dataset equals Fail, Output should be Fail for every observation in the second dataset. If Result equals Pass, Output should equal the value in column D of the second dataset.
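Just use some simple IF/THEN logic. Since you know DATASET1 has only one observation, read only one observation from it.
data want;
/* Read the single DATASET1 row once; its variables are retained for every later row */
if _n_=1 then set dataset1;
set dataset2;
length OUTPUT $4;
/* Character comparisons are case-sensitive, so normalize the case before comparing */
if upcase(RESULT)='FAIL' then OUTPUT=RESULT;
else OUTPUT=D;
run;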
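The same idea, applying one value from a single-row table to every row of another table, can be sketched in Python. The dict keys follow the column names in the question; the function name is hypothetical.

```python
def add_output(dataset1, dataset2):
    """dataset1: a single dict with key 'Result'; dataset2: list of dicts with key 'D'."""
    result = dataset1["Result"]
    want = []
    for row in dataset2:
        new_row = dict(row)  # copy so the input rows are left untouched
        # A failing overall result overrides every row; otherwise copy column D.
        new_row["Output"] = result if result.upper() == "FAIL" else row["D"]
        want.append(new_row)
    return want


rows = [{"A": 1, "D": "Pass"}, {"A": 4, "D": "Fail"}]
print(add_output({"Final": "X", "Result": "Fail"}, rows))
print(add_output({"Final": "X", "Result": "Pass"}, rows))
```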

Changing organization of data so that each observation represents a new variable (I tried)

I am working in Stata with a dataset on electric vehicle charging stations. Variables include
station_name name of charging station
review_text all of the customer reviews for a specific station delimited by }{
num_reviews number of customer reviews.
I'm trying to make a new file where each observation represents one customer review in a new variable customer_review and another variable station_id has the name of the corresponding station. So, if the original dataset had 100 observations (one per station) with 5 reviews each, the new file should have 500 observations.
How can I do this? I would include some code I have tried but I have no idea how to start.
If your data look like this:
     station   reviews               n
  1.       1   {good}{bad}{great}    3
  2.       2   {poor}{excellent}     2
Then the following:
split reviews, parse("}{")
drop reviews n
reshape long reviews, i(station) j(review_num)
drop if reviews==""
replace reviews = subinstr(reviews, "}","",.)
replace reviews = subinstr(reviews, "{","",.)
will produce:
     station   review~m   reviews
  1.       1          1   good
  2.       1          2   bad
  3.       1          3   great
  4.       2          1   poor
  5.       2          2   excellent
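To make the split-then-reshape idea concrete, here is a Python sketch of the same transformation, not Stata. The function name is invented, and it assumes reviews never contain braces of their own. It strips braces only at the ends of each piece, whereas the subinstr() calls above remove them everywhere.

```python
def reviews_to_long(stations):
    """stations: list of (station, reviews), with reviews like '{good}{bad}{great}'."""
    long_rows = []
    for station, reviews in stations:
        # Split on the '}{' delimiter, numbering the pieces from 1 as reshape long does.
        for review_num, part in enumerate(reviews.split("}{"), start=1):
            text = part.strip("{}")  # drop the leftover outer braces
            if text == "":
                continue             # skip empty pieces, like drop if reviews==""
            long_rows.append((station, review_num, text))
    return long_rows


wide = [(1, "{good}{bad}{great}"), (2, "{poor}{excellent}")]
for row in reviews_to_long(wide):
    print(*row)
```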

Stata: Keep the first observation by group

I have a data set that looks like this:
id firm earnings A
1 A 100 0
1 A 200 0
2 B 50 1
2 B 70 1
3 C 900 0
Within each id and firm group, I want to keep only the first observation if A==0 and keep all the observations if A==1.
I've tried the following code:
if A==0{
bys id firm: keep if _n==1
}
However, this code drops all the _n>1 observations no matter what the A value is.
The if (condition) {do something} syntax is for control flow, not for selecting observations. As written, Stata evaluates A==0 only for the first observation and then either runs the whole block or skips it. Put the condition in the if qualifier of keep instead, combining conditions with the and (&) or or (|) operators. Try this:
bys id firm: keep if (_n==1 & A==0) | A==1
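The row-by-row condition can be sketched in Python as follows. This is not Stata; the function name is made up, and "first" here means first in the original row order.

```python
def keep_rows(rows):
    """rows: list of (id, firm, earnings, a) tuples in their original order."""
    seen = set()
    kept = []
    for row in rows:
        group = (row[0], row[1])
        is_first = group not in seen  # corresponds to _n==1 within the group
        seen.add(group)
        a = row[3]
        if (is_first and a == 0) or a == 1:
            kept.append(row)
    return kept


data = [(1, "A", 100, 0), (1, "A", 200, 0), (2, "B", 50, 1), (2, "B", 70, 1), (3, "C", 900, 0)]
for row in keep_rows(data):
    print(*row)
```

Unlike the control-flow if, the condition is tested once per row, which is what the if qualifier does.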

Creating indicator variable in Stata

In a panel data set I have 3 variables: name, week, and income.
I would like to make an indicator variable that indicates initial weeks where income is 0. So say a person X has 0 income in the first 13 weeks, the indicator takes the value 1 the first 13 weeks, and is otherwise 0. The same procedure for person Y and so on.
I have tried using by groups, but I can't get it to work.
Any suggestions?
One solution is
bysort name (week) : gen no_income = sum(income) == 0
The function sum() yields the cumulative (running) sum. So, as long as income is 0, its cumulative sum remains 0 too. As soon as a person earns something, the cumulative sum becomes positive. The code presumes that income is never negative in any week, so that the cumulative sum cannot return to zero later. To guard against that possibility, add an appropriate extra condition, such as
bysort name (week) : gen no_income = sum(income) == 0 & income == 0
For a problem with very similar flavour, see this FAQ. A meta-lesson is to look at the StataCorp FAQs as one of several resources.
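For one person's incomes, already sorted by week, the running-sum logic of the second command can be sketched in Python like this. It is not Stata, and the function name is hypothetical.

```python
def no_income_flags(incomes):
    """incomes: one person's weekly incomes, in week order."""
    flags = []
    running = 0
    for income in incomes:
        running += income  # cumulative sum, like sum(income)
        flags.append(int(running == 0 and income == 0))
    return flags


print(no_income_flags([0, 0, 0, 5, 0, 3]))  # [1, 1, 1, 0, 0, 0]
```

The later zero-income week is not flagged because the running sum is already positive by then.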

Modifying data in SAS: copying part of the value of a cell, adding missing data and labeling it

I have three different questions about modifying a dataset in SAS. My data contain the day and the tag number that was registered by an antenna on that day.
I have three separate questions:
1) The tag numbers are continuous and range from 1 to 560. Can I easily add the numbers within this range that were not registered on a specific day? For example, if 160-280 were not registered on 23-May and 40-190 on 24-May, I would like to add these non-registered numbers only for that specific day. (In reality the non-registered numbers are much more scattered, and for a dataset spanning a few weeks there are too many to add by hand.)
2) Furthermore, I want to make a new variable saying whether a tag has been registered (1) or not (0). Would it work to create this variable and set it to 1, then add the missing tag numbers and (assuming the new variable is not set for the added numbers) set its missing values to 0?
3) The last question concerns the format of the registered numbers, which is along the lines of 528 000000000400 and 000 000000000054. I am only interested in the last three digits of the number and want to remove the others. If I could add the missing numbers, I could make a new variable after the data have been sorted by date and the original transponder code, but otherwise what would you suggest?
I would love some suggestions and thank you in advance.
I am inventing some data here; I hope I have understood your questions correctly.
data chickens;
do tag=1 to 560;
output;
end;
run;
data registered;
input date mmddyy8. antenna tag;
format date date7.;
datalines;
01012014 1 1
01012014 1 2
01012014 1 6
01012014 1 8
01022014 1 1
01022014 1 2
01022014 1 7
01022014 1 9
01012014 2 2
01012014 2 3
01012014 2 4
01012014 2 7
01022014 2 4
01022014 2 5
01022014 2 8
01022014 2 9
;
run;
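/* All date/antenna combinations that occur in the data */
proc sql;
create table dates as
select distinct date, antenna
from registered;
/* Cross join: every tag (1-560) for every date/antenna combination */
create table DatesChickens as
select date, antenna, tag
from dates, chickens
order by date, antenna, tag;
quit;
proc sort data=registered;
by date antenna tag;
run;
/* INR is 1 when the row came from REGISTERED and 0 when it exists only in the full grid */
data registered;
merge registered(in=INR) DatesChickens;
by date antenna tag;
Registered=INR;
run;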
data registeredNumbers;
input Numbers $16.;
datalines;
528 000000000400
000 000000000054
;
run;
/* The last three digits start at position 14 of the 16-character code */
data registeredNumbers;
set registeredNumbers;
NewNumbers=substr(Numbers,14);
run;
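The approach above (build the full grid of tags per date and antenna, flag which ones were actually seen, then cut out the last three digits) can be sketched in Python as follows. This is only an illustration of the logic: the function names are invented and the dates are plain strings.

```python
def complete_tags(registered, max_tag):
    """registered: set of (date, antenna, tag). Returns rows (date, antenna, tag, flag)."""
    # Every date/antenna combination that occurs in the data.
    combos = sorted({(date, antenna) for date, antenna, _ in registered})
    # Cross each combination with all tags and flag the ones that were registered.
    return [
        (date, antenna, tag, int((date, antenna, tag) in registered))
        for date, antenna in combos
        for tag in range(1, max_tag + 1)
    ]


def last_three_digits(code):
    """Keep only the last three characters of a transponder code."""
    return code.strip()[-3:]


seen = {("2014-01-01", 1, 1), ("2014-01-01", 1, 2)}
for row in complete_tags(seen, 3):
    print(*row)
print(last_three_digits("528 000000000400"))  # 400
```

Keeping the result as a string preserves leading zeros such as 054; convert it to a number if it has to be matched against numeric tag values.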
I do not know SAS, but here is how I would do it in SQL - may give you an idea of how to start.
1 - Birds that have not registered through pophole that day
SELECT b.BirdId
FROM Birds b
WHERE NOT EXISTS
(SELECT 1 FROM Pophole_Visits p WHERE b.BirdId = p.BirdId AND p.date = ????)
2 - Birds registered through pophole
If you have a dataset with pophole data, you can query that to find out whether a bird has been through. What would your flag be doing: finding a bird that has never been through any popholes? Looking for dodgy sensor tags or dead birds?
3 - Data code
You might have more joy with the SUBSTRING function
Good luck