I have a data set that looks like this:
id A
1 5
1 5
1 .
1 5
5 .
5 .
5 8
13 .
13 .
13 .
13 .
I want to calculate the number of A values when at least one A value is not missing in that panel in Stata.
For example, in the example above there are 3 missing values that are not the only missing value in that panel.
There is one missing A value when id is 1 and as there also are non-missing A values when id=1, I want to count that one.
Similarly, there are two missing A values when id is 5 and as there are also non-missing values when id=5, I want to count those two too.
There are 4 missing A values when id=13 but as there are no non-missing values when id=13, I don't want to count these.
I can't follow this, but the number of observations in each panel is
bysort id : gen count = _N
and the number of non-missing values of A is
by id : egen A_nm = count(A)
from which the missing values can be counted by subtraction. Alternatively, missing values can be counted directly by
by id: egen A_m = total(missing(A))
If that doesn't help, you may have to expand your question by showing what the new variable(s) you want looks like.
EDIT What you want may be just an application of this: you want to look at A_m values conditional on A_nm being positive.
Related
I have a data set that looks like this:
id firm earnings A
1 A 100 0
1 A 200 0
2 B 50 1
2 B 70 1
3 C 900 0
bys id firm, I want to keep only the first observation if A==0 and want to keep all the observations if A==1.
I've tried the following code:
if A==0{
bys id firm: keep if _n==1
}
However, this code drops all the _n>1 observations no matter what the A value is.
The if (conditional) {do something} syntax is used in control flow rather than in defining variables. As you have it now Stata is only testing if A==1 in the first row. Try adding additional conditions using and (&) or or (|) statements. Try this:
bys id firm: keep if (_n==1 & A==0) | A==1
Hi I am trying to merge two tables the FormA scores table that I made that is now CalculatingScores with the domain number found in DomainsFormA. I need to merge them by QuestionNum. Here is my code.
proc sql;
create table combined as
select *
from CalculatingScores inner join DomainsFormA
on CalculatingScores.Scores=DomainsFormA.QuestionNum;
quit;
proc print data=combined (obs=15);
run;
This table is what I am trying to get my merged tables to look like but for 15 observations.
Form
Student
QuestionNum
Scores
DomainNum
A
1
1
0
5
A
1
2
1
4
A
1
3
0
5
But My tables look more like this
Form
Student
QuestionNum
Scores
DomainNum
A
1
2
1
5
A
1
4
1
5
A
1
5
1
5
My entire Scores column for these 15 observations have a value of 1. Also my DomainNum column only has values of 5. My Student and Form columns are correct but I need to have varied scores and varied domain numbers. Any ideas for how to solve my problem? Maybe I need a order by statement?
You appear to be joining on the incorrect columns
You coded
on CalculatingScores.Scores=DomainsFormA.QuestionNum
which is joining a score to a question number
perhaps you should be coding
on CalculatingScores.QuestionNum=DomainsFormA.QuestionNum
^^^^^^^^^^^ ^^^^^^^^^^^
I'm trying to set up an array formula in a google sheet to save filling in a simple formula for ID#s.
The sheet is populated by a google form, so it receives a timestamp. Let's say these are orders.
If the month of the order matches that of the previous I want to increase the ID# by one, essentially counting this months orders. The complete ID# is actually made up of several factors, the order count being just one of them (so that they are unique), but for the sake of this exercise, I'll keep it simple.
If the month of the order does not match the previous, then safe to say we've entered the new month and the ID should restart at 01.
I have a column that has the extracted month from the timestamp. So the data looks like this:
A B
ID# MONTH
1 1
2 1
3 1
4 1
5 1
6 1
1 2
2 2
3 2
1 3
2 3
3 3
4 3
I can't get the arrayformula to work! I've tried numerous countIfs and Ifs, something like
=ARRAYFORMULA(if(len(B2:B),if(B3:B<>B2:B,1,A2:A+1),""))
Does anyone have any suggestions for this?
I found it hard to Google for and have tried a few search terms!
try:
=ARRAYFORMULA(IF(B1:B<>"", COUNTIFS(B1:B, B1:B, ROW(B1:B), "<="&ROW(B1:B)), ))
I have a table which is like this:
Geo_Key Var1 Var2..Var50
123 1 0 .. 1
524 0 1 .. 1
323 1 1 .. 1
Where Var1-Var50 represents 50 columns having value 1/0.
I want to select count of distinct Geo_Key for each column(var1-var50), when its value is=1.
So Results would be like:
Var1 50
Var2 60
....
...
Var50 10
Since your variables are binary( especially 0/1) in nature, you can also try summing each column up. The sum would give you the count of each variable with value = 1.
Or, you can try it using proc freq. Pleae check the following link
http://www2.sas.com/proceedings/sugi25/25/btu/25p069.pdf
I have three different questions about modifying a dataset in SAS. My data contains: the day and the specific number belonging to the tag which was registred by an antenna on a specific day.
I have three separate questions:
1) The tag numbers are continuous and range from 1 to 560. Can I easily add numbers within this range which have not been registred on a specific day. So, if 160-280 is not registered for 23-May and 40-190 for 24-May to add these non-registered numbers only for that specific day? (The non registered numbers are much more scattered and for a dataset encompassing a few weeks to much to do by hand).
2) Furthermore, I want to make a new variable saying a tag has been registered (1) or not (0). Would it work to make this variable and set it to 1, then add the missing variables and (assuming the new variable is not set for the new number) set the missing values to 0.
3) the last question would be in regard to the format of the registered numbers which is along the line of 528 000000000400 and 000 000000000054. I am only interested in the last three digits of the number and want to remove the others. If I could add the missing numbers I could make a new variable after the data has been sorted by date and the original transponder code but otherwise what would you suggest?
I would love some suggestions and thank you in advance.
I am inventing some data here, I hope I got your questions right.
data chickens;
do tag=1 to 560;
output;
end;
run;
data registered;
input date mmddyy8. antenna tag;
format date date7.;
datalines;
01012014 1 1
01012014 1 2
01012014 1 6
01012014 1 8
01022014 1 1
01022014 1 2
01022014 1 7
01022014 1 9
01012014 2 2
01012014 2 3
01012014 2 4
01012014 2 7
01022014 2 4
01022014 2 5
01022014 2 8
01022014 2 9
;
run;
proc sql;
create table dates as
select distinct date, antenna
from registered;
create table DatesChickens as
select date, antenna, tag
from dates, chickens
order by date, antenna, tag;
quit;
proc sort data=registered;
by date antenna tag;
run;
data registered;
merge registered(in=INR) DatesChickens;
by date antenna tag;
Registered=INR;
run;
data registeredNumbers;
input Numbers $16.;
datalines;
528 000000000400
000 000000000054
;
run;
data registeredNumbers;
set registeredNumbers;
NewNumbers=substr(Numbers,14);
run;
I do not know SAS, but here is how I would do it in SQL - may give you an idea of how to start.
1 - Birds that have not registered through pophole that day
SELECT b.BirdId
FROM Birds b
WHERE NOT EXISTS
(SELECT 1 FROM Pophole_Visits p WHERE b.BirdId = p.BirdId AND p.date = ????)
2 - Birds registered through pophole
If you have a dataset with pophole data you can query that to find if a bird has been through. What would you flag be doing - finding a bird that has never been through any popholes? Looking for dodgy sensor tags or dead birds?
3 - Data code
You might have more joy with the SUBSTRING function
Good luck