Changing single values of variables - stata

I can't seem to find a way of changing individual values in Stata. Say I have a variable called height with 20 observations; I can display a single one:
dis height[20] /*displays the 20th observation of height*/
How can I likewise change say the 20th observation?

You could use the Data Editor. Otherwise the command line syntax is replace ... in #. See the help for replace. If you keep a log either kind of change will be documented as a replace statement.
clear
set obs 10
gen y = _n
replace y = 42 in 7

Related

How to count a variable's frequency in a specific column in SAS

I would like to create a variable for each subject id, defined as ci_em_ti = COUNT of “Impaired” values among the following variables: bvmdrt_cutoff, craftivmmt_cutoff, craftpimmt_cutoff, craftvdelt_cutoff, craftpdelt_cutoff, nlairt_cutoff, nlsdt_cutoff, nlldt_cutoff
How should I do this in SAS?
I tried
countc(cats(of bvmdrt_cutoff, craftivmmt_cutoff, craftpimmt_cutoff, craftvdelt_cutoff, craftpdelt_cutoff, nlairt_cutoff, nlsdt_cutoff), "Impaired")
but it does not work.
The function COUNTC() counts the number of times any of the listed characters appear. By searching for Impaired you are searching for the characters: adeiImpr. So one value of "Missing" will contribute 2 to the count since it has two lowercase i's, and "Normal" will count as 3 because of the letters r, m and a. "Impaired" will count as 8 since all of its characters are in the search list.
The function COUNT() will search for the number of times a substring occurs so you might try that.
Are you sure your values are character strings? If instead they are numbers with a user defined format attached the CATS() function will not use the formatted values. So you will need to search for the codes instead of the decodes.
PS There is no need to add the OF keyword when there is only one variable in the list. Either remove the OF or remove the commas.
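The COUNT() suggestion above can be sketched as follows. This is only an illustration under assumptions: the test data are made up, only three of the eight variables are used for brevity, and the variables are assumed to be character strings holding the literal text Impaired. CATX() joins the values with a delimiter so that a match cannot span two adjacent values.

```sas
/* Made-up test data; the real variables are assumed to be character. */
data have;
  length id 8 bvmdrt_cutoff craftivmmt_cutoff nlairt_cutoff $8;
  input id bvmdrt_cutoff $ craftivmmt_cutoff $ nlairt_cutoff $;
  datalines;
1 Impaired Normal Impaired
2 Normal Missing Normal
;
run;

data want;
  set have;
  /* COUNT() counts occurrences of the substring, not of single characters. */
  ci_em_ti = count(catx('|', bvmdrt_cutoff, craftivmmt_cutoff, nlairt_cutoff),
                   'Impaired');
run;
```

COUNT() is case-sensitive by default and matches substrings, so a value such as "Not Impaired" would also be counted; if such values exist, comparing each variable to 'Impaired' individually would be safer.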
You say count in a column but then your function is actually counting for several columns but a single row. Since you haven't provided usable data, I'll use SASHELP.HEART instead.
This shows how to display the frequency of each value in each column.
proc freq data=sashelp.heart;
table chol_status bp_status weight_status smoking_status;
run;

split single variable value in two

I have dataset a:
data q7;
input trt$;
cards;
a150
b250
c300
400
abc180
;
run;
We have to create dataset b like this
trt dose
a150 150mg
b250 250mg
c300 300mg
400 400mg
abc180 180mg
A new dose variable is added, and mg is written after each numeric value.
Here is my solution. Basically, use the compress function to keep (hence the 'k') only the digits from the trt variable. From there it is just a case of concatenating mg to the digits.
data want;
set q7;
dose = cats(compress(trt,'0123456789','k'),'mg');
run;
The default behaviour of the compress function is to return a character string with the specified characters removed from the original string,
so
compress(trt,'0123456789') would have removed all digits from the trt variable.
However, compress comes with a battery of modifiers that let the user alter the default behaviour.
In your case we want to keep the digits regardless of the number of preceding letters, so I used the modifier k to keep, rather than remove, the characters in the list, in this case 0123456789.
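As a quick illustration of the difference between the default behaviour and the k modifier, here is a minimal sketch using one of the sample values (the variable names removed and kept are just for illustration):

```sas
data _null_;
  trt = 'abc180';
  removed = compress(trt, '0123456789');       /* default: listed digits are removed */
  kept    = compress(trt, '0123456789', 'k');  /* k modifier: listed digits are kept */
  put removed= kept=;                          /* writes removed=abc kept=180 to the log */
run;
```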
For a full list of modifiers, see the SAS documentation for the COMPRESS function.
cats is one of the many functions SAS has for concatenating strings, so passing the compress result as the first argument and mg as the second concatenates the two to produce your desired result.
Hope it helps.

How to fill in missing values by group?

I have the following data structure. Within each group, some observations have missing values. I do know that each group has only one distinct non-missing value (10 for group 1 and 11 for group 2 in this case). The location of the missing observations is random within the group (i.e. I can't fill in missing values with the previous / following value).
How to fill the missing values with the one non-missing value by group?
group value
1 .
1 10
1 .
2 11
2 .
2 11
My current solution is a loop, but I suspect there's some clever bysort that I can use.
levelsof group, local(lm_group)
foreach group in `lm_group' {
levelsof value if group == `group', local(lm_value)
replace value = `lm_value' if group == `group'
}
If you know that the non-missing values are constant within group, then you can get there in one line with
bysort group (value) : replace value = value[_n-1] if missing(value)
as the missing values are first sorted to the end and then each missing value is replaced by the previous non-missing value. Replacement cascades downwards, but only within each group.
For documentation, see this FAQ
To check that there is at most one distinct non-missing value within each group, you could do this:
bysort group (value) : assert (value == value[1]) | missing(value)
More personal note. It's nice to see levelsof in use, as I first wrote it, but the above is better.
I think the xfill command is what you are looking for.
To install xfill, copy-paste the following into Stata and follow instructions:
net from http://www.sealedenvelope.com/
After that, the rest is easy:
xfill value, i(group)
You can read up about xfill here
The clever bysort-answer you were looking for was:
bysort group: egen new_value=max(cond(!missing(value), value, .))
The cond-function checks whether its first argument is true; it returns value if it is and . if it is not.
FWIW I could not get Nick's bysort solution to work, no clue why. I followed the suggested syntax from the FAQ he linked instead and got it to work, though. The generic form is:
gsort id -myvar
by id: replace myvar = myvar[_n-1] if myvar == .
EDIT: fixed the errant reference to "time" in the previous iteration of this post (and added the if missing condition). The current code should be a functioning generic solution.

SAS Converting Characters/Number to Numbers

I am looking for a way to convert the characters into numbers in SAS so that I can use the max function. Also, it would be helpful if the characters are removed and only the numbers are kept. Below is a list of data for a column in a SAS table.
Column UNK
abc20140714
abc20140714x
abc20140714xyz
123_abc20140714_xyz
abc20150718
After stripping out the number values from the column, I would then group the data and use the max function in SAS, which should only generate the value 20150718.
To avoid any confusion, my question is: is there a way to strip out the non-numeric values and then convert the column into a numeric column so I can use the max function?
Thanks.
Sure!
var_num = input(compress(var_char,,'kd'),yymmdd8.);
Compress removes or keeps characters from a list. 'kd' says to 'keep digits'.
You then input using the appropriate informat; yymmdd8. looks right based on the data you provide. Then apply a format, format var_num yymmddn8.; or similar, so it looks like a date visually (even if it's really a number underneath).
As pointed out, this won't work if there are other numeric digits in the values; you need to look at your data and identify how those appear and clean them out separately. You could use a regular expression for example to identify things that have 8 consecutive digits, starting with a 20; but ultimately it is a data analysis issue to handle these as your data require.
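Put together, the steps above might look like the sketch below. The dataset and variable names (have, want, var_char, var_num, max_date) are placeholders, and the sample rows deliberately leave out the value with extra digits (123_abc20140714_xyz), which this approach does not handle.

```sas
/* Placeholder data: only values whose sole digits are the 8-digit date. */
data have;
  input var_char :$20.;
  datalines;
abc20140714
abc20140714x
abc20150718
;
run;

data want;
  set have;
  /* 'kd' keeps only the digits; the informat reads them as a SAS date. */
  var_num = input(compress(var_char, , 'kd'), yymmdd8.);
  format var_num yymmddn8.;
run;

/* The maximum is taken on the numeric date and displayed as yyyymmdd. */
proc sql;
  select max(var_num) as max_date format=yymmddn8.
  from want;
quit;
```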
To get the first sequence of 8 digits in a row starting with a 1 or a 2 as a numeric value, you can use the following:
data want;
set have;
pos = prxmatch("/[12]\d{7}/", character_string);
if pos > 0 then number = input(substr(character_string, pos, 8), 8.);
else number = .;
drop pos;
run;
The prxmatch expression finds the starting position of the sequence, and the substr expression extracts the sequence, then the input function converts it to a numeric.
(Edited to incorporate Joe's feedback)

how many values are missing for each observation

I have data as follows:
ID date shoesize shoetype
1 4/3/12 . bball
2 . 12 running
3 1/2/12 8 .
4 . 9.5 bball
I want to count the number of '.' there are in each row and make a frequency table with the information. Thanks in advance
You can determine the number of missing values in a row with the NMISS and CMISS functions (NMISS for numeric, CMISS for character). If you have a list of just some of your variables, you should use that list; if not, you need to deal with the fact that number_missing itself will be missing (the -1 there).
data want;
set have;
number_missing=nmiss(of _numeric_) + cmiss(of _character_)-1;
run;
Then do whatever you want with that new variable.
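For the frequency table asked for in the question, one possibility is sketched below. It rebuilds the sample data (the informat for date is an assumption) and names the variables explicitly in CMISS, which accepts both numeric and character arguments, so the new variable is never part of its own count and no adjustment is needed.

```sas
/* Sample data from the question; mmddyy8. is an assumed informat for date. */
data have;
  input ID date :mmddyy8. shoesize shoetype $;
  format date mmddyy8.;
  datalines;
1 4/3/12 . bball
2 . 12 running
3 1/2/12 8 .
4 . 9.5 bball
;
run;

data want;
  set have;
  /* Explicit list: number_missing is not among the arguments. */
  number_missing = cmiss(ID, date, shoesize, shoetype);
run;

/* Frequency table of the per-row missing counts. */
proc freq data=want;
  tables number_missing;
run;
```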
NMISS doesn't work if you wish to evaluate character variables. It converts character variables in the list of arguments to numeric which results in a count being made of missing in every instance that a character variable is encountered. CMISS doesn't convert character variable values to missing and therefore you get the correct answer.
Obviously you can choose not to include the character variables as your arguments, however I am assuming that you want to count missing values in character variables as well, based on the sample you provided. If this is the case the following should do what you want.
DATA WANT3;
SET HAVE;
NUMBER_MISSING = 0;
NUMBER_MISSING=CMISS(OF _ALL_);
RUN;
You must allocate a value to NUMBER_MISSING, otherwise the new variable is also evaluated as a missing.