Scalar formatting problem with number of digits - stata

I am trying to write a command that ereturns a scalar that is percentage rounded to 2 decimal places. The percentage can be negative or positive, with unknown number of digits before the decimal point.
Here's MRE that shows the problem I am having.
#delimit;
capture program drop my_note;
program my_note, eclass;
local my_x: display %-9.2f 92.23999999999999;
ereturn scalar my_x = `my_x';
end;
ereturn clear;
my_note;
ereturn list;
display %-9.2f 92.23999999999999;
display 92.23999999999999;
I am puzzled why display seems to do the right thing (turn 92.23999999999999 to 92.24, though regardless of format), but that e(my_x) does not seem to inherit that format.

When you create your scalar you copy the value of the local `my_x'. That value is still 92.23999999999999 as : display is not changing the underlying data, only how it is displayed. Think of it as the data in 5.00e+2 and 500 is the same, it is just how that value is shown that differs.
You need to use strings to work with how the value is displayed. However, there are two issues with strings in your code example.
While scalars normally can hold both strings and numeric values, returned scalars can not hold strings (don't ask me why). Would it be possible to return a local instead?
In %-9.2f you specify that the display format to be 9 characters long so your scalar will be e(my_x) : "92.24 ". You can adjust %-9.2f, but since you are now working with strings you can remove excessive spaces using the trim() function.
Try the code below and see if that works given the context of this function. If not tell us more about what you are about to do.
#delimit;
capture program drop my_note;
program my_note, eclass;
local my_x: display %-9.2f 92.23999999999999;
ereturn local my_x = trim("`my_x'");
end;
ereturn clear;
my_note;
ereturn list;

Related

Attempting to identify non-integer values using for loop

I am trying to identify values that are not integers in Stata. My dataset is the following:
var1 var2 var3
1 2 3
2 4 5
3 6 7
4 2 3
5 1 1
6 2 8
My code is the following:
foreach var in var1 var2 var3 {
gen flag_`var' = 1 if format(`var') == %int
replace flag_`var' = 0 if flag_`var' ==.
I am getting an error message stating
unknown function format()
}
I also tried replacing the parentheses around format(`var') with format[`var'] but then I got an error stating format not found. Is there something wrong with the format I am using or is there a better way to identify non-integer values?
The first answer is what Stata told you: there is no format() function.
But a deeper answer is that thinking of (display) formats is the wrong way round for this question. A display format is in essence an instruction to show data in a certain way and has nothing to do with its stored value, or to be more precise the decimal equivalent of its stored value. Thus 42 displayed with format %4.3f is shown as 42.000 while 6.789 displayed with format %1.0f is shown as 7. Otherwise put, no value has an inherent format, but a display format is used to display a value, either by default or because a user specified a format. Stata is here just using the same broad ideas as say C and various C-like languages.
Nothing to do with its stored value is a slight exaggeration, as only numeric formats make sense for numbers and only string formats make sense for strings, but display format has nothing to do with whether a stored value is integer.
Further %int is not a display format any way. When formats are being checked for, they would be literal strings enclosed in "".
To show non-integers various methods could be used, say using rounding functions such as round(), int(), floor() or ceil(). So an indicator for whether x is integer could be
gen is_int_x = x == floor(x)
All the values in your data example are integer any way, but I take it that you are looking for non-integers elsewhere.

Is there any function in SAS where we can read the exact value from the variable

Suppose i have a column called ABC and that variable has the data like
:
123_112233_66778_1122 or
123_112233_1122_11232 or
1122_112233_66778_123
so i want to generate the desire variable in the next column as 1122. like this "1122" i have a long list where i need to cross the value from the column called ABC, if found the exact match then need to generate. However, i don't want to generate the match like 112233 because it does not match the value what i am looking for.
For an example you can see all three line what i have given for reference. I am taking only the match records which is "1122" from all the above 3 lines.
I really have no clue to overcome on the problem. I have tried my hands with wildcards but did not get much success. Any help would be much apricated
It is hard to tell from your description, but from the values you show it looks like you want the INDEXW() function. That will let you search a string for matching words with a option to specify which characters are to be considered as the separators between the words. The result is the location of where the word starts within longer string. When the word is not found the result is a zero.
Let's create a simple example to demonstrate.
data have;
input abc $30. ;
cards;
123_112233_66778_1122
123_112233_1122_11232
1122_112233_66778_123
;
data want;
set have ;
location = indexw(trim(abc),'1122','_');
run;
Note that SAS will consider any value other than zero (or missing) as TRUE so you can just use the INDEXW() function call in a WHERE statement.
data want;
set have;
where indexw(trim(abc),'1122','_');
run;

Airtable If-statement outputting NaN

I'm using an If-statement to assign integers to strings from another cell. This seems to be working, but if I reference these columns, I'm getting a NaN value. This is my formula below. I tried adding INT() around the output values, but that seemed to break everything. Am I missing something?
IF(FIND('1',{Functional response}),-4,
IF(FIND('2',{Functional response}),-2,
IF(FIND('3',{Functional response}),0,
IF(FIND('4',{Functional response}),2,
IF(FIND('5',{Functional response}),4,"")))))
Assuming Functional response can only store a number 1 to 5 as a string a simple option in excel would be to first convert the string to a number and then use the choose function to assign a value. this works as the numbers are are sequential integers. Assuming Cell K2 has the value of Functional response, your formula could be:
=CHOOSE(--K2,-4,-2,0,2,4)
=CHOOSE(K2+0,-4,-2,0,2,4)
=CHOOSE(K2-0,-4,-2,0,2,4)
=CHOOSE(K2*1,-4,-2,0,2,4)
=CHOOSE(K2/1,-4,-2,0,2,4)
Basically sending the string of a pure number through a math operation has excel convert it to a number. By sending it through a math operation that does not change its value, you get the string as a number.
CHOOSE is like a sequential IF function Supply it with an integer as the first argument and then it will return the value from the subsequent list that matches the number. if the number you supply is greater than the number of options you will get an error.
Alternatively you could just do a straight math convertion on the number stored as a string in K2 using the following formula:
=(K2-3)*2
And as my final option, you could build a table and use VLOOKUP or INDEX/MATCH.
NOTE: If B2:B6 was stored as strings instead of numbers, K2 instead of --K2 would need to be used.

Best way to show blank cell if value if zero

=COUNTIFS(Orders!$T:$T,$B4)
is a code that gives 0 or a +ve result
I use this across 1500 cells which makes the sheet gets filled with 0s
I'd like to remove the Zeros by using the following formula
if(COUNTIFS(Orders!$T:$T,$B3,Orders!$F:$F,""&P$1&"*")=0,
"",
COUNTIFS(Orders!$T:$T,$B3,Orders!$F:$F,""&P$1&"*"))
This calculates every formula twice and increases the calculation time.
How can we do this in 1 formula where if the value is 0 - keep empty - otherwise display the answer
I suggest this cell-function:
=IFERROR(1/(1/COUNTIFS(Orders!$T:$T,$B4)))
EDIT:
I'm not sure what to add as explanation. Basically to replace the result of a complex calculation with blank cells if it results in 0, you can wrap the complex function in
IFERROR(1/(1/ ComplexFunction() ))
It works by twice taking the inverse (1/X) of the result, thus returning the original result in all cases except 0 where a DIV0 error is generated. This error is then caught by IFERROR to result in a blank cell.
The advantage of this method is that it doesn't need to calculate the complex function twice, so can give a significant speed/readability increase, and doesn't fool the output like a custom number format which can be important if this cell is used in further functions.
You only need to set the number format for your range of cells.
Go to the menu Format-->Number-->More Formats-->Custom Number Format...
In the entry area at the top, enter the following: #;-#;""
The "format" of the format string is
(positive value format) ; (negative value format) ; (zero value format)
You can apply colors or commas or anything else. See this link for details
instead of your =COUNTIFS(Orders!$T:$T,$B4) use:
=REGEXREPLACE(""&COUNTIFS(Orders!$T:$T,$B4), "^0$", )
also, to speed up things you should avoid "per row formulae" and use ArrayFormulas

Stata does not replace variable value

Stata does not replace a value, as I am commanding. What is happening?
I have this variable Shutouts, which is a float variable (%9.0g).
One observation has the value = 5.08; that is an error, it should be 5.
I type: replace Shutout= 5 if Shutout==5.08.
And, surprisingly to me, Stata responds:
replace Shutouts=5 if Shutouts==5.08
(0 real changes made)
I have a similar problem for a variable with the same characteristics, with the name Save_perc; one value is 9.2 but should be .92. And, also this time, I receive this response from Stata:
replace Save_perc=.92 if Save_perc==9.2
(0 real changes made)
Why "0 real changes"?
It seems like a very banal problem, but I have been working on it for like 30' and I cannot really figure it out.
it has to do with how floating numbers are stored into memory. You should not use == when comparing two different number formats because some internal storage approximation can make the comparison fail.
In your case, you should just use
Shutouts=5 if Shutouts > 5.07
or
Shutouts=5 if Shutouts == float(5.07)