Stata: If statement with variable names

I want to create an if statement in Stata based on variable names. However, I couldn't find a way to create a local macro holding the variable name, i.e. `variable_name_x':
foreach x of varlist dummyind_* {
hist x if `variable_name_x'==ind
}
clear
input float dummyind_1 float dummyind_2 str10 ind
0.1 0.7 dummyind_1
0.1 0.5 dummyind_2
0.2 0.8 dummyind_1
0.3 0.3 dummyind_2
0.4 0.2 dummyind_1
end

This is a fairly puzzling data structure, but I think I understand the question and guess you need something more like
foreach x of var dummyind_* {
    hist `x' if ind == "`x'"
}
That is, each variable name in question plays two roles: (1) as a variable name and (2) as a value of another string variable.
Note that the above uses an if qualifier, not an if statement (the if command); the two are different.
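For readers who find the dual role easier to see outside Stata, here is a rough plain-Python analogy, not Stata code. It uses the question's toy data; the helper name values_where_ind_matches and the column-wise dict layout are assumptions made only for this sketch.

```python
# Toy data copied from the question's input block, stored column by column.
data = {
    "dummyind_1": [0.1, 0.1, 0.2, 0.3, 0.4],
    "dummyind_2": [0.7, 0.5, 0.8, 0.3, 0.2],
    "ind": ["dummyind_1", "dummyind_2", "dummyind_1", "dummyind_2", "dummyind_1"],
}


def values_where_ind_matches(data, prefix="dummyind_"):
    """For each column whose name starts with prefix, keep rows where ind equals that name."""
    selected = {}
    for name in data:
        if not name.startswith(prefix):
            continue
        # name is used twice: as the column to read and as the string compared with ind
        selected[name] = [v for v, tag in zip(data[name], data["ind"]) if tag == name]
    return selected


subsets = values_where_ind_matches(data)
print(subsets)
```

Each list in subsets holds the values that the corresponding histogram in the loop would be drawn from.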

Related

Python: Returning Floats Incorrectly

I have a Python function as follows:
def point_double((xp,yp)):
    s = (3.0 * pow(xp,2) - 1)/(2.0*yp) # s = 2.6
    xr = pow(s,2) - (2 * xp) # xr = 0.76
    yr = s * (xp - xr) - yp # yr = 0.824
    return (xr,yr)
When I call point_double((3,5)) I get a return value of (0.7600000000000007, 0.8239999999999981) rather than the correct value of (0.76, 0.824).
Adding print xr, yr just before the return line prints the desired result, but changing that to print (xr,yr) prints the incorrect value.
Can someone explain why this happens, and help me fix it so that the function returns the desired value?
OK, it's two things: first, binary floating point cannot represent some numbers exactly (much as 1/3 has no finite decimal expansion); second, it's how print formats the values it is given:
So:
xr, yr = point_double((3,5))
print xr
print yr
print (xr,yr)
that will produce:
0.76
0.824
(0.7600000000000007, 0.8239999999999981)
You should consider instead printing a bit like this:
print (" x: %s y: %s" % (xr, yr))
I'm struggling to find the exact reference for how print calls str() on its arguments, because in 2.7 print is built into the language. Effectively, by using % or str.format, you control how the float is rendered.
zero-piraeus noted:
When you print an object, its str() method is called. The str() method for a tuple (or any other inbuilt collection) calls repr() for each item in the collection.
Note that the actual value of xr is the longer, inaccurate one.
Edit: This is a good guide (for python 3) about how floating points are stored and other ways of printing them https://docs.python.org/3/tutorial/floatingpoint.html
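As a minimal sketch of the "control the rendering yourself" advice, here is the same arithmetic written for Python 3, with a formatting helper. The name format_point and its digits parameter are hypothetical, introduced only for illustration. Note that on Python 3 str() and repr() of a float agree, so the tuple-versus-scalar difference described above does not appear there; the stored value is still the long, inexact one.

```python
def point_double(point):
    """Same arithmetic as the question, written for Python 3."""
    xp, yp = point  # Python 3 has no tuple parameters, so unpack here
    s = (3.0 * xp ** 2 - 1) / (2.0 * yp)
    xr = s ** 2 - 2 * xp
    yr = s * (xp - xr) - yp
    return xr, yr


def format_point(point, digits=3):
    """Render each coordinate with a fixed number of decimals (hypothetical helper)."""
    return "(" + ", ".join("{:.{}f}".format(v, digits) for v in point) + ")"


xr, yr = point_double((3, 5))
print(repr(xr), repr(yr))      # full-precision view of the stored doubles
print(format_point((xr, yr)))  # explicitly formatted for display
```

If the rounded number itself is needed rather than a string, round(xr, 3) is an option, but the result is still the nearest binary double to 0.76, not an exact decimal.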

Printf'ing floating point numbers in C++ gives zeroes

I am doing some C++ right now and stumbled on a problem I can't wrap my head around.
I am doing a floating point comparison like this:
if(dists.at<float>(i,0) <= 0.80 * dists.at<float>(i,1)) {
    matched = true;
    matches++;
} else {
    printf((dists.at<float>(i,0) <= 0.80 * dists.at<float>(i,1)) ? "true\n" : "false");
    printf("threw one away because of dist: %f/%f\n", dists.at<float>(i,0),dists.at<float>(i,1));
}
On the first line, the comparison evaluated to false, which means dists[0] > 0.80 * dists[1].
When we print the values, the results are:
falsethrew one away because of dist: 0.000000/0.000000
falsethrew one away because of dist: 0.000000/0.000000
I think it has something to do with the results not being a float or something, but I'm no pro at C++ so I could use some help to find out what these values are.
Looks like it was due to the number being too small.
In response to @NirMH, I am posting my comment as an answer.
%g instead of %f can print very small floating point numbers.
There are other options such as %e, as @ilent2 mentioned in the comments.
For example:
double num = 1.0e-23;
printf("%e, %f, %g", num, num, num);
prints out the following result:
1.000000e-023, 0.000000, 1e-023
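The same three conversions can be tried quickly in Python, whose % operator follows the printf conventions for %e, %f and %g. This is only a side-by-side sketch of the formatting behaviour, not the C++ code from the question.

```python
num = 1.0e-23

# %e always uses scientific notation, %f uses fixed notation with six decimals,
# and %g picks whichever of the two is more compact for the value.
line = "%e, %f, %g" % (num, num, num)
print(line)  # 1.000000e-23, 0.000000, 1e-23
```

Python prints a two-digit exponent here; the three-digit exponent in the output above is presumably a detail of the C runtime that produced it.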

check if input is within specified ranges in c++

I'm asking for inputs and I want the output to depend on the range within which each input falls.
example:
I accept inputs like 0.3 0.55 etc.
Range is 0.0 to 1.
The "step" is 0.1. Meaning there are 10 positions/checkpoints.
If the input is 0.3, since it is three times the "step" it should return "position 3", if it is smaller than 0.3 but larger than 0.2 it should return "between positions 2 and 3" etc.
question:
Can this be done without explicit if statements or switch cases for all possible positions?
It's easy to write such a function, based on the value of (input-range_min)/(range_max-range_min)*10.
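#include <cmath>  // std::floor

struct Position
{
    int positionLow;  // index of the position at or just below the input
    bool inBetween;   // true when the input lies strictly between two positions
};
Position WhereInRange(float input, float minScale, float maxScale, int numPositions)
{
    Position res;
    float fPlace = (input-minScale)/(maxScale-minScale)*numPositions;
    res.positionLow = int(std::floor(fPlace));
    res.inBetween = res.positionLow != fPlace;  // exact comparison; a tolerance may be safer
    return res;
}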
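A variant of the same idea, sketched in Python, adds a small tolerance so that inputs such as 0.3 are still reported as sitting on a position even when the scaled value is not exactly an integer in floating point. The function name describe_position, its defaults and the tol parameter are assumptions made for this sketch.

```python
import math


def describe_position(value, range_min=0.0, range_max=1.0, num_positions=10, tol=1e-9):
    """Map value to 'position k' or 'between positions k and k+1' without per-position branches."""
    place = (value - range_min) / (range_max - range_min) * num_positions
    nearest = round(place)
    # Treat values within tol of a checkpoint as being on it, since the scaled
    # value rarely lands exactly on an integer in floating point.
    if abs(place - nearest) <= tol:
        return "position %d" % nearest
    low = math.floor(place)
    return "between positions %d and %d" % (low, low + 1)


print(describe_position(0.3))   # position 3
print(describe_position(0.25))  # between positions 2 and 3
```

The single branch only distinguishes "on a position" from "in between"; it does not grow with the number of positions.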

Calculating the value of arctan(x) in C++

I have to calculate the value of arctan(x). I have tried to do this by evaluating the following series:
arctan(x) = x - x^3/3 + x^5/5 - x^7/7 + x^9/9 - ...
But the following code does not compute the correct value. For example, calculate_angle(1) returns 38.34. Why?
const double DEGREES_PER_RADIAN = 57.296;
double calculate_angle(double x)
{
    int power=5,count=3;
    double term,result,prev_res;
    prev_res = x;
    result= x-pow(x,3)/3;
    while(abs(result-prev_res)<1e-10)
    {
        term = pow(x,power)/power;
        if(count%2==0)
            term = term*(-1);
        prev_res=result;
        result=result+term;
        ++count;
        power+=2;
        // if(count=99)
        // break;
    }
    return result*DEGREES_PER_RADIAN;
}
EDIT: I found the culprit. You forgot to include stdlib.h, where the function abs resides. You must have ignored the warning about abs being implicitly declared. I checked that removing the include yields the result 38.19 and including it yields the result ~45.
The compiler is not required to stop compilation when an undeclared function is being used (in this case abs). Instead, it is allowed to make assumptions about how the function is declared (in this case, the wrong ones).
Besides, like other posters already stated, your use of abs is inappropriate as it returns an int, not a double or float. The condition in the while should be >1e-100 not <1e-100. The 1e-100 is also too small.
--
You forgot to increase count and power after calculating the first two summands:
prev_res = x;
result= x-pow(x,3)/3;
count = 4; // <-- added
power = 5; // <-- added
while(abs(result-prev_res)<1e-100)
{
    term = pow(x,power)/power;
    if(count%2==1)
        term = term*(-1);
Also, I find your use of the count variable counterintuitive: it is initialized with 3, as if it denoted the last used power, but then loop iterations increase it by 1 instead of 2, and you decide the sign by count%2 == 1 as opposed to power%4 == 3.
The series converges to tan^{-1} x, but not very fast. Consider the series when x=1:
1 - 1/3 + 1/5 - 1/7 + 1/9 - ...
What is the error when truncating at the 1/9 term? It is of the order of the first omitted term, 1/11. To get 10^{-100} accuracy, you would need on the order of 10^{100} terms. The universe would end before you got there. And accumulated round-off error would make the answer utterly unreliable anyway: a double carries only about 15 to 16 significant decimal digits.
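The slow convergence at x=1 is easy to see numerically. The following Python sketch (the function name leibniz_partial_sum is an assumption for illustration) compares partial sums with pi/4 and with the size of the first omitted term, which bounds the error of an alternating series with decreasing terms.

```python
import math


def leibniz_partial_sum(n_terms):
    """Sum the first n_terms of 1 - 1/3 + 1/5 - ... (the arctan series at x = 1)."""
    return sum((-1) ** k / (2 * k + 1) for k in range(n_terms))


for n_terms in (5, 50, 500, 5000):
    error = abs(leibniz_partial_sum(n_terms) - math.pi / 4)
    bound = 1.0 / (2 * n_terms + 1)  # magnitude of the first omitted term
    print(n_terms, error, bound)
```

Each tenfold increase in the number of terms buys roughly one more correct digit, which is why a tolerance like 1e-100 is out of reach for this series.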
Look at reference works like Abramowitz and Stegun [AMS 55] or the new NIST Digital Library of Mathematical Functions at http://dlmf.nist.gov to see how these are done in practice. Often, one uses Padé approximants instead of Taylor series. Even when you stick with Taylor series, you often use Chebyshev approximation to cut down on the total error.
I also recommend Numerical Methods that [Usually] Work, by Forman Acton. Or the Numerical Recipes in ... series.
Your sign is the wrong way around after the first two terms. It should be:
if(count%2==0)
term = term*(-1);
Your comparison is the wrong way around in the while condition. Also, you're expecting an unrealistically high level of precision. I would suggest something more like this:
while(fabs(result-prev_res)>1e-8)
Finally, you'll get a more accurate result with a better value for DEGREES_PER_RADIAN. Why not something like this:
const double DEGREES_PER_RADIAN = 180/M_PI;
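Putting these corrections together, a minimal Python sketch of the series loop might look like the following. It is an illustration under stated assumptions, not the original poster's final code: the name arctan_degrees, the tol default and the max_terms safety cap are all hypothetical. The stopping test uses the size of the latest term, which is the same quantity as result - prev_res in the C++ version.

```python
import math


def arctan_degrees(x, tol=1e-12, max_terms=1_000_000):
    """Sum x - x^3/3 + x^5/5 - ... until a term drops below tol, then convert to degrees."""
    result = 0.0
    sign = 1.0
    power = 1
    for _ in range(max_terms):
        term = sign * x ** power / power
        result += term
        if abs(term) < tol:  # stop once the latest term is negligible
            break
        sign = -sign         # alternate the sign on every term
        power += 2           # only odd powers appear in the series
    return math.degrees(result)


print(arctan_degrees(0.5))
print(math.degrees(math.atan(0.5)))
```

The series only converges for |x| <= 1, and near |x| = 1 it needs a very large number of terms, so a looser tolerance (or a different method) is advisable there.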

What's the difference between gen and egen in Stata 12?

Is there a reason why there are two different commands to generate a new variable?
Is there a simple way to remember when to use gen and when to use egen?
They both create a new variable, but work with different sets of functions. You will typically use gen when you have simple transformations of other variables in your dataset like
gen newvar = oldvar1^2 * oldvar2
In my workflow, egen usually appears when I need functions that work across all observations, like in
egen max_var = max(var)
or more complex instructions
egen newvar = rowmax(oldvar1 oldvar2)
to calculate the maximum for each observation between oldvar1 and oldvar2. I don't think there is a clear logic for separating the two commands.
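As a loose analogy only (this is plain Python, not Stata, and the toy rows and the group column are made up for the sketch), the distinction can be pictured as per-observation expressions versus statistics computed over observations, variables or groups:

```python
# Hypothetical toy dataset: one dict per observation.
rows = [
    {"group": "a", "oldvar1": 1.0, "oldvar2": 4.0},
    {"group": "a", "oldvar1": 3.0, "oldvar2": 2.0},
    {"group": "b", "oldvar1": 5.0, "oldvar2": 6.0},
]

# gen-style: each new value depends only on the same observation.
for row in rows:
    row["newvar"] = row["oldvar1"] ** 2 * row["oldvar2"]

# egen max()-style: one statistic over all observations, copied to every row.
overall_max = max(row["oldvar1"] for row in rows)
for row in rows:
    row["max_var"] = overall_max

# egen rowmax()-style: a statistic across several variables within one observation.
for row in rows:
    row["row_max"] = max(row["oldvar1"], row["oldvar2"])

# bysort group: egen mean()-style: a statistic within each group.
for row in rows:
    members = [r["oldvar1"] for r in rows if r["group"] == row["group"]]
    row["group_mean"] = sum(members) / len(members)

print(rows)
```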
gen
generate may be abbreviated by gen or even g and can be used with the following mathematical operators and functions:
+ addition
- subtraction
* multiplication
/ division
^ power
A large number of functions is available. Here are some examples:
abs(x) absolute value of x
exp(x) antilog of x
int(x) or trunc(x) truncation to integer value
ln(x), log(x) natural logarithm of x
round(x) rounds to the nearest integer of x
round(x,y) x rounded in units of y (i.e., round(x,.1) rounds to one decimal place)
sqrt(x) square root of x
runiform() returns uniformly distributed numbers between 0 and nearly 1
rnormal() returns numbers that follow a standard normal distribution
rnormal(x,y) returns numbers that follow a normal distribution with a mean of x and a s.d. of y
egen
A number of more complex possibilities have been implemented in the egen command like in the following examples:
egen nkids = anycount(pers1 pers2 pers3 pers4 pers5), value(1)
egen v323r = rank(v323)
egen myindex = rowmean(var15 var17 var18 var20 var23)
egen nmiss = rowmiss(x1-x10 var15-var23)
egen ntotal = rowtotal(x1-x10 var15-var23)
egen incomst = std(income)
bysort v3: egen mincome = mean(income)
Detailed usage explanations can be found in Stata's built-in documentation (type help egen).