Stata: how to treat numerical values as categorical values in CEM?

In my data, I have an income-level variable that takes the values 1, 2, 3, 4, 5, 6, 7, 8, and 9. If I put this variable directly into imb or cem without recoding it as illustrated in the manual, will the algorithm treat it as a numerical or a categorical value?
Or should I recode it like this:
recode income (1 = 1 "1") (2 = 2 "2") (3 = 3 "3") (4 = 4 "4") (5 = 5 "5") (6 = 6 "6") (7 = 7 "7") (8 = 8 "8") (9 = 9 "9"), gen(income_recode)
I would like to apply exact matching on this categorical variable in my dataset.

Related

Setting max and min values for creating a random number [duplicate]

This question already has answers here:
Generating a random integer from a range
(14 answers)
Closed 3 years ago.
I am trying to generate random numbers between 2 and 11, here is my code:
srand(time(NULL));
int card;
card = rand() % 11 + 2;
However, currently my code is creating numbers from 2-12. How could I solve this so that it creates numbers from 2-11?
rand() % 11 has 11 possible values (0 to 10), but you want 10 possible values (2 to 11), so first change the modulus to % 10. Next, since the values returned by rand() % 10 start at 0 and you want to start at 2, add 2. So:
card = rand() % 10 + 2;
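The arithmetic can be checked by enumerating the remainders. The sketch below does this in Python 3 rather than C, and uses a plain run of integers as a stand-in for the output of rand(); the names samples, old_values and new_values are only illustrative.

```python
# Stand-in for rand(): any run of non-negative integers covers every remainder.
samples = range(1000)

old_values = {r % 11 + 2 for r in samples}  # expression from the question
new_values = {r % 10 + 2 for r in samples}  # corrected expression

print(min(old_values), max(old_values))  # 2 12
print(min(new_values), max(new_values))  # 2 11
```

In general, for an inclusive range from lo to hi the modulus is hi - lo + 1 and the offset is lo.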

Getting MAX() of each row with an ARRAYFORMULA() [duplicate]

This question already has an answer here:
How to use arrayformula with formulas that do not seem to support arrayformulas?
(1 answer)
Closed 4 months ago.
Using an array formula I want to find the max value of each row of a range and get the resulting range to work with it further.
The problem occurs as soon as I add MAX(), which seems to behave strangely within an array formula. Even if the expression inside MAX() yields multiple values, it always returns one single value.
E.g. this will give you the ranges which I want to get the max of:
=ARRAYFORMULA(ADDRESS(ROW(E1:E11); COLUMN() + 1; 4) & ":" & ADDRESS(ROW(E1:E11); COLUMN() + 4; 4))
The result looks like the following:
F1:I1
F2:I2
F3:I3
F4:I4
F5:I5
F6:I6
F7:I7
F8:I8
F9:I9
F10:I10
F11:I11
If I now add INDIRECT() to turn those strings into actual ranges and wrap the result in MAX(), it should return the max of each of those ranges, since the array formula should go through ROW(E1:E11) as it did before. However, the result of this new formula
=ARRAYFORMULA(MAX(INDIRECT(ADDRESS(ROW(E1:E11); COLUMN() + 1; 4) & ":" & ADDRESS(ROW(E1:E11); COLUMN() + 4; 4))))
rather is one single value, the maximum of the first range.
I have even tried to bypass the problem by adding an IF() statement for the array formula to iterate through the rows. Doing so, it did give me a result for all 11 rows, however, the result always was the same (the max of the first row).
The new formula:
=ARRAYFORMULA(IF(ROW(E1:E11) = ROW(E1:E11); MAX(INDIRECT(ADDRESS(ROW(E1:E11); COLUMN() + 1; 4) & ":" & ADDRESS(ROW(E1:E11); COLUMN() + 4; 4))); ""))
The new result (left column are the results of the formula, trying to get the max of each row to its right):
10 7 10 4 1
10 10 8 1 2
10 4 5 9 4
10 10 10 2 2
10 10 10 5 10
10 10 6 9 5
10 4 5 7 3
10 6 9 4 7
10 5 5 7 3
10 9 2 3 10
10 10 3 9 10
=QUERY(TRANSPOSE(QUERY(TRANSPOSE(F1:I),
"select "&REGEXREPLACE(JOIN( , ARRAYFORMULA(IF(LEN(F1:F&G1:G&H1:H&I1:I),
"max(Col"&ROW(F1:F)-ROW(F1)+1&"),", ""))), ".\z", "")&"")),
"select Col2")
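For reference, the result the question is after can be stated outside the spreadsheet. The Python 3 sketch below is not a Sheets formula; it only takes the four value columns from the sample output in the question and computes one maximum per row, which is what the column returned by the formula should contain for that data. The name rows is an assumption made for illustration.

```python
# The four value columns (F:I) from the sample in the question, one list per row.
rows = [
    [7, 10, 4, 1],
    [10, 8, 1, 2],
    [4, 5, 9, 4],
    [10, 10, 2, 2],
    [10, 10, 5, 10],
    [10, 6, 9, 5],
    [4, 5, 7, 3],
    [6, 9, 4, 7],
    [5, 5, 7, 3],
    [9, 2, 3, 10],
    [10, 3, 9, 10],
]

# One maximum per row, rather than a single maximum for the whole range.
row_max = [max(row) for row in rows]
print(row_max)  # [10, 10, 9, 10, 10, 10, 7, 9, 7, 10, 10]
```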

Python math function does not find all answers in range given [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 5 years ago.
I am quite new to Python and was playing around with the math functions. I tried to create a function that finds numbers that are a given power of some base, e.g. squares or cubes.
Why does the code below list most of the required powers but miss some?
from math import log

def more_powers():
    print "For which power do you wish to find: "
    power = int(raw_input("> "))
    print "Choose the upperbound: "
    n = int(raw_input("> "))
    for num in range(2, n):
        for base in range(2, num):
            if log(num, base) / power == 1:
                print "%d is a power of %d." % (num, base)
            else:
                base += 1
For which power do you wish to find:
> 3
Choose the upperbound:
> 5000
8 is a power of 2.
27 is a power of 3.
64 is a power of 4.
343 is a power of 7.
512 is a power of 8.
729 is a power of 9.
1331 is a power of 11.
1728 is a power of 12.
2197 is a power of 13.
2744 is a power of 14.
3375 is a power of 15.
4096 is a power of 16.
As you can see it misses out the equivalent power for 5, 6, 10 and 17.
Hint:
>>> log(125, 5) / 3 == 1
False
>>> log(125, 5)
3.0000000000000004
>>> log(216, 6) / 3 == 1
False
>>> log(216, 6)
3.0000000000000004
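One way to avoid the floating-point comparison altogether is to work with integers only: raise each candidate base to the requested power and stop once the result reaches the upper bound. The sketch below uses Python 3 and is only one possible approach, not the only fix; the function name perfect_powers and its parameters are assumptions made for illustration.

```python
def perfect_powers(power, upper):
    """Return (num, base) pairs with base ** power == num and num < upper."""
    results = []
    base = 2
    # Integer exponentiation is exact, so no rounding error can creep in.
    while base ** power < upper:
        results.append((base ** power, base))
        base += 1
    return results


for num, base in perfect_powers(3, 5000):
    print("%d is a power of %d." % (num, base))
```

With power 3 and upper bound 5000 this also reports 125, 216, 1000 and 4913, the cubes of 5, 6, 10 and 17 that the log-based test skipped.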

Explain this bit of code to a beginner [duplicate]

This question already has answers here:
What is the result of % in Python?
(20 answers)
Closed 6 years ago.
for x in xrange(12):
    if x % 2 == 1:
        continue
    print x
I know what it does, but the language doesn't make sense to me. In particular, the second line is where I am lost.
if x % 2 == 1 means "if x modulo 2 equals 1".
Modulo (or mod) is the remainder after division. So, for example:
3 mod 2 = 1
12 mod 5 = 2
15 mod 6 = 3
For x mod 2, there is a remainder if and only if x is odd (because every even number is divisible by two with remainder 0). Odd numbers always leave a remainder of 1.
So x % 2 == 1 returns true if x is odd.
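To see the effect of the test and of continue, here is the same loop as a Python 3 sketch (range instead of xrange) that collects the values it would print into a list. The name evens is an assumption made for illustration.

```python
evens = []
for x in range(12):       # range replaces Python 2's xrange
    if x % 2 == 1:        # remainder 1 after dividing by 2 means x is odd
        continue          # skip the rest of the loop body for odd x
    evens.append(x)       # only reached when x is even

print(evens)  # [0, 2, 4, 6, 8, 10]
```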

Variable that double counts the observations

I am trying to create a new variable such that it would count like
1,1,2,2,3,3,4,4 ..... meaning it would double count the observations.
My current code is like this
gen newid = _n
replace newid = newid[_n+1] if mod(newid2,2) == 0
but with this the result comes out as 1,1,3,3,5,5,7,7, ... where the increments are in 2's, i.e. I only get odd numbers. How should I modify this code?
You might try dividing your ID variable by 2, and then use Stata's ceil function to force it up to the nearest integer.
clear
set obs 50
gen newid = _n
gen newid2 = ceil(newid/2)
You can use the int(x) function.
This function returns the integer obtained by truncating x.
Thus, int(5.2) is 5.
If you want the following pattern
1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9
the command is
gen seq = int((_n-1)/2) +1
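The two suggestions, ceil(_n/2) and int((_n-1)/2) + 1, give the same sequence. The Python 3 sketch below checks this for the first ten observation numbers; it only mirrors the arithmetic and is not Stata code, and the names ceil_ids and trunc_ids are assumptions made for illustration.

```python
import math

obs = range(1, 11)  # stands in for Stata's _n = 1, 2, ..., 10

ceil_ids = [math.ceil(n / 2) for n in obs]        # mirrors ceil(_n/2)
trunc_ids = [int((n - 1) / 2) + 1 for n in obs]   # mirrors int((_n-1)/2) + 1

print(ceil_ids)   # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
print(trunc_ids)  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```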