I am trying to report descriptive stats in the paper.
As I am dealing with panel data, I need to report the mean and SD by year.
So I want the years to be placed in the first row and the variables in the first column.
bys wave: asdoc variable1 variable2, replace stat(mean sd)
However, this code produces the opposite. How can I deal with this problem?
(I am using Stata 14)
Try interchanging the positions of variable1 and variable2:
asdoc variable2 variable1, replace stat(mean sd)
Is it possible to create a backwards counting variable in Stata (like the command _n, just numbering observations backwards)? Or a command to flip the data set, so that the observation with the most recent date is the first one? I would like to make a scatter plot with AfD on the y-axis and the date (row_id) on the x-axis. When I make the plot however, the weeks are ordered backwards. How can I change the order?
This is the code:
generate row_id=_n
twoway scatter AfD row_id || lfit AfD row_id
Here are the data set and the plot:
Your date variable is a string variable, which is unlikely to get you the desired result if you sort on that variable.
You can create a Stata internal form date variable from your string variable:
gen date_num = daily(date, "MDY")
format date_num %td
The values of this new variable will represent the number of days since 1 Jan 1960.
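To make the day-count convention concrete, here is a minimal sketch in Python (not Stata) that mimics what daily(date, "MDY") returns. It assumes the strings look like "3/15/2020"; the function name stata_daily_mdy and the sample dates are hypothetical and only for illustration.

```python
from datetime import date, datetime

STATA_EPOCH = date(1960, 1, 1)  # Stata daily dates count days from this day

def stata_daily_mdy(text):
    """Return days since 1 Jan 1960 for a month/day/year string (assumed format)."""
    parsed = datetime.strptime(text, "%m/%d/%Y").date()
    return (parsed - STATA_EPOCH).days

print(stata_daily_mdy("1/1/1960"))    # 0
print(stata_daily_mdy("1/2/1960"))    # 1
print(stata_daily_mdy("12/31/1959"))  # -1
```

Because the result is an ordinary number, sorting on it orders observations chronologically, which sorting on the original string would not do.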
If you create a scatter plot with this date variable on the x-axis, by default it will be sorted from min to max. To let it run from max to min you can specify option xscale(reverse).
If you still want to create an id variable by yourself you can choose one of these options (ascending and descending):
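* ascending: the oldest date gets id 1
sort date_num
gen id = _n
* descending (run instead of the above): the most recent date gets id 1
gsort -date_num
gen id = _n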
For your problem, plotting in terms of a daily date variable and -- if for some reason that is a good idea -- using xscale(reverse) are likely to be what you need, as well explained by @Wouter.
In general something like
gen long newid = _N - _n + 1
sort newid
will reverse a dataset.
In my Google spreadsheet document, I have a table of bills with a column J for dates and a column P for amounts. I would like to build another table that sums these bills by year (one row = one year). The P column has contents like "3245,20 EUR".
Here is the formula I tried (in this example, S5 should be the sum and R5 a numeric value of a specific year):
S5=SUMIF(YEAR($J$5:$J), R5, VALUE(REGEXEXTRACT($P$5:$P, "^([0-9]+,[0-9]{2}) EUR$")))
This doesn't work. Is there a solution? Thank you.
You can use the following formula
=SUM(QUERY({A2:A,ArrayFormula(VALUE(REGEXEXTRACT(SUBSTITUTE(B2:B,",","."), "^([0-9]+\.[0-9]{2}) EUR$")))},
"select Col2 where year(Col1)="&A1&""))
SUBSTITUTE is needed because the amounts use a comma as the decimal separator; it looks like they come from a different locale than the spreadsheet's, so VALUE would not parse them otherwise.
(Please adjust ranges to your needs)
Functions used:
QUERY
ArrayFormula
VALUE
REGEXEXTRACT
SUBSTITUTE
SUM
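For readers who want to check the logic outside the spreadsheet, the same steps (extract the number, swap the decimal comma for a period, convert, sum by year) can be sketched in Python. This is only an illustration under stated assumptions: the sample bills are made up, and parse_amount and sum_by_year are hypothetical helper names, not part of any Sheets API.

```python
import re
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Same pattern as in the question: digits, a decimal comma, two digits, " EUR"
AMOUNT_RE = re.compile(r"^([0-9]+,[0-9]{2}) EUR$")

def parse_amount(text):
    """Turn a string like '3245,20 EUR' into an exact Decimal."""
    match = AMOUNT_RE.match(text)
    if match is None:
        raise ValueError(f"unexpected amount format: {text!r}")
    # Replace the decimal comma with a period before converting
    return Decimal(match.group(1).replace(",", "."))

def sum_by_year(bills):
    """Sum amounts per calendar year; bills is a list of (date, amount string)."""
    totals = defaultdict(Decimal)
    for bill_date, amount_text in bills:
        totals[bill_date.year] += parse_amount(amount_text)
    return dict(totals)

# Hypothetical sample data standing in for columns J and P
bills = [
    (date(2019, 3, 1), "3245,20 EUR"),
    (date(2019, 7, 9), "100,00 EUR"),
    (date(2020, 1, 15), "50,50 EUR"),
]
print(sum_by_year(bills))
```

Decimal is used here rather than float so that currency sums stay exact.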
I have a set of data with observations (Joe, Dana, Mark, ...) and their respective ratings for movies (Batman - 3 stars, Deadpool - 4 stars). When I use PROC CORR in SAS, it only gives the correlations between movies, not between observations.
How do I find the correlation between the observations in SAS?
I think you should use the SPEARMAN option to correlate qualitative data and specify the variables to correlate with the VAR statement.
PROC CORR DATA=marks SPEARMAN;
VAR names films ;
RUN;
What have you tried before?
I have a dataset that looks like :
Table 1
I want to collapse the data in Stata such that the data appears as :
Table 2
I am aware that if Product were a numeric variable we could use the collapse command. However, I don't know what to do in this situation since Product is a string variable.
Lacking a reproducible example, I'll supply an untested answer. Something like the following might set you in the right direction.
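* start each group's list with its first product (products sorted within group)
bysort year type (product): generate str1000 products = trim(product[1])
* append each later product to the running list built so far
bysort year type (product): replace products = products[_n-1]+","+trim(product) if _n>1
* running total of sales within the group
bysort year type (product): generate totsales = sum(sales)
* keep only the last row per group, which holds the full list and the total
bysort year type (product): keep if _n==_N
drop product sales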
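The intended result of these steps (one row per year and type, with the product names joined by commas and the sales summed) can be sketched in Python for checking by hand. The column names follow the code above; the sample rows and the function name collapse_products are hypothetical, since the original tables are not reproduced here.

```python
from collections import defaultdict

def collapse_products(rows):
    """Collapse (year, type, product, sales) rows to one row per (year, type)."""
    products = defaultdict(list)
    totals = defaultdict(int)
    for year, kind, product, sales in rows:
        products[(year, kind)].append(product.strip())
        totals[(year, kind)] += sales
    # Sort groups and product names so the output order is deterministic
    return [
        (year, kind, ",".join(sorted(products[(year, kind)])), totals[(year, kind)])
        for year, kind in sorted(products)
    ]

# Hypothetical sample data
rows = [
    (2015, "A", "pear", 10),
    (2015, "A", "apple", 5),
    (2015, "B", "plum", 7),
    (2016, "A", "apple", 3),
]
for row in collapse_products(rows):
    print(row)
```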
Using the auto.dta data, I want to top-code the PRICE variable at the median of top X%. For example, X% could be 3%, 4%, etc. How can I do this in Stata?
In answering your question, I am assuming that you want to replace all values above a cutoff, say the top 10%, with the cutoff value itself (the 90th percentile in the following code).
Here is the sample code:
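program topcode
    sysuse auto, clear
    * deciles of price; r(r9) is the 9th cut point, i.e. the 90th percentile
    pctile pct = price, nq(10)
    local p90 = r(r9)
    dis `p90'
    gen newprice = price
    * exclude missings, which Stata treats as larger than any number
    replace newprice = `p90' if newprice > `p90' & !missing(newprice)
end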
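The code above caps values at a percentile. The question asks for a cap at the median of the top X%, which is a slightly different cutoff. A minimal Python sketch of that variant follows, under two assumptions that should be checked against the intended definition: the "top X%" is taken to be the largest ceil(n * X / 100) values, and any value above their median is set to that median. The function name and sample prices are hypothetical.

```python
import math
from statistics import median

def topcode_at_top_median(values, top_percent):
    """Cap values at the median of the largest top_percent% of observations."""
    if not values:
        return []
    # Assumption: the top group is the largest ceil(n * X / 100) values
    k = max(1, math.ceil(len(values) * top_percent / 100))
    top_group = sorted(values)[-k:]
    cap = median(top_group)
    return [min(v, cap) for v in values]

prices = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(topcode_at_top_median(prices, 30))
# [10, 20, 30, 40, 50, 60, 70, 80, 90, 90]
```

In this example the top 30% are 80, 90 and 100, their median is 90, and only the value 100 is lowered.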