How to plot the different graphs by stcurve in one chart in Stata?

I am using stcurve in Stata to plot survival probability. I need to plot the graph for all data and then for specific variables. I can generate the graphs in two different charts, but I need to have all three lines together in one chart.
I have tried the addplot() option but I get the error that stcurve is not a twoway graph. Do you have any idea how to do this?
This is the code that I have used which generates the graphs in two different charts separately:
stcurve, survival graphregion(lcolor(white) ilcolor(white) ifcolor(white) ) plotregion( lcolor(black)) title("Survival Function", size(vlarge)) ytitle("Survival probabilities", size(large)) xtitle("Time", size(large)) xlabel(,labsize(medium)) ylabel(,labsize(medium))
stcurve, survival at1( def=0) at2( def=1) graphregion(lcolor(white) ilcolor(white) ifcolor(white) ) plotregion( lcolor(black)) legend(label(1 "X Firms") label(2 "Y Firms")) legend(size(large)) lwidth(thin thick) title("Survival Function", size(vlarge)) ytitle("Survival probabilities", size(large)) xtitle("Time", size(large)) xlabel(,labsize(medium)) ylabel(,labsize(medium))

I am not sure if I understood correctly what you want. It would have been useful if you had added the stset and stcox code necessary before running stcurve.
If the Kaplan-Meier survivor function is identical to your first stcurve, survival graph, you can try a dirty fix by generating a variable, e.g.
sts gen s2=s after running stset
then plotting it as a line against your time variable. i.e. adding this to the end of the second graph:
addplot(line s2 your_timevar, sort c(J) title("Survival probabilities"))
The equality of the Kaplan-Meier and Cox survivor estimates only holds if the model behind the first graph has no predictors beyond the failvar in the stset. So if you ran stcox, estimate after stset timevar, failure(failvar) id(idvar) it works, but if you have covariates in the stcox call this will not give you the correct plot.
edit:
As the above quick solution does not work, there is another dirty workaround: save the results from stcurve in a file (option outfile), then plot the "new" data as twoway graphs. Something like this:
stcurve, survival name("surv1") outfile(stcurve1.dta, replace)
stcurve, survival name("surv2") at1( def=0) at2( def=1) outfile(stcurve2.dta, replace)
use stcurve1.dta, clear
rename surv1 surv1_A
rename _t _tA
append using stcurve2.dta
twoway line surv1 _t, sort || line surv2 _t, sort || line surv1_A _tA, sort
I do not know if this will work with your data. I am assuming that the second outfile stores the two curves as surv1 and surv2 against _t, so run describe on the outfiles first to check the variable names. You may need to manipulate the new variables in some way to get the desired results, and you need to add the options you want to the twoway graphs. There surely are better and easier ways of plotting this once the data for the graphs are in separate data files, but this is the first solution that sprang to mind.
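For a version that can be run end to end, here is a minimal sketch of the same outfile() workaround on Stata's drugtr example data. The model (stcox drug age), the file names sc_all and sc_groups, and the renamed variables surv_all and t_all are my own choices for illustration. The outfile variable names surv1, surv2 and _t are assumed rather than guaranteed, so check them with describe before plotting.

```stata
* Example data that ships already stset (studytime, died)
webuse drugtr, clear
stcox drug age

* Save the plotted values of each stcurve call to its own file
stcurve, survival outfile(sc_all, replace)
stcurve, survival at1(drug=0) at2(drug=1) outfile(sc_groups, replace)

* Stack the two files; rename first so the variable names do not collide
use sc_all, clear
describe                               // check the names before renaming
rename (surv1 _t) (surv_all t_all)
append using sc_groups

* All three curves in one twoway graph, drawn as step functions
twoway line surv_all t_all, sort connect(J)        ///
    || line surv1 _t, sort connect(J)              ///
    || line surv2 _t, sort connect(J)              ///
       legend(order(1 "All" 2 "drug = 0" 3 "drug = 1")) ///
       ytitle("Survival probabilities") xtitle("Time")
```

Because the result is an ordinary twoway graph, the title(), graphregion() and legend() options from the question can be added to it unchanged.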

How to get y axis range in Stata

Suppose I am using some twoway graph command in Stata. Without any action on my part Stata will choose some reasonable values for the ranges of both y and x axes, based both upon the minimum and maximum y and x values in my data, and upon some algorithm that decides when it would be prettier for the range to extend to a number like '0' instead of '0.0139'. Wonderful! Great.
Now suppose that after (or while) I draw my graph, I want to slap some very important text onto it, and I want to be choosy about precisely where the text appears. Having the minimum and maximum values of the displayed axes would be useful: how can I get these min and max numbers? (Either before or while calling the graph command.)
NB: I am not asking how to set the y or x axis ranges.
Since this issue has been a bit of a headache for me for quite some time, and I believe there is no good solution out there yet, I wanted to write up two ways in which I was able to solve a problem similar to the one described in the post. Specifically, I used them to shade part of a graph in gray.
Define a global macro in the code generating the axis labels.
This is the less elegant way to do it, but it works well. Locate the tickset_g.class file in your ado path. The graph twoway command uses this to draw the axes of any graph. There, I defined a global macro in the draw program that takes the value of the omin and omax locals after they have been set to the minimum between the axis range and data range (the command that does this is local omin = min(.scale.min,omin) and analogously for the max), since the latter sometimes exceeds the former. You could also define the global further up in that code block to only get the axis extent. You can then access the axis range using the globals after the graph command (and use something like addplot to add to the previously drawn graph).
Two caveats for this approach. First, using global macros is, as far as I understand, bad practice and can be dangerous; I used names with the prefix userwritten, which I was sure wouldn't be used in any program. Second, depending on your organization's decisions, you may not have the administrator privileges needed to alter this file. However, it is the simpler way. If you prefer a more elegant approach along the lines of what Nick Cox suggested, then you can:
Use the undocumented gdi natscale command to define your own axis labels.
The gdi commands are the internal commands that are used to generate what you see as graph output (cf. https://www.stata.com/meeting/dcconf09/dc09_radyakin.pdf). The tickset_g.class uses the gdi natscale command to generate the nice numbers of the axes. Basic documentation is available with help _natscale. Basically, you enter the minimum and maximum, e.g. from a summarize return, and a suggested number of steps, and the command returns a min, max, and delta to be used in the xlabel() or ylabel() option (there are several possible ways, all rather straightforward once you have those numbers, so I won't spell them out for brevity). You'd have to adjust this approach if you use some scale transformation.
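To make the second approach concrete, here is a minimal sketch on the auto data. It assumes that _natscale takes the minimum, the maximum and a suggested number of steps, and returns r(min), r(max) and r(delta), as described in help _natscale; the choice of 5 steps, the x position 3000 and the text itself are arbitrary values picked for illustration.

```stata
sysuse auto, clear

* Data range of the y variable
summarize mpg, meanonly

* Ask for "nice" axis numbers (assumed returns: r(min), r(max), r(delta))
_natscale `r(min)' `r(max)' 5
local ymin   = r(min)
local ymax   = r(max)
local ydelta = r(delta)

* Use the numbers both for the axis labels and to position the text
twoway scatter mpg weight,                          ///
    ylabel(`ymin'(`ydelta')`ymax')                  ///
    text(`ymax' 3000 "Very important text", placement(se))
```

Because the axis labels are set explicitly from the same numbers, the axis limits are known before the graph is drawn, which is what the question asks for.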
Hope this helps!
I like Nick's suggestion, but if you're really determined, it seems that you can find these values by inspecting the output after you set trace on. Here's some inefficient code that seems to do exactly what you want. Three notes:
When I import the log file I get this message:
Note: Unmatched quote while processing row XXXX; this can be due to a formatting problem in the file or because a quoted data element spans multiple lines. You should carefully inspect your data after importing. Consider using option bindquote(strict) if quoted data spans multiple lines or option bindquote(nobind) if quotes are not used for binding data.
Sometimes the data fall outside of the min and max range values that are chosen for the graph's axis labels (but you can easily test for this).
The log linesize is actually important to my code below because the key values must fall on the same line as the strings that I use to identify the helpful rows.
* start a log (critical step for my solution)
cap log close _all
set linesize 255
log using "log", replace text
* make up some data:
clear
set obs 3
gen xvar = rnormal(0,10)
gen yvar = rnormal(0,.01)
* turn trace on, run the -twoway- call, and then turn trace off
set trace on
twoway scatter yvar xvar
set trace off
cap log close _all
* now read the log file in and find the desired info
import delimited "log.log", clear
egen my_string = concat(v*)
keep if regexm(my_string,"forvalues yf") | regexm(my_string,"forvalues xf")
drop if regexm(my_string,"delta")
split my_string, parse("=") gen(new)
gen axis = "vertical" if regexm(my_string,"yf")
replace axis = "horizontal" if regexm(my_string,"xf")
keep axis new*
duplicates drop
loc my_regex = "(.*[0-9]+)\((.*[0-9]+)\)(.*[0-9]+)"
gen min = regexs(1) if regexm(new3,"`my_regex'")
gen delta = regexs(2) if regexm(new3,"`my_regex'")
gen max_temp= regexs(3) if regexm(new3,"`my_regex'")
destring min max_temp delta, replace
gen max = min + delta* int((max_temp-min)/delta)
*here is the info you want:
list axis min delta max

Pie charts in Stata

I'm using the code below to draw some graphs and combine them. When I execute the entire file I get the error:
"Invalid Syntax r(198)".
And the code stops at the code segment below. However, when I run the code segment separately the program works without a flaw. Can you please help me understand what's causing this issue?
*pie chart
foreach i in "SPA" "EPD"{
graph pie billed_amount if type== "`i'", over(service_id) saving(gg`i',replace)
local gg `gg' "gg`i'"
}
local gg: subinstr local gg "ggSPA" `""ggSPA""'
gr combine `gg'
graph export "C\provider.png", as(png) replace
graph drop _all
Without any context (whether the code before this makes a difference) or a dataset to use, how can we tell? The problem lacks a minimal complete verifiable example. See https://stackoverflow.com/help/mcve for this and future questions.
That said, this seems to be a very roundabout way to get two pie charts side-by-side. That doesn't require a loop and it doesn't require graph combine.
graph pie billed_amount if inlist(type, "SPA", "EPD"), over(service_id) by(type)
graph export "C\provider.png", as(png) replace
Whether you want to drop all graphs afterwards is quite immaterial to the problem posed.
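Since the question comes without data, here is a small sketch of the by() approach that anyone can run on the auto data. The variables are stand-ins chosen for illustration only: price plays the role of billed_amount, foreign the role of service_id, and rep78 the role of type.

```stata
sysuse auto, clear

* One pie per value of rep78, side by side in a single graph;
* slices are the totals of price within each category of foreign
graph pie price if inlist(rep78, 3, 4), over(foreign) by(rep78)
```

The result is a single graph object, so it can be passed straight to graph export without any graph combine step.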

Sort by variable in twoway scatter. X-axis stays alphabetical and sort produces gibberish: why?

I have two variables:
ie_ctotal
cntry2
Note: cntry2 is an encoded version of a string variable cntry: I don't know if this may be affecting things.
I want a twoway scatter of ie_ctotal and cntry2, and I want to SORT this scatter by another variable gdppc,
twoway || scatter ie_ctotal cntry2, c(1) xlabel(,valuelabel)
The above without sort works fine. Once I introduce sort, however,
twoway || scatter ie_ctotal cntry2, c(1) sort(gdppc) xlabel(,valuelabel)
The graph turns gibberish, or rather it connects according to the sort, but the x axis remains alphabetical, making the connections seem scribbled.
Any ideas as to what I am doing wrong?
Note: I don't want to sort the original data, because I was advised in previous questions that this is a bad idea. So I want to sort the data only for this one graph.
There is no reproducible example here, and not even a graph, but it is possible to guess the problem.
You are typing above
c(1)
which is ill-advised, although Stata does the right thing. It would be better to type
c(l)
which instructs Stata to join data points on your graph in a line. (Nod to @Dimitriy V. Masterov on this detail.)
In your first example, the values of cntry2 define the x axis.
As you say, the effect of sort(gdppc) is to connect points in order of their values from lowest gdppc to highest. The result is clearly not what you want.
Here is a dopey reproducible example that makes the point.
. sysuse auto, clear
(1978 Automobile Data)
. scatter mpg weight, sort(price) c(l)
You want to sort the countries into gdppc order. This is like sorting make in Stata's auto data according to mpg, but then plotting weight. Here I do this just for foreign cars. It's not a very good graph, but it sounds close in spirit to what you want. This solution requires installation of labmask, for which search labmask and then download from the Stata Journal website.
sysuse auto, clear
keep if foreign
sort mpg
gen obsno = _n
labmask obsno, values(make)
scatter weight obsno, xla(1/22, valuelabel ang(v) noticks) xtitle("")
In a nutshell: the sort() option here defines a connection order; it doesn't map the x axis variable to a reshuffled version. That you need to do before the graphics.
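If installing labmask is not an option, the same mapping can be sketched with official commands only, by building the value label in a loop. The label name obslbl is a hypothetical name chosen for this example, and make is added to the sort only to break ties in mpg.

```stata
sysuse auto, clear
keep if foreign

* Put the observations in the desired x-axis order
sort mpg make
gen obsno = _n

* Attach each make as the value label of its position
forvalues i = 1/`=_N' {
    label define obslbl `i' "`=make[`i']'", add
}
label values obsno obslbl

* Plot against the position variable, showing the labels on the x axis
scatter weight obsno, connect(l) ///
    xlabel(1/`=_N', valuelabel angle(vertical) noticks) xtitle("")
```

The sort here happens on a subset kept only for the graph; wrapping the block in preserve and restore leaves the original data untouched.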
UPDATE In fact, you can get essentially the same graph without any prior manipulation:
graph dot (asis) weight if foreign, over(make, sort(mpg) label(ang(v))) vertical linetype(line) lines(lc(none))
This is going along with the OP's interest in putting labelled categories on the x axis. A graph easier to read would put them on the y axis: then text can be read left to right. To get that, omit the vertical option above: the horizontal layout is the default for graph dot. Although the command above omits guide lines by setting their colour to none, very thin, light-coloured guide lines can help.
This uses the trick of encoding using the order of another variable to get the sorting right:
sysuse auto, clear
keep if foreign==1
sencode make, gen(encoded_make) gsort(-weight)
levelsof encoded_make, local(labels)
tw scatter price encoded_make, mlabel(weight) c(l) xlabel(`labels', value angle(45)) sort(weight)
You will need to install sencode from SSC.

SAS gmap and label centering

I have a problem for which I could not find an easy solution, and I am looking for some ideas or tips.
I am working with SAS on a project whose result should be a map of Europe, where the countries are colored according to a certain algorithm. I use the maps.europe data and the %annomac and %maplabel macros to label the countries.
This works pretty well, except for Portugal and Spain: because these countries have islands far away from the coast, the centroid that %maplabel calculates for the country is not in the center of the country:
Unfortunately, I can only cut Portugal completely out of the map, but not the islands.
I have already tried this method:
Cutting out the parts of the map that contain the islands via gproject. This delivered results I could not explain (just showing some parts of Europe, even when I set the parameters extremely wide),
and now I am a bit stuck.
I have already thought about these ideas:
Combining maps.europe with maps.spain and maps.portugal after deleting the islands from them, but I am not sure how to do that so that the labeling and everything else still works for the combined data.
Is it possible to set the label points for Portugal and Spain manually and overwrite the data from the %maplabel macro?
Or is there an even easier solution?
Thanks for your help and best regards
stephan
I'm not familiar with those macros, but given how GMAP works, I would indeed override the annotate dataset. You may want to read up on how annotate datasets work, but in general:
The GMAP statement will have an option, annotate= and some dataset. Find that dataset, let's say it's called ANNODS.
Then look at that dataset. Identify the row that carries the Portugal label (in a classic SAS/GRAPH annotate dataset this is usually a row with function='label' and the country name in the text variable). That is the row whose coordinates (x and y) you need to modify in order to move the label around. You might need to play around with this some to get the right coordinates.
Then run the PROC GMAP, and you should have a newly moved-over Portugal label.

Stata. How to transform a dataset into pure panel data?

I have done this many times with Excel and Java... This time I need to do it using Stata because it is more convenient to preserve variables' labels. How can I restructure dataset_1 into dataset_2 below?
I need to transform the following dataset_1:
into dataset_2:
I know one way, which is a little awkward... I mean, I could expand all the observations, then create variable obsNo, and then, rename variables...is there any better way?
Stata is wonderful at this sort of thing; it's a simple reshape. Your data is a little awkward, as the reshape command was designed to work with variables where the common part of the variable name (in your case, Wage) comes first. In the documentation for reshape, "Wage" would be the stub. By default, the part following Wage is required to be numeric (the string option of reshape relaxes this). If you first rename your variables with
rename (raceWhiteWage raceBlackWage raceAsianWage) (Wage1 Wage2 Wage3)
Then you can do:
reshape long Wage, i(state year) j(race)
That should give you the output you are looking for. You will have a column named "race", with values of 1 for White, 2 for Black, and 3 for Asian.
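As a quick check of the steps above, here is a minimal sketch on made-up data. The states, years and wage values are invented for illustration, and the value label name racelbl is a hypothetical choice. It also attaches value labels, so that the numeric race codes display as text.

```stata
clear
* Toy wide data: one wage variable per race (values are invented)
input str2 state year raceWhiteWage raceBlackWage raceAsianWage
"CA" 2010 10 11 12
"CA" 2011 13 14 15
"NY" 2010 16 17 18
"NY" 2011 19 20 21
end

* Put the stub first and a numeric suffix last, then go long
rename (raceWhiteWage raceBlackWage raceAsianWage) (Wage1 Wage2 Wage3)
reshape long Wage, i(state year) j(race)

* Make the numeric codes readable
label define racelbl 1 "White" 2 "Black" 3 "Asian"
label values race racelbl
list, sepby(state year)
```

An alternative is to rename the variables to WageWhite, WageBlack and WageAsian and add the string option to reshape long, which yields a string race variable instead of a labeled numeric one.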