Stata bar charts with dates - stata

I'm trying to create a bar chart with a time axis. This is the code that I'm using:
twoway (bar weeksum week)
The week variable is a time variable and has the format %td.
However, when I create the bar chart, the X axis does not follow the format specified for the week variable. Instead the X axis takes integer values. I was wondering whether there is a way to fix this.

I can't reproduce this in Stata 13.1. The format of the time variable is used by default on the time (horizontal) axis.
Here's a test script. Hint: You should learn to produce such reproducible examples yourself.
clear
input week weeksum
20093 16
20100 61
20107 34
20114 42
20121 24
end
format week %td
levelsof week, local(levels)
twoway bar weeksum week, base(0) barw(6) xla(`levels', noticks)
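
As background on why integers can appear on the axis at all: a %td variable stores the number of days since 01jan1960, and the format only controls how that number is displayed. Here is a minimal Python sketch (not Stata; the helper name stata_td_to_date is made up for illustration) that converts the week values from the test script to calendar dates:

```python
from datetime import date, timedelta

# Stata daily (%td) dates count days since 01jan1960.
STATA_EPOCH = date(1960, 1, 1)

def stata_td_to_date(days):
    """Hypothetical helper: convert a Stata %td integer to a calendar date."""
    return STATA_EPOCH + timedelta(days=days)

# The week values used in the test script above.
weeks = [20093, 20100, 20107, 20114, 20121]
for w in weeks:
    print(w, stata_td_to_date(w).isoformat())
```

So 20093 is 05jan2015, and the remaining values follow at 7-day steps.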

IF statement using cell formatted value

In Google Sheets, how can you use the IF statement using the output of a cell's formatted value?
I've created a Google Sheet with a date value in B1. The cell is formatted using the date format. B1 is then copied across to the right so that it increments the days of the month.
In B2, based on the B1 date, I want to display the day of the week (i.e., Thu, Fri, Sat, etc.). As in the first step, I set the date format under Format > Number > More formats > More date and time formats, then chose the day element from the select box with the abbreviated ("Tue") display format.
Then in B8, I want to display an amount of 30 every Thursday, otherwise, show 0.
This is where it doesn't work. Is it the case that a cell's rendered value can't be used in an IF statement? My desired output is that whenever the value in row 2 displays 'Thu', the cell shows 30.
As a side note, I jumped on Google support and asked this same question. Even though they are technical support, I thought I'd give them a try anyway, after all, I am paying for GSuite.
This is the formula they came back to me with: =if(B2=B1, "30", "0").
Of course, this formula will work in B8, because B2 equals B1 in the actual cell value; it doesn't take the formatted cell value into account. It does not give the output we need, as it will always display 30 instead of only when row 2 is 'Thu'.
So essentially, is there a function in Google Sheets for a rendered cell value? Or another solution around this?
Note: I do NOT want to use any scripting to get this to work.
Try it like this:
=IF(TEXT(B2, "ddd")="Thu", 30, 0)
Please try and copy across to suit:
=30*(WEEKDAY(B1)=5)
No IF, but shorter.
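
Both formulas test the underlying date rather than the rendered text. The second relies on WEEKDAY's default numbering, in which Sunday is 1 and Thursday is 5. Here is a small Python sketch of that logic, for checking only (sheets_weekday and thursday_amount are illustrative names, not Sheets functions):

```python
from datetime import date

def sheets_weekday(d):
    """Mimic WEEKDAY's default numbering: Sunday=1 ... Saturday=7."""
    # isoweekday() gives Monday=1 ... Sunday=7.
    return d.isoweekday() % 7 + 1

def thursday_amount(d, amount=30):
    """Return `amount` on Thursdays and 0 otherwise, like 30*(WEEKDAY(B1)=5)."""
    return amount * (sheets_weekday(d) == 5)

# 1 January 2015 was a Thursday; the six days after it are not.
for day in range(1, 8):
    d = date(2015, 1, day)
    print(d.isoformat(), thursday_amount(d))
```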

Moving average using forvalues - Stata

I am struggling with a question in Cameron and Trivedi's "Microeconometrics using Stata". The question concerns a cross-sectional dataset with two key variables, log of annual earnings (lnearns) and annual hours worked (hours).
I am struggling with part 2 of the question, but I'll type the whole thing for context.
A moving average of y after data are sorted by x is a simple case of nonparametric regression of y on x.
Sort the data by hours.
Create a centered 15-period moving average of lnearns with ith observation yma_i = (1/25) * sum_{j=-12}^{12} y_{i+j}. This is easiest using the command forvalues.
Plot this moving average against hours using the twoway connected graph command.
I'm unsure what command(s) to use for a moving average of cross-sectional data. Nor do I really understand what a moving average over one-period data shows.
Any help would be great and please say if more information is needed.
Thanks!
Edit1:
You should be able to download the dataset from here: https://www.dropbox.com/s/5d8qg5i8xdozv3j/mus02psid92m.dta?dl=0. It is a small extract from the 1992 individual-level data of the Panel Study of Income Dynamics, as used in the textbook.
I'm still getting used to the syntax, but here is my attempt:
sort hours
gen yma=0
forvalues i = 1/4290 {
quietly replace yma = yma + (1/25)(lnearns[`i'-12] to lnearns[`i'+12])
}
There are other ways to do this, but I created a variable for each lag and lead, took the sum of all of these variables plus the original, and then divided by the number of values, following the equation you provided:
sort hours
// generate variables for the 12 leads and lags
forvalues i = 1/12 {
gen lnearns_plus`i' = lnearns[_n+`i']
gen lnearns_minus`i' = lnearns[_n-`i']
}
// get the sum of the lnearns variables
egen yma = rowtotal(lnearns_* lnearns)
// get the number of nonmissing lnearns variables
egen count = rownonmiss(lnearns_* lnearns)
// get the average
replace yma = yma/count
// clean up
drop lnearns_* count
This gives you the variable you are looking for (the moving average). It divides by the number of nonmissing values in each window rather than always by 25, because windows near the start and end of the data are incomplete and lnearns itself can be missing.
As to your question of what this shows, my interpretation is that it shows the local average of lnearns around each value of hours. If you graph lnearns on the y axis and hours on the x axis, you get something that looks crazy because there is a lot of variation, but if you plot the moving average, the trend is much clearer.
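
To make the arithmetic concrete outside Stata, here is a minimal Python sketch of a centered moving average that, like the answer above, averages only over positions that exist in the window. It assumes the data are already sorted by x and that y has no missing values; the function name and the toy numbers are made up for illustration.

```python
def centered_moving_average(y, half_width=12):
    """Mean of y[i-half_width .. i+half_width], using only positions that exist."""
    n = len(y)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)          # window is clipped at the start
        hi = min(n, i + half_width + 1)      # and at the end of the data
        window = y[lo:hi]
        out.append(sum(window) / len(window))
    return out

# Toy data (made up), already sorted by x; half_width=1 gives a 3-term average.
y = [1.0, 2.0, 6.0, 4.0, 5.0]
print(centered_moving_average(y, half_width=1))
```

With the default half_width=12 the full window has 25 terms, matching the 1/25 factor in the exercise.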
In fact this dataset can be read into a suitable directory by
net from http://www.stata-press.com/data/musr
net install musr
net get musr
u mus02psid92m, clear
This smoothing method is problematic because hours has tied values, so sort hours does not yield a unique ordering and the smoothed values depend on how the ties happen to be ordered. An implementation in a similar spirit is nevertheless possible with rangestat (SSC).
sort hours
gen counter = _n
rangestat (mean) mean=lnearns (count) n=lnearns, interval(counter -12 12)
There are many other ways to smooth. One is
gen binhours = round(hours, 50)
egen binmean = mean(lnearns), by(binhours)
scatter lnearns hours, ms(Oh) mc(gs8) || scatter binmean binhours , ms(+) mc(red)
Even better would be to use lpoly.
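
The binning approach above (round hours to the nearest 50, then average lnearns within each bin) can be sketched in Python as follows. This is an illustration under assumptions: bin_means is a made-up name, the data are toy values, x is taken to be nonnegative, and ties at exact half-multiples may be handled differently by Stata's round().

```python
import math
from collections import defaultdict

def bin_means(x, y, width=50):
    """Average y within bins of x rounded to the nearest multiple of `width`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for xi, yi in zip(x, y):
        # Round half up to the nearest multiple of width (assumes xi >= 0).
        b = int(math.floor(xi / width + 0.5)) * width
        sums[b] += yi
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

# Toy data (made up), standing in for hours and lnearns.
hours = [10, 20, 40, 60, 130]
lnearns = [1.0, 3.0, 2.0, 4.0, 5.0]
print(bin_means(hours, lnearns))
```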

SAS Graph - Values of a variable for axis1 tick marks

In SAS/GRAPH, I have a plot whose axis2 shows dates (Batch_date) from 6/24/2015 to 12/8/2015 in 6-day increments. If I use the statement:
proc gplot data=datasets.x_batch;
plot
prcnt_10*batch_date=1
prcnt_20*batch_date=2
prcnt_30*batch_date=3
prcnt_40*batch_date=4
prcnt_50*batch_date=5
prcnt_60*batch_date=6
prcnt_70*batch_date=7
prcnt_80*batch_date=8
prcnt_90*batch_date=9
prcnt_100*batch_date=10
prcnt_110*batch_date=11
prcnt_120*batch_date=12....
It creates my plot successfully, but the tick marks seem to be labeled with default values (6/16, 7/1, etc.). I understand how to control the tick marks with minor=(n=n h=n) and major=(h=n), but it still chooses different values to display on the major tick marks.
Is there a way to explicitly assign each tick mark label (or better yet, just use the date variable's values as the tick mark labels) without explicitly stating each date value in an (order) statement?

Plotting line plot in SAS with discontinuous data

I am trying to draw a line plot in SAS with Hour (0, 1, 2, ..., 24) on the X axis and Decline Rate on the Y axis.
I started my monitoring at Hour = 20 (8 PM), so I need the line plot to start at 20.
When the hour wraps around to 0, the plot joins 0 to 20 with a straight line.
How can I handle this in SAS? I am using PROC GPLOT.
This is a difficulty for SAS, but one that can be managed.
I have two solutions:
Solution 1)
Keep the hour as a column in the data, but also add a date/time field to denote that the time is always increasing.
Use the date/time field and Decline Rate within gplot, but format the date/time field to only show hours.
Solution 2)
Add a new column to denote the order
data temp;
set temp;
order = _n_;
run;
Then sort by the new variable.
proc sort data=temp; by order; run;
Finally, utilize the sorted option within gplot. See the attached link for further information: http://www.math.wpi.edu/saspdf/gref/c21.pdf
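
The idea behind Solution 1, a time value that keeps increasing across midnight, can be sketched in Python. This is not SAS code. It assumes the observations are in chronological order with gaps shorter than 24 hours, and unwrap_hours is a made-up name.

```python
def unwrap_hours(hours, period=24):
    """Turn clock hours that wrap at midnight into an always-increasing series."""
    unwrapped = []
    offset = 0
    previous = None
    for h in hours:
        # A drop in the clock hour means we crossed midnight.
        if previous is not None and h < previous:
            offset += period
        unwrapped.append(h + offset)
        previous = h
    return unwrapped

# Monitoring starts at hour 20 and runs past midnight.
clock = [20, 21, 22, 23, 0, 1, 2]
print(unwrap_hours(clock))
```

Plotting against the unwrapped values and labelling the axis with the value modulo 24 keeps the line from jumping back to the left edge.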

SAS: Bars in Gchart are stacked

I am trying to use Gchart in SAS to plot the values I've got, here is my code:
title "WOE Trend of VarA.";
proc gchart data=work.VarA;
vbar VarB /
type=sum sumvar = VarA ASCENDING
subgroup = VarA nolegend
raxis=axis1
maxis=axis2
autoref clipref
width=32;
run;
There are four observations in table VarA, so I expect to see four bars in the plot. However, in practice, two of the bars are stacked together, forming a stacked bar, as follows. Also, the values of the observations are integers, yet there are decimals on the X-axis.
I guess I must have missed something in the options, since I am very new to this. Can anyone give me a clue as to what I am doing wrong and how I can fix it? Thank you very much.
Probably what you have is
VarA VarB
42 0.75
20 0.75
35 -0.75
28 2.25
That would generate the above chart. If you didn't subgroup by VarA, you'd get a single bar 62 long for the first observation instead of splitting it partway through. Summing and subgrouping by the same variable doesn't make a whole lot of sense, to me, but it depends on what you're trying to do I suppose.
The decimals are likely in the data, and are just rounded by your format. If you want more useful help, you might post your actual data and code.
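
To see why four observations can yield three bars, here is a Python sketch (not SAS) of what summing VarA within each VarB value does. It uses the guessed data from the answer above and assumes each distinct VarB value ends up as its own bar; bar_totals is a made-up name.

```python
from collections import defaultdict

# Guessed rows from the answer above: (VarA, VarB).
rows = [(42, 0.75), (20, 0.75), (35, -0.75), (28, 2.25)]

def bar_totals(rows):
    """Sum VarA within each VarB value, one total per bar."""
    totals = defaultdict(int)
    for var_a, var_b in rows:
        totals[var_b] += var_a
    return dict(totals)

totals = bar_totals(rows)
print(totals)
print(len(rows), "observations ->", len(totals), "bars")
```

The two rows that share VarB = 0.75 land in the same bar with a total of 62, and subgrouping by VarA splits that bar into its 42 and 20 segments.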