How do I remove the leftmost zero (on the x-axis) when graphing a categorical variable? - stata

hist body, discrete freq xlabel(#5, labsize(small) angle(forty_five) valuelabel) produces:
I'm graphing a categorical variable, but I can't figure out how to drop the zero from the x-axis. I've tried the documentation for xlabel() and xscale() but didn't find any winners.

The short answer is to spell out exactly which labels you want: xla(1/5, stuff), where stuff stands for the suboptions you already have. How to spell out precisely which labels you want is documented.
Not the question, but this is in my view a poor graph. Go with a horizontal bar chart in which (1) the discreteness of the variable is respected; (2) the category labels are properly and readably horizontal, instead of using a most awkward device of text at 45 degrees. catplot (SSC) is one way to go. Also, from Stata 13 (updated) upwards, graph hbar will do as well. You should also split the title over two lines. Even further off-topic: most consumers of this research should not care two hoots about the variable name or its question number in your survey.

Related

Having trouble building chart from multiple columns in Power BI

I would like to create a simple chart from 2 or more columns in Power BI.
Here's my data, for each column, a 1 marks an occurrence of an event, null means it did not happen.
I would like to turn this data into a very simple bar graph, showing both these fields' numeric totals (i.e. summing all the 1's). The bars would be shown side by side. I would like it to look exactly like this, only instead of male/female it would show "alcohol occurrences" and "MDMA" occurrences.
Here's my stacked column chart:
And when I try and put the column names on the axes so that they can be properly labeled, I get this:
I can achieve most of what I want using a clustered bar chart, but the problem there is that it won't let me label the axis with the alcohol / MDMA column names:
How can I make a simple, labeled graph, stacking both columns up against each other, showing the numeric sums for each column? Again, I want it to look exactly as the male/female example shown above. Is this even possible? Thank you in advance.
In the above scenario, all the values are considered to be in the same category and that means there is no direct way to do this. There are a couple of workarounds to make it look like the desired output:
To get the gap between the two bars:
You should create a new measure, Measure New = 0
Add this measure in the middle of the two values in the bar chart
This should give you a gap in between the two bars
To get the axis values added:
Create two text boxes with the text "Alcohol" and "MDMA" added
Place these text boxes below the respective bars to make it look like they are the axis values
These workarounds can become quite tedious when you have to do it for a larger number of charts/values. On a lighter note, it baffles me that you can consistently come up with these specific scenarios where you expect the charts to do exactly the opposite of what they are meant for 😉

Is it possible to develop the line charts with multiple colors in powerbi?

I developed the few Line charts for BMP280 sensor data in powerbi. This is one of the line chart for displaying the temperature value by time and device id.
But I want same line chart with different Color like this below image, whenever temperature value suddenly changes.
Can you please tell me is it possible to develop the Line chart with multiple colors?
If you're willing to consider a vertical bar chart instead of a line chart, you would be able to create a calculation for each row that determines whether the change is significant, potentially by comparing an aggregate of recent measurements to specific thresholds.
Once you do that, you would use this column's value as a legend for your visualization. So if a row has a value of "Significant Positive Change" (or something like that), the bar or bars showing that change can be red.
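As a rough sketch of the per-row calculation described above, written in Python rather than DAX purely for illustration: each reading is compared with the mean of the few readings before it, and the row is labelled against a threshold. The function name, window size, threshold and label strings are all assumptions for the example, not anything Power BI provides.

```python
from statistics import mean

def label_changes(readings, window=3, threshold=2.0):
    """Label each reading by how far it departs from the mean of the
    previous `window` readings. All parameter values are illustrative."""
    labels = []
    for i, value in enumerate(readings):
        recent = readings[max(0, i - window):i]
        if not recent:
            # First row: nothing earlier to compare with.
            labels.append("No Significant Change")
            continue
        delta = value - mean(recent)
        if delta > threshold:
            labels.append("Significant Positive Change")
        elif delta < -threshold:
            labels.append("Significant Negative Change")
        else:
            labels.append("No Significant Change")
    return labels

temps = [21.0, 21.2, 21.1, 25.0, 25.1, 21.0]
labels = label_changes(temps)
```

The resulting label column is what you would then drop into the legend so that each label gets its own colour.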
Your other alternative is to use an R-based visual, of which there are surely examples of this type of visualization. I'll update this answer if I find one that looks promising.
Instead of plotting a single data series, you can split it into 2 data series, e.g. one with normal temperatures and one with high temperatures. Then you can just plot these in different colours. Just make sure that the axis ranges are the same, i.e. they cannot be 'Auto'.
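The split itself is simple enough to sketch. This is Python for illustration only, and the threshold of 25.0 and the function name are assumptions. Each output series keeps the original length, with a blank (None) wherever the point belongs to the other series, and the shared axis range is taken from the full data so that both series can be drawn on the same fixed scale.

```python
def split_series(values, threshold):
    """Return (normal, high): two series of the same length as `values`,
    with None where a point belongs to the other series."""
    normal = [v if v <= threshold else None for v in values]
    high = [v if v > threshold else None for v in values]
    return normal, high

temps = [21.0, 21.5, 27.3, 28.1, 22.0]
normal, high = split_series(temps, threshold=25.0)

# One fixed range for both series, instead of letting each auto-scale.
axis_min, axis_max = min(temps), max(temps)
```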

Stata Scatter Sort by Y-Axis

I want to plot the average weight (y-axis) by make (x-axis) and sort it so the heaviest make is the leftmost on the x-axis and the lightest is the rightmost on the x-axis. I thought the sort option would work.
sysuse auto, clear
keep if foreign
sort mpg
gen obsno = _n
scatter weight obsno, xla(1/22) sort(weight)
The sort() option is allowed with scatter because one of the possibilities of scatter is to connect points with a line. But it refers only to the order in which points are connected. The default is to connect points in the current sort order of the dataset. In practice the most common example is that observations are in some time order or follow some other sequence but even then scatter would respect the order of a time or other variable only if the data were in the same order -- unless, to complete the circle, a sort order were specified with this option.
You are not asking for any connection. In that circumstance, sort() remains legal but it is ignored as of no relevance to what you're asking for.
There is no circumstance in which sort() has another effect with scatter and it will not change either axis variable to something else.
A way to get what I think you want is with the undocumented vertical option of graph dot. Some of the small choices here are just my personal idea of what looks good. For example, I have found that the default dotted grid lines often copy poorly to other software, so I use thin light grey continuous lines as a grid.
sysuse auto, clear
keep if foreign
sort mpg
gen obsno = _n
graph dot (asis) weight, over(obsno, sort(1) descending) vertical ///
linetype(line) lines(lcolor(gs12) lw(vthin)) yla(, ang(h))
It's perfectly possible to get a similar graph using scatter, just more work as you have to arrange that the observation numbers become the value labels of a variable defining the sort order.
See also quantile and qplot (Stata Journal).

Shape-matching of plots using non-linear least squares

What would be the best way to implement a simple shape-matching algorithm to match a plot interpolated from just 8 points (x, y) against a database of similar plots (> 12 000 entries), each plot having >100 nodes? The database has 6 categories of plots (signals measured under 6 different conditions), and the main aim is to find the right category (so for every category there are around 2000 plots to compare against).
The 8-node plot would represent actual data from measurement, but for now I am simulating this by selecting a random plot from the database, then 8 points from it, then smearing it using a Gaussian random number generator.
What would be the best way to implement non-linear least-squares to compare the shape of the 8-node plot against each plot from the database? Are there any C++ libraries you know of that could help with this?
Is it necessary to find the actual formula (f(x)) of the 8-node plot to use it with least squares, or will it be sufficient to use interpolation in requested points, such as interpolation from the GSL library?
You can certainly use least squares without knowing the actual formula. If all of your plots are measured at the same x value, then this is easy -- you simply compute the sum in the normal way:
chi^2 = sum_i [ (y_i - Y(x_i)) / sigma_i ]^2
where y_i is a point in your 8-node plot, sigma_i is the error on the point and Y(x_i) is the value of the plot from the database at the same x position as y_i. You can see why this is trivial if all your plots are measured at the same x value.
If they're not, you can get Y(x_i) either by fitting the plot from the database with some function (if you know it) or by interpolating between the points (if you don't know it). The simplest interpolation is just to connect the points with straight lines and find the value of the straight lines at the x_i that you want. Other interpolations might do better.
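Here is a minimal sketch of that in plain Python, assuming straight-line interpolation between database nodes and a known error on each test point. The function names are mine, and clamping to the end points outside the database range is an assumption you may want to change.

```python
from bisect import bisect_right

def interpolate(xs, ys, x):
    """Linearly interpolate the curve (xs, ys) at x; xs must be sorted ascending.
    Outside the range the end values are used (an assumption)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)  # now xs[j-1] <= x < xs[j]
    x0, x1 = xs[j - 1], xs[j]
    y0, y1 = ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def chi_square(test_x, test_y, sigma, db_x, db_y):
    """Sum of squared, error-weighted differences between the test points
    and the database curve evaluated at the same x positions."""
    total = 0.0
    for x, y, s in zip(test_x, test_y, sigma):
        total += ((y - interpolate(db_x, db_y, x)) / s) ** 2
    return total

# Toy database plot: the straight line Y = 2x sampled at four nodes.
db_x = [0.0, 1.0, 2.0, 3.0]
db_y = [0.0, 2.0, 4.0, 6.0]

# Three test points; only the last one is off the line, by two sigma.
chi2 = chi_square([0.5, 1.5, 2.5], [1.0, 3.0, 6.0], [0.5, 0.5, 0.5], db_x, db_y)
```

You would run chi_square once per database plot and keep the smallest values. A smoother interpolation can be swapped in for interpolate without touching the rest.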
In my field, we use ROOT for this kind of thing. However, scipy has a great collection of functions, and it might be easier to get started with -- if you don't mind using Python.
One major problem you could have would be that the two plots are not independent. Wikipedia suggests McNemar's test in this case.
Another problem you could have is that you don't have much information in your test plot, so your results will be affected greatly by statistical fluctuations. In other words, if you only have 8 test points and two plots match, how will you know if the underlying functions are really the same, or if the 8 points simply jumped around (inside their error bars) in such a way that it looks like the plot from the database -- purely by chance! ... I'm afraid you won't really know. So the plots that test well will include false positives (low purity), and some of the plots that don't happen to test well were probably actually good matches (low efficiency).
To solve that, you would need to either use a test plot with more points or else bring in other information. If you can throw away plots from the database that you know can't match for other reasons, that will help a lot.
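As a sketch of how that pre-filtering could sit in front of the category decision: assume each database plot has already been scored against the test plot, then discard the ones you can rule out and take the category of the best remaining score. The dictionary layout and the n_nodes rule below are purely hypothetical stand-ins for whatever "other reasons" you actually have.

```python
def best_category(scored_plots, can_match=lambda plot: True):
    """Return the category of the lowest-scoring plot among those that
    pass `can_match`, or None if nothing survives the filter."""
    candidates = [p for p in scored_plots if can_match(p)]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p["chi2"])["category"]

# Hypothetical scores for three database plots.
scored = [
    {"category": "A", "chi2": 9.7, "n_nodes": 120},
    {"category": "B", "chi2": 3.1, "n_nodes": 40},
    {"category": "C", "chi2": 4.8, "n_nodes": 150},
]

unfiltered = best_category(scored)
# Made-up exclusion rule, only to show the filter changing the answer.
filtered = best_category(scored, can_match=lambda p: p["n_nodes"] > 100)
```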

Google charts API - multiple charts on the same image

Is there a way to display multiple charts on the same image using Google Charts api?
To elaborate:
I have one data series which I want to display as bar chart.
I have another data set which has nothing to do with the first one (well they are correlated but the values are hundred times bigger).
X-axis is for dates.
I want to have second data set displayed as line chart with Y-axis on the left.
I found something similar in "Compound charts" section but as far as I understand markers are calculated based on already displayed data set - and I want to have them independent.
In other words - is it possible to make image like this:
http://chart.apis.google.com/chart?cht=bvg&chm=D,0033FF,1,0,5,1&chs=200x150&chd=t1:30,10,20|60,40,50&chxt=y
but with the line being independent and their values axis being on the right.
I'm sorry I'm not familiar with the terminology - I'm sure there is a name for what I'm trying to achieve.
Thanks!
Only 2 years behind the curve, but just to let you know that I have achieved your objective of displaying 2 datasets (one a bar chart, the other a line chart) against 2 different axis scales.
The devil is in the scaling parameter &chds and explicit axis values using &chxr. Essentially, I defined explicit scales for the x-axis, y-axis and r-axis, and then instructed the scaling parameter to scale each dataset differently.
So for an r dataset between 0 and 10 and a y dataset between 0 and 2 I would write:
&chds=0,2,0,10 (y then r defined in my axis parameter, i.e. &chxt=y,r)
...and...
&chxr=0,0,2|1,0,10
Let me know if you need more detail!
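To show how those pieces fit together, here is a small Python sketch that assembles the full URL from the parameters quoted in this thread (cht, chs, chd, chm, chxt, chds, chxr). The function name and the sample values are made up, and it only builds the string; it does not request the image.

```python
def build_chart_url(bar_values, line_values, bar_range, line_range):
    """Assemble a chart URL with the bar series scaled to the y-axis and
    the line series scaled to the r-axis."""
    bars = ",".join(str(v) for v in bar_values)
    line = ",".join(str(v) for v in line_values)
    params = [
        ("cht", "bvg"),
        ("chs", "200x150"),
        ("chd", f"t1:{bars}|{line}"),
        ("chm", "D,0033FF,1,0,5,1"),   # draw the second series as a line
        ("chxt", "y,r"),               # axis 0 is y, axis 1 is r
        # One min,max pair per dataset, in the same order as chd.
        ("chds", f"{bar_range[0]},{bar_range[1]},{line_range[0]},{line_range[1]}"),
        # Explicit label range for each axis index.
        ("chxr", f"0,{bar_range[0]},{bar_range[1]}|1,{line_range[0]},{line_range[1]}"),
    ]
    query = "&".join(f"{key}={value}" for key, value in params)
    return "http://chart.apis.google.com/chart?" + query

url = build_chart_url([1.5, 0.5, 2.0], [6, 4, 5], bar_range=(0, 2), line_range=(0, 10))
```

The point is that chds and chxr have to agree: each dataset is scaled to the same range that its axis is labelled with.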
I've looked into something similar to this before and have used the google chart API a lot. I'm 90% sure the answer is no. Sry :(
Yep it is possible.
Here is an example of two datasets displayed on the same axes. One is a bar chart, the other is a line graph.
This line - chd=t1:95,1,1,3,10,3,77|95,52,44,24,11,2,1 - allows for the two datasets.