There was a similar question to this before (how to prevent midpoints from extending), but it does not answer my question.
I'm creating a histogram as follows and outputting it to a PNG file:
ods graphics on / imagename = "histoOne" imagefmt = png reset=index border=off width=4in;
ods select where=(_name_ ? 'Histogr');
proc univariate data=myData noprint; *(WHERE=(sumStake < 250));
Title1;
var sumStake;
histogram sumStake / name='histogr' vminor=4 grid lgrid=34 endpoints=0 to 250 by 20 cfill=red;
*Omit the inset, because the stats refer to the reduced dataset;
INSET n (comma11.0) mean (5.2) median (5.2) std='Std Dev'(5.2) max='Max' (5.2) / pos = ne
header = 'Summary Statistics' cfill = ywh;
run;
ods graphics off;
I want to display both the histogram and the summary statistics inset. However, the data is so skewed that it makes no sense to show the maximum value for sumStake on the X-axis. I want to cap the X-axis at 250.
SAS keeps extending the ENDPOINTS value. How can I suppress this?
I don't want to use the (WHERE=(sumStake < 250)) filter, because the count, mean, median and max in the inset would then be based on the reduced sample rather than the entire sample, and would make no sense.
You may need to change your data in some fashion, or do the graph in a different way. Histograms in SAS don't allow much mucking about with the data in this fashion; you have to do it ahead of time. Histograms are meant largely for showing how your data falls out, so it's a bit counterintuitive to 'hide' some of the data fallout - I understand why you want to, but it is not exactly the primary purpose of histograms, hence why the functionality isn't there in SAS.
In any event, I don't think PROC UNIVARIATE gives you any way to control this, so you may lose the inset. You can control the axis range explicitly in PROC SGPLOT histograms (with the MAX= option on the XAXIS statement), but SGPLOT doesn't have the same kind of inset; you could probably build something, but not as simply. It will also still make the oversized bins, and won't reallocate those over-binned records.
Another option, particularly if you're making the inset separately anyway, would be to do the SGPLOT histogram (or bar chart) with data you've 'fixed' (right censored) and calculate the inset data separately (on the uncensored data).
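The last suggestion (bin right-censored data, but compute the inset statistics on the uncensored data) can be sketched outside SAS as follows. This is only an illustration in Python, not SAS code: the function name, the sample values and the assumption that all values are non-negative are made up for the example, while the cap of 250 and bin width of 20 come from the question.

```python
import statistics

def censor_and_summarize(values, cap=250, bin_width=20):
    """Bin right-censored values, but summarize the full data.

    Assumes all values are non-negative (hypothetical helper).
    """
    # Summary statistics come from the full, uncensored data.
    summary = {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),
        "max": max(values),
    }
    # Number of bins needed to cover 0..cap (ceiling division).
    n_bins = -(-cap // bin_width)
    counts = [0] * n_bins
    for v in values:
        capped = min(v, cap)  # right-censor at the cap
        idx = min(int(capped // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts, summary

stakes = [5, 12, 18, 40, 55, 130, 260, 900, 4000]  # made-up example data
counts, summary = censor_and_summarize(stakes)
```

Every value above the cap lands in the last bin, so the plotted axis stops at the cap while the summary still reflects the whole sample.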
How do I show data points only for the low and high values? Every other data point can be hovered over to get its value, but by default only the low and high points should always show, with a custom tooltip.
I was able to find a solution, in case this helps anyone else.
As I generate my xData and yData from the API, I grab the min and max values from the array.
Then I loop through the yData array and pull out the indexes of the min and max.
When initializing the line chart, I use a function for pointRadius in the datasets that sets the dot radius to 10 if the point is the low or high, and to 0 otherwise so that it doesn't show.
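The index lookup and radius rule described above can be sketched like this. It is written in Python purely to show the logic; the original solution lives in the chart's pointRadius callback, and the function name and sample data here are hypothetical.

```python
def point_radii(y_data, highlight=10, hidden=0):
    """Return one radius per point: `highlight` for the min and max, `hidden` otherwise."""
    lo_idx = y_data.index(min(y_data))  # first occurrence of the lowest value
    hi_idx = y_data.index(max(y_data))  # first occurrence of the highest value
    return [
        highlight if i in (lo_idx, hi_idx) else hidden
        for i in range(len(y_data))
    ]

radii = point_radii([12, 7, 19, 15, 3, 8])  # made-up example data
```

If the low or high value occurs more than once, this sketch highlights only the first occurrence; whether that is the desired behaviour depends on the chart.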
I want to create a two-panel graph using PROC SGPANEL in SAS 9.4. The y-axis should be the same for the two panels, but I want the two x-axes to have different ranges. Can that be done using SGPANEL?
Best regards,
Shaifali
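Yes, use the UNISCALE= option on your PANELBY statement to specify which shared axes are scaled identically: ROW, COLUMN, or ALL. The default is ALL, which is not what you want. With two panels side by side, the y-axis is the shared row axis, so specify ROW to keep the y-axis common while letting each panel's x-axis scale on its own.
panelby yourSpecifications / uniscale = row;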
I'm using SAS to plot a histogram with a kernel density estimate. The documentation says we can choose the parameter c: "the standardized bandwidth for a number that is greater than 0 and less than or equal to 100." But I cannot find the default value used to create the following plot.
Does someone have an idea? Thanks!
SGPLOT chooses the bandwidth that minimizes the approximate mean integrated square error (AMISE) of the kernel density estimate. According to the documentation for PROC UNIVARIATE, which can also do KDE:
By default, the procedure uses the AMISE method to compute kernel density estimates.
PROC UNIVARIATE documentation
We can confirm that they both have the same default by comparing the output.
proc univariate data=sashelp.cars;
var horsepower;
histogram / kernel;
run;
In the log, we find:
NOTE: The normal kernel estimate for c=0.7852 has a bandwidth of 21.035 and an AMISE of 392E-7.
Let's plot them together and compare the values.
proc sgplot data=sashelp.cars;
density horsepower/TYPE=KERNEL;
density horsepower/TYPE=KERNEL(c=0.7852);
ods output sgplot=sgplot; /* name the output data set explicitly so the next step can read it */
run;
data diff;
set sgplot;
abs_diff = abs(KERNEL_Horsepower____Y - KERNEL_Horsepower_C_0_7852____Y);
run;
proc univariate data=diff;
var abs_diff;
run;
The average difference between all points plotted is 1.65x10^-9, with the overall largest being 6.76x10^-9. This is, essentially, zero. The reason for the differences is that the c-value given to the user in the log is lower precision than the one calculated by proc sgplot. You can get a higher precision estimate with the outkernel= option in proc univariate as well.
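As a cross-check on the log note, the PROC UNIVARIATE documentation relates the standardized bandwidth c to the actual bandwidth through the sample interquartile range Q and the sample size n, as bandwidth = c * Q * n^(-1/5). The sketch below applies that relation in Python. The value n = 428 and an interquartile range of 90 for horsepower are assumptions about sashelp.cars, not figures taken from the output above.

```python
def kernel_bandwidth(c, iqr, n):
    """Bandwidth implied by a standardized bandwidth c: c * Q * n**(-1/5)."""
    return c * iqr * n ** (-1 / 5)

# c is taken from the log note; iqr and n are assumed values for sashelp.cars.
bw = kernel_bandwidth(c=0.7852, iqr=90, n=428)
```

Under those assumptions the result agrees with the bandwidth of 21.035 reported in the log to within rounding.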
My question is similar to this question here:
https://community.powerbi.com/t5/Desktop/Multi-variable-Scatter-Plot/m-p/312013#M138304
I understand that you can only display one variable on the x-axis of a PowerBI scatterplot. But, I'm trying to figure out if there's a way to toggle on/off multiple variables on the scatterplot. For example, the Y-Axis wouldn't change, but you could add/remove different variables to display on the x-axis.
My variables are all in date format, so it would be great to overlay different variables on the x-axis, i.e. "event1", "event2", "event3", so that you could see them in relation to one another. Is this possible? PowerBI has virtually no documentation that I can find.
I'm not sure about multiple variables, but you can at least change the variable to display in the axis based on a slicer.
The steps:
Create 2 new tables, each representing the possible values on each axis (Just the labels and an index);
Create measures with the values you'll want to see in the axis (ex: Total Sales);
In each table, create a new Measure with a Switch that maps the labels to the created measures.
Ex:
Measure Selection I =
IF(ISCROSSFILTERED('Measure Selection I'[Measure I]);
SWITCH(
TRUE();
VALUES('Measure Selection I'[Measure I]) = "Danceability";[Total Danceability];
VALUES('Measure Selection I'[Measure I]) = "Energy";[Total Energy]
);
Blank())
Create the Visual and the slicers, and put the created measures in the corresponding places.
Here is a video with an example:
https://www.youtube.com/watch?v=gYbGNeYD4OY
I have a dataset where the images have VARYING number of labels. The number of labels is between 1 and 5. There are 100 classes.
After googling, it seems like HDF5 db with slice layer can deal with multiple labels, as in the following URL.
The only problem is that it supposes a fixed number of labels. Following this, I would have to create a 1x100 matrix, where entry value is 1 for the labeled classes, and 0 for non-label classes, as in the following definition:
layers {
name: "slice0"
type: SLICE
bottom: "label"
top: "label_matrix"
slice_param {
slice_dim: 1
slice_point: 100
}
}
where each image has a label that looks like (1,0,0,...1,...0,....,0,1), a 100-dimensional vector.
Now, I apologize that my question becomes somehow vague, but is this a feasible idea? I.e., is there a better approach to this problem?
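The fixed-length encoding described in the question, where a variable number of class labels becomes a 100-entry 0/1 vector, can be sketched as below. This is a plain-Python illustration with a hypothetical helper name; it does not show how the vector is written to HDF5 or consumed by Caffe.

```python
def multi_hot(labels, num_classes=100):
    """Encode a variable-length list of class indices as a fixed-length 0/1 vector."""
    vec = [0] * num_classes
    for k in labels:
        if not 0 <= k < num_classes:
            raise ValueError(f"class index {k} is outside 0..{num_classes - 1}")
        vec[k] = 1
    return vec

# An image carrying three of the 100 classes (made-up indices).
label_vector = multi_hot([0, 42, 99])
```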
I get that you have 5 types of labels that are not always present for each data point. 1 of the 5 labels is for 100-way classification. Correct so far?
I would suggest always writing all 5 labels into your HDF5 and using a special value for when a label is missing. You can then use the ignore_label option to skip computing the loss for that layer on that example. Using it requires adding loss_param { ignore_label: Y } to the loss layer in your network prototxt definition, where Y is a scalar.
The backpropagated error will only be a function of labels that are present. If input X does not have a valid value for a label, the network will still produce an estimate for that label. But it will not be penalized for it. The output is produced without any effect on how the weights are updated in that iteration. Only outputs for non-missing labels contribute to the error signal and the weight gradients.
It seems that only the Accuracy and SoftmaxWithLoss layers support ignore_label.
Each label is a 1x5 matrix. The first entry can be for the 100-way classification (e.g. [0-99]) and entries 2:5 have scalars that reflect the values that the other labels can take. The order of the columns is the same for all entries in your dataset. A missing label is marked by a special value of your choosing. This special value has to lie outside the set of valid label values. This will depend on what those labels represent. If a label value of -1 never occurs you can use this to flag a missing label.
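To illustrate how a missing-label flag keeps an example out of the error signal, here is a minimal Python sketch of a softmax cross-entropy loss that skips entries marked with the ignore value. It is not Caffe code: the function name is hypothetical, -1 is the assumed flag, and averaging over the valid entries only is an assumption of this sketch.

```python
import math

def masked_softmax_loss(logits, labels, ignore_label=-1):
    """Mean softmax cross-entropy over examples whose label is not ignore_label."""
    total, count = 0.0, 0
    for row, y in zip(logits, labels):
        if y == ignore_label:
            continue  # missing label: contributes nothing to the loss
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum - row[y]
        count += 1
    return total / count if count else 0.0

# The second example has a missing label, so only the first one counts.
loss = masked_softmax_loss([[2.0, 0.0], [0.0, 5.0]], [0, -1])
```

Because the second row is skipped, changing its scores leaves the loss unchanged, which is the behaviour described above for missing labels.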