SAS: Keep ONLY the highest-valued groups, for Box Plots

Presently I have so many "groups" when doing Box Plots that the result is SEVEN panels of box plots.
I'd like to have ONE panel, with about 20 box plots (or "groups").
So, that would require cutting out a bunch of groups.
Is there a way to automatically do this?
What I have in mind is: In a data step, only keep the TOP 20 groups, using Q3 value for each group as the criterion for keeping or removing.
Any coding assistance greatly appreciated.
Nicholas Kormanik

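You are not providing much information, but assuming myGroups has one row per group with its Q3 value in a variable q3, you could sort the data by that value in descending order and then keep the first 20 rows like this:
proc sort data=myGroups;
by descending q3;
run;
data myGroupsTop;
set myGroups(obs=20);
run;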
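If the data are still at the observation level, the Q3 of each group has to be computed before anything can be ranked. The following is a minimal sketch under assumptions: the dataset name raw, the grouping variable grp, and the analysis variable y are hypothetical placeholders, not names from the question.

```sas
/* Hypothetical names: raw (one row per observation),
   grp (grouping variable), y (variable shown in the box plots) */

/* 1. One row per group holding that group's third quartile */
proc means data=raw noprint nway;
    class grp;
    var y;
    output out=groupQ3 (drop=_type_ _freq_) q3=q3;
run;

/* 2. Largest Q3 first, so OBS=20 keeps the 20 highest groups */
proc sort data=groupQ3;
    by descending q3;
run;

data top20;
    set groupQ3 (obs=20);
run;

/* 3. Keep only the observations that belong to those 20 groups */
proc sql;
    create table rawTop20 as
    select r.*
    from raw as r
    where r.grp in (select grp from top20);
quit;
```

The filtered dataset rawTop20 can then be passed to whichever box plot procedure is already in use, which should leave about 20 boxes to draw.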

Related

Showing null values for some of the Power BI rows

I have joined an Excel sheet with the world population table from a Wikipedia web link in Power BI. When I merge these two tables, it shows the population only for the United States; the other countries have null values.
Would really appreciate the help. Screenshots provided below

It looks like your merge isn't matching the rows as you expect.
I would try to investigate if there are "invisible" differences in the columnar values:
"Canada " (with a trailing space) will not match "Canada", for example. To check for this, go into the table you are merging and apply Trim to the key column.
In the table you are merging into, do the same Trim operation for the key column.
Edit: Another option is to apply fuzzy matching to the merging process and to limit the amount of fuzziness by setting a maximum number of matches per row and adjusting the similarity threshold up from 0.80 to something closer to the maximum of 1.00 (= exact matching).

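I think the issue is the left join. Try joining the tables on the country column with a full outer join so that all rows from both tables are shown.
In the Merge dialog, click the join kind dropdown, select Full Outer, and then expand the merged column.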

Is there a way that Power BI does not aggregate all numeric data?

I have three .xlsx files full of data that has already been processed, so I pretty much just need to display the data using graphs. The problem seems to be that Power BI aggregates all numeric data (using count, sum, etc.). In the community forums they suggest creating new measures, but in that case I would have to create a lot of measures. I also tried converting the data to text, and even so, Power BI counts it!
Any help, please?

There are several ways to tackle this:
When you pull a field into the field well for a visualisation, you can click the dropdown in the field well and select "Don't summarize".
In the data model, select the column and, on the ribbon, select "Don't summarize" as the summarization option in the Properties group.
The screenshot shows the field well option on the left and the data model options on the right, one for a numeric and one for a text field.
And, yes, you never want to use the implicit measures, i.e. the automatic calculations that Power BI creates. If you want to keep on top of what is being calculated, create your own measures, and yes, there will be many.
Edit: If by "aggregating" you are referring to the fact that text values will be grouped in a table (you don't see any duplicates), then you need to add a column with unique values to the table so all the duplicates of the text values show up. This can be done in the data source by adding an Index column, then using that Index column in the table and setting it to a very narrow width to make it invisible.

SAS Enterprise p-value and percentile

I'm considering teaching my introductory statistics course in SAS Enterprise Guide. I want my students to be able to calculate p-values and percentiles for various distributions (binomial, normal, t, chi-square) with the drop-down menus if at all possible. For example, is there a way to do both of:
DATA pval;
pval=1-PROBBNML(0.5,25,15);
RUN;
PROC PRINT DATA=pval;
RUN;
and
DATA chi;
qchi=CINV(0.95,4);
RUN;
PROC PRINT DATA=chi;
RUN;
via the drop-down menus?

When you open a data set in SAS, there is a button at the top called 'Analyze'. This has some built-in functions, although they are a bit more advanced than calculating p-values.

There is in fact a way of adding drop-down menus in EG to do what you need; the feature is called prompts.
You could have a prompt that lets you select between normal and chi-square, for example. You can keep the distribution parameters static, knowing that some won't apply to some distributions, or make them change dynamically depending on your selection.
Here is a nice article about prompts: how do you use a variable prompt
Hope this helps.

You could create a new dataset (File/New/Data) and define parameters of the function as variables. You can then fill in one or several lines/examples with different parameters. Using 'query builder' then computed columns icon, you can create a new variable using the desired function (CINV, PROBBNML or other) which will store the result in a new dataset.
It would be better to use only one dataset per function, but you can show the result for different values of the parameters, which may be interesting for your students.
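For comparison, the same parameter-table idea can be written as code rather than built through the Query Builder. This is only a sketch: the dataset names params and pvals and the parameter rows are illustrative values chosen here, with the first row matching the binomial example in the question.

```sas
/* Illustrative parameter table: success probability p,
   number of trials n, observed count m */
data params;
    input p n m;
    datalines;
0.5 25 15
0.5 25 18
0.3 40 10
;
run;

/* One computed column per row, as a Query Builder computed column would give */
data pvals;
    set params;
    pval = 1 - probbnml(p, n, m);  /* P(X > m) for X ~ Binomial(n, p) */
run;

proc print data=pvals;
run;
```

A percentile table would follow the same pattern, with columns for the probability and degrees of freedom and a computed column such as cinv(prob, df).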

Excel columns made up of different merged cells

I'm trying to tidy up a sheet with the following problem, and would appreciate any advice.
My sheet has 7 "master columns" and about 4000 rows. It was compiled by converting a load of PDF documents.
The master columns are made up of merged minor columns, but at various parts of the data, the minor columns that make up each master column are different.
eg The first master column is made up of merged columns A-H for the first 30 rows, but for the next 25 rows it's made up of merged columns A-G etc.
As I said, overall there are still the same 7 master columns from top to bottom, but the merging is different throughout...
Can anyone think of a way to fix this without doing it all manually?

Copy your horrible spreadsheet into Word with Home > Clipboard – Paste, Paste Special, Unformatted Text and replace ^t^t with ^t. Replace All repeatedly, until Word has completed its search of the document and has made 0 replacements. Copy back in to Excel.
This is not tested on your image so there might be some issues – perhaps column misalignments (where even Word’s limited regex may help to add back tabs where suitable). The result should be no merged cells – mind you someone on SE described these along the lines of “A creation of the Devil to test us beyond endurance” (ie best avoided).

Try selecting the full document and click unmerge button from the ribbon.
As per the screen shot you provided, you can select all and unmerge but getting the corresponding fields in order might be challenging.
Try using a macro to combine these steps into a single action or keyboard shortcut.

Interleaving output from two different procedures with a by value

I have a large SAS dataset and I would like to make a series of tables and charts using BY-group processing. I am outputting these to a PDF.
Is there any way to get SAS to alternate between the table and the chart as it goes through the data? Right now, I have to print all of the tables first and then print the charts. If it were just 4 tables/charts, then I would be OK writing a separate PROC for each one.
Here is a simple example:
data sample;
input byval $ item $ amount;
datalines;
A X 15
A Y 16
A Z 12
B X 25
B Y 10
B Z 18
;
run;
symbol1 i=j;
proc print data=sample;
by byval;
var item amount;
run;
proc gplot uniform data=sample;
by byval;
plot amount*item;
run;
This prints 2 tables, followed by 2 charts.
I would like the Chart for "A" to come after the table for "A" so that the reader can flip through the pdf and always see the associated charts and tables together.
I could write separate procs for each one, but then the gplot won't have a uniform axis (and it gets messy if I have 100 different groups instead of 2).
I thought about pumping them into greplay but then you can't use titles with "#BYVAL1".
Is there any easy way to do this?

I've never used it, but it may be worth checking out ODS DOCUMENT. This allows you to store the output of all your procedures and then reference specific items from them using PROC DOCUMENT.
Below is a link to the SAS website with useful information about this, in particular the paper by Cynthia Zender for the SAS Global Forum 2009.
http://support.sas.com/rnd/base/ods/odsdocument/index.html
Cynthia also regularly contributes to the SAS Support Communities website (https://communities.sas.com/community/support-communities), so it may be worth asking on there if you are still stuck.
Good luck

I don't know of any way to do what you ask directly. GREPLAY is probably the closest you'll come; the primary problem is that SAS processes the PROCs linearly, first processing the entire PROC PRINT, then the entire PROC GPLOT. GREPLAY would allow you to redisplay the output, but if that doesn't work for your needs due to the #BYVAL issue, I'm not sure there's a better solution. Perhaps you can modify the title afterwards (not sure if GREPLAY allows this)?
You could try using ODS LAYOUT, but I don't think that would be any better. The one way it could be better is if you can work out having two columns on a 'page', one column being the PROC PRINT outputs, one the PROC GPLOT, and then print the columns one page then the other. I don't think this is possible, but it might be worth exploring.
You might also try setting up a macro to do each BYVAL separately, defining the axis in a uniform manner manually (ie, defining it based on your own calculation of the correct axis parameters, as an argument to the macro). That is probably the easiest solution that might still allow #BYVAL to work properly.
You might also try browsing Richard DeVenezia's site (http://www.devenezia.com/downloads/sas/samples/), which has a lot of examples of SAS/GRAPH solutions. He also posts on SAS-L (sasl@listserv.uga.edu) sometimes; I'm not sure if I've seen him on StackOverflow. He's probably the person most likely to be able to answer the question that I know of.
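The macro approach mentioned in this answer could look roughly like the sketch below. It is not a tested solution for the asker's real data: the macro name, its parameters, and the axis range are hypothetical, the BY variable is assumed to be character, and an ordinary TITLE statement stands in for #BYVAL1.

```sas
/* Sketch only: macro name, parameters, and axis range are hypothetical */
%macro print_and_plot(data=, byvar=, ymin=, ymax=, ystep=);
    %local i n;

    /* Store the distinct BY values in macro variables val1, val2, ... */
    proc sql noprint;
        select distinct &byvar into :val1- from &data;
    quit;
    %let n = &sqlobs;

    /* One fixed vertical axis, reused for every group */
    axis1 order=(&ymin to &ymax by &ystep);
    symbol1 i=j;

    %do i = 1 %to &n;
        title "&byvar = &&val&i";

        /* Table for this group (assumes a character BY variable) */
        proc print data=&data noobs;
            where &byvar = "&&val&i";
            var item amount;
        run;

        /* Chart for the same group, directly after its table */
        proc gplot data=&data;
            where &byvar = "&&val&i";
            plot amount*item / vaxis=axis1;
        run;
        quit;
    %end;

    title;
%mend print_and_plot;

%print_and_plot(data=sample, byvar=byval, ymin=0, ymax=30, ystep=5)
```

Because each group gets its own PROC PRINT and PROC GPLOT step, the output alternates table, chart, table, chart, and the shared AXIS1 definition takes over the role of the UNIFORM option. The axis limits have to be supplied by hand or computed beforehand.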
