I would like to wrap large matrix-style output, as the wrap option of the corr command does, in an ado-file. Unfortunately, corr is not implemented as an ado-file, and pwcorr, which is implemented as an ado-file that I might model my approach on, is missing the wrap option.
It would be useful to understand how to do this for a fixed output width, and also how to base that width on the current window settings.
I use matlist for displaying most of my matrices, and it will also take care of wrapping.
More complicated output can be handled by being creative when making the matrix. I find the dotz option of matlist often helpful in that respect. Another trick I use a lot is the fact that a row or column name can have two parts: an "equation name", which can be shared by multiple rows or columns, and a "row or column name within the equation".
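A minimal sketch of how I use it (the auto data, variable list, and matrix names are just for illustration):

sysuse auto, clear
quietly correlate price mpg weight length displacement gear_ratio turn trunk
matlist r(C), format(%8.3f)    // a wide matrix is split into blocks that fit the linesize

* two-part names: an "equation" part shared by several rows or columns
matrix A = (1, 2 \ 3, 4)
matrix rownames A = group1:mean group1:sd
matrix colnames A = y2019:x y2020:x
matlist A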
If you need more flexibility you can look at frmttable.
Related
In Stata, is there a way to redirect the output that a command produces into a table instead of a graph?
Example: if someone created a normal probability distribution of data with the pnorm var_name command, is there a way to redirect the data so that instead of appearing in a graph, it appears in a table?
To add to @Noobie's answer:
Different commands work in different ways. There's no better short summary.
What you can look out for includes
generate() options that produce new variables. (There is no absolute rule that such options have this name, but that or a similar name is the most common single variety.)
Options that allow saving results to new datasets.
Saved results, especially those visible after return list or ereturn list. These can be quite elaborate, e.g. saving of matrices of counts after tabulate.
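A small sketch of that last point (the auto data is just an example):

sysuse auto, clear
tabulate rep78 foreign, matcell(counts)    // also saves the table of counts as a matrix
return list                                // lists r(N), r(r), r(c), ...
matlist counts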
More broadly, Stata commands aren't functions! One characteristic of a function, as so named in many languages or programs, is that there is a result, with special cases where the result is void or null. There clearly are statistical programs which in broad terms hinge on calling functions that have results, and what you see displayed is often a side-effect of that. Stata commands don't work like that, in the sense that what a command leaves behind can vary. In the case of commands designed just to show something, the "result" may be nothing more than a display. It's worth noting that Mata, which underlies and underpins Stata, is more recognisably a C-like language, with (e.g.) many matrix extensions, and it is based on functions (and much else).
Yes and no. It really depends on the command you are using. You should look at the help files first.
For instance, pnorm does not allow that. You can create the data yourself using the formula for pnorm described in the help file, where the cumulative distribution at some point is plotted against the so-called plotting position.
Other Stata commands allow you to generate the points directly. This is the case for kdensity for instance.
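A minimal sketch of both cases (the auto data and variable names are only for illustration; check the help files for the exact plotting-position formula pnorm uses):

* kdensity can hand back its plotted points as variables
sysuse auto, clear
kdensity mpg, generate(gridx dens) nograph
list gridx dens in 1/5

* for pnorm, build the points by hand; (i - 0.5)/N is one common plotting position
sort mpg
quietly summarize mpg
generate double z   = (mpg - r(mean)) / r(sd)
generate double cum = normal(z)           // theoretical cumulative probability
generate double pp  = (_n - 0.5) / _N     // empirical plotting position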
The answers in topics with similar titles haven't given me much of a resolution to my particular problem, but possibly I am not asking the right question. It might help to know that I'm an absolute noob when it comes to spreadsheets, so my ability to find my way around is next to nil.
Currently I can set a basic function in the first cell A1 =ROW()
Simple, right? Well, now here comes the complication. If I click on the bottom right of the cell and start dragging, I can then apply that very same function to a whole range of cells. Let's say I apply it from A1:A10. Every cell within this group now has the same function.
Hooray! We did it, right? I applied a function to a range of cells each with their own output. But wait, if I then go back to the original cell and change its formula none of the other cells change with it. GRRRRR!!!!
There are a couple of fixes I've come up with but don't necessarily know how to implement. The first is to have every cell link back to the original cell and reference its function. This would be useful if I wanted to randomly scatter dependent cells about the document. The other would be much more useful in an orderly group where you know the exact dimensions by specifying in the original cell the size of the array you want to apply the function to.
With that said, let me hear your thoughts.
The closest I've come to an answer is to use FORMULA() which returns the formula used by a cell as text. Unfortunately all answers on evaluating the text resort to scripting. How strange! I thought something like this would be common. Might as well get to scripting.
Hold on, I may have spoken too soon. An array can be made with =MUNIT(), but it's only square. Drats!
Ok... I'm hoping the zebra stripes will eventually become its own answer unless someone else beats me to it. So a simple array can be made with ={1,2;3,4} where commas separate values by column and semicolons for values by row except to generate it you have to press Control+Shift+Enter (because reasons?). I'm thinking now that I'll need to have functions that can generate lists of values based on a single function for each row, and pray that it'll work. So, back to looking. (Wow this is taking forever)
The way I was hypothesizing can't even generate a 1x1, e.g., ={ROW()} returns Err:512 which is a formula overflow.
Alright, in summary so far I've narrowed down the two options,
1) link every cell to the original formula
2) populate an array with a single formula
each with their own incomplete answer,
a) use FORMULA() to return the formula of a cell as text
b) create a hypothetical array like so ={LIST_OF_VALUES()}
These both require a strange form of the nonexistent EVALUATE() function to 'function' correctly. Isn't that fun?
Google Sheets handles case b by allowing ={ROW()} followed by Control+Shift+Enter to generate =ArrayFormula({ROW()}). It seems the general case of filling an arbitrarily sized array from a single formula just doesn't exist in the world of spreadsheets. That's very saddening, because I can't think of a much better tool for what I want to do. Copy-paste it is, until I need to use macros.
Depending on your specific use case, creating a user-defined function may help:
use the Basic IDE to create your function;
apply it to any cells on any sheet;
modifying the Basic code will affect all cells where the function is used.
I've elaborated the steps in an answer on superuser.
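For instance, a minimal sketch of such a function (the name and behaviour are made up; create it under Tools > Macros > Edit Macros, then call it from a cell as =TIMESTWO(A1)):

' doubles whatever value it is given; editing this code later updates every cell that calls it
Function TIMESTWO(x As Double) As Double
    TIMESTWO = x * 2
End Function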
Sure, you could write some complex code to update the formulas, but wouldn't the easy way be just to drag the amended formula over the same range of cells, the same way you did before? It should properly overwrite the existing formulas, and if it doesn't, you can just as easily delete the outdated ones and drag the new formula in.
Probably the best approach is to simply drag the amended formula over the range of cells (as advised by OldBunny2800). This is less error prone and easier to maintain than a custom macro.
Another option would be to use an array function. Then you only have to edit the function once, and the same edit will be automatically applied to the whole range of cells in that array function.
I am learning how to do data mining and I am using this data set from UCI's website.
http://archive.ics.uci.edu/ml/datasets/Forest+Fires
The problem I am encountering is how to deal with the area class. My understanding from the description is that I need to apply ln(x+1) to area using AddExpression.
Am I going in the correct direction with this? Or are there other filters I should investigate? Thank you.
I'll try to answer your question based on the little information you provide. I haven't worked with the forest-fires data set, but by inspection I see that the class attribute "area" often has the value 0. You probably can't simply filter out the rows with area = 0: your dataset might become too small, or whatnot.
I think you are asked to regress some attribute(s) against "log(area)" in order to linearize it. However, when you try to calculate the log of area, values such as log(0) are a problem. Values between 0 and 1 might also be problematic.
So a common fix is to add 1 to the value of "Area". This introduces a systematic error, but it is small, and it removes all 0-values, and you can still derive useful models from your log(x+1)-transformed dataset.
And yes, in Weka you do this on the "Preprocess" tab with the AddExpression filter, using an expression along the lines of log(a1+1), where a1 stands for the index of the area attribute. This creates a new attribute; you might then remove the old area attribute.
Of course, in interpreting your model, you should be aware of the transformation. If you just want to find out what the significant independent attributes are in your linear regression model, I'd say the transformation does not matter. The data points are just shifted a little bit.
I have done this many times with Excel and Java... This time I need to do it using Stata because it is more convenient to preserve variables' labels. How can I restructure dataset_1 into dataset_2 below?
I need to transform the following dataset_1:
into dataset_2:
I know one way, which is a little awkward... I mean, I could expand all the observations, then create a variable obsNo, and then rename the variables... Is there any better way?
Stata is wonderful at this sort of thing; it's a simple reshape. Your data is a little awkward, as the reshape command was designed to work with variables where the common part of the variable name (in your case, Wage) comes first. In the documentation for reshape, "Wage" would be the stub. The part following Wage is required to be numeric. If you first rename your variables:
rename (raceWhiteWage raceBlackWage raceAsianWage) (Wage1 Wage2 Wage3)
Then you can do:
reshape long Wage, i(state year) j(race)
That should give you the output you are looking for. You will have a column labeled "race", with values of 1 for White, 2 for Black, and 3 for Asian.
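Putting it together, a minimal sketch (the variable names are assumed from your example; the value labels are optional):

rename (raceWhiteWage raceBlackWage raceAsianWage) (Wage1 Wage2 Wage3)
reshape long Wage, i(state year) j(race)

* make race display as text rather than 1/2/3
label define race 1 "White" 2 "Black" 3 "Asian"
label values race race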
I created an accounting program that basically lets the user add rows and does some math. My problem is, I need to make it print the table on paper when the user presses a button. How can I accomplish that? Please explain it step by step, as I'm a beginner.
EDIT: Why was this question voted down? What is wrong with it?
The basic output tool in C++ is std::ostream, but it is very limited. It's possible (but not always easy) to format tables using it if the output is in a fixed-width font, but this is rarely the case today. If you can get away with using a fixed-width font, the iostream manipulators should be sufficient: decide the width of each column, and set the width (and alignment, left or right) using the appropriate manipulators when you output each field.
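A minimal sketch of that fixed-width approach (the column widths and data are made up):

#include <iomanip>
#include <iostream>
#include <string>
#include <vector>

struct Row { std::string item; int qty; double price; };

int main() {
    std::vector<Row> rows = { {"Widget", 4, 2.50}, {"Gadget", 12, 0.75} };

    // header: left-align text, right-align numbers, fixed column widths
    std::cout << std::left  << std::setw(12) << "Item"
              << std::right << std::setw(6)  << "Qty"
              << std::setw(10) << "Price" << '\n';

    std::cout << std::fixed << std::setprecision(2);
    for (const Row& r : rows) {
        std::cout << std::left  << std::setw(12) << r.item
                  << std::right << std::setw(6)  << r.qty
                  << std::setw(10) << r.price << '\n';
    }
}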
Otherwise, you'll have to determine what markup language the printed output should use; Postscript is widespread, but far from universal. Having done that, you'll have to iterate over lines, and in each line over the columns, generating the correct markup for each one. If you're generating something like Postscript (or most printer markup languages), you'll have to keep track of absolute positions, and maybe calculate column widths and such, determining the width of each field from the font being used and the width of each character in that font.
More than one program I've seen has output LaTeX source, and then used system to invoke LaTeX (or pdflatex, to generate PDF); this supposes that LaTeX is installed on all of the machines on which the program will run, but LaTeX will take care of all of the above calculations. You just output your columns, separated by '&', with each line terminated by '\\', plus the appropriate surrounding commands, and LaTeX does the rest. (This is the solution I'd recommend, if you can possibly impose the presence of LaTeX. As old and as un-user-friendly as it is, LaTeX still generates the best output of any program I've tried.)
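A rough sketch of that LaTeX route (the file name and the pdflatex call are assumptions about your setup):

#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

int main() {
    std::vector<std::vector<std::string>> table = {
        {"Item", "Qty", "Price"},
        {"Widget", "4", "2.50"},
        {"Gadget", "12", "0.75"}
    };

    // write each row with columns separated by & and lines ended by \\
    std::ofstream out("table.tex");
    out << "\\documentclass{article}\n\\begin{document}\n\\begin{tabular}{lrr}\n";
    for (const auto& row : table) {
        for (std::size_t i = 0; i < row.size(); ++i)
            out << row[i] << (i + 1 < row.size() ? " & " : " \\\\\n");
    }
    out << "\\end{tabular}\n\\end{document}\n";
    out.close();

    // let LaTeX do the measuring and alignment; assumes pdflatex is on the PATH
    std::system("pdflatex table.tex");
}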