Coefplot fit next to each other - stata

I am creating three coefficient plots with the same variables and want them to be one next to each other, but the third one does not fit horizontally so it appears under the other two. My code is
coefplot reg1O reg2O, bylabel(Overall) || reg1R reg2R, bylabel(Right) || reg1L reg2L, bylabel(Left) ||, keep(xvar1 xvar2 xvar3) xline(0, lpattern(dash)) nokey ciopts(lcolor(blue) recast(rcap)) color(blue)
I was wondering if there is a way to tell Stata to expand the eps figure horizontally so that all plots fit next to each other, regardless of how much space it takes?
Thank you very much!

The problem is probably not because of lack of space, but rather because you don't specify the option byopts(compact rows(1)). If you want all pots on one row or one column, you need to specify that. It's hard to tell without a reproducible example, though.
You probably know this webpage, but it's also explained here, in the section "How subgraphs are combined": http://repec.sowi.unibe.ch/stata/coefplot/getting-started.html

Related

How to automatically feed a cell value from a range of values, based on its matching condition with other cell value

I'm making a time-spending tracker based on the work I do every hour of the day.
Now, suppose I have 28 types of work listed in my tracker (which I also have to increase from time to time), and I have about 8 significance values that I have decided to relate to these 28 types of work, predefined.
I want that, as soon as I enter a type of work in cell 1 - I want the adjacent cell 2 to get automatically populated with a significance value (from a range of 8 values) that is pre-definitely set by me.
Every time I input a new or old occurrence of a type of work, the adjacent cell should automatically get matched with its relevant significance value & automatically get populated in real-time.
I know how to do it using IF, IFS, and IF_OR conditions, but I feel that based on the ever-expanding types of work & significance values, the above formulas will be very big, complicated, and repetitive in the future. I feel there's a more efficient way to achieve it. Also, I don't want it to be selected from a drop-down list.
Guys, please help me out with the most efficient way to handle this. TUIA :)
Also, I've added a snapshot and a sample sheet describing the problem.
Sample sheet
XLOOKUP() may work. Try-
=XLOOKUP(D2,A2:A,B2:B)
Or FILTER() function like-
=FILTER(B2:B,A2:A=D2)
You can use this formula for a whole column:
=INDEX(IFERROR(VLOOKUP(C14:C,A2:B9,2,0)))
Adapt the ranges to your actual tables in order to include in the second argument all the potential values and their significances
This is the formula, that worked for me (for anybody's reference):
I created another reference sheet, stating the types of work & their significance. From that sheet, I'm using either vlookup, filter, xlookup.Using gforms for inputting my data.
=ARRAYFORMULA(IFS(ROW(D:D)=1,"Significance",A:A="","",TRUE,VLOOKUP(D:D,Reference!$A:$B,2,0)))

How to apply conditional formatting (if cell is in another range) to a range of cells

So I have searched through several different questions related to this. None of them seem to be asking exactly what I'm looking for and none of the solutions I've found have worked for me thus far.
I have several columns of data (Player names) where each column's values are generated from a formula in the 2nd row of that column. The 1st row is a header (Game name). This whole range is the collection of which players are willing to play which games. These are columns D-J(ish, the list is dynamically generated with another formula, based on form responses)
I have another range of data where the 1st column is the Player and the 2nd is the player's PREFERRED game. This data is also generated with a formula based on form responses. These are columns A-B.
Here's what I'm trying to do
Using conditional formatting in columns D-J, I want to highlight the player's name if this game (in row 1 of this column) is their preferred game (range A2:B).
I've tried several different variations of VLOOKUPS, MATCHES, and FILTERS in the conditional formatting, but so far nothing has worked. The problem I run into every time is that I can't figure out how to reference the cell that the formatting is applying to, but still have it reference each individual cell over the whole range.
I know I could do this if I applied an individual conditional formatting to each individual cell. However that is a very time consuming and inelegant solution to this issue considering I'm expecting my data range to be much larger in the future. I need a conditional formatting formula that will work across the whole range or , at the very least, for an entire column.
This is a mock of what I'm trying to accomplish:
This is a link to a mock of my sheet so that you can clearly see the data layout and specific formulas I'm using:
https://docs.google.com/spreadsheets/d/1wy1T6dWJwNC_EfdCAbkuxtkJH7y4Cg3x4IyEk6R567M/edit?usp=sharing
use:
=REGEXMATCH(D3, TEXTJOIN("|", 1, FILTER($A$3:$A, $B$3:$B=D$2)))

Apply a function to a range of cells in a spreadsheet

The answers in topics with similar titles haven't given me much of a resolution to my particular problem, but possibly I am not asking the right question. It might help knowing I'm an absolute noob when it comes to spreadsheets, so finding my way around is next to nil.
Currently I can set a basic function in the first cell A1 =ROW()
Simple right? Well now here comes the complication. If I click on the bottom right of the cell and start dragging I can then apply that very same function to a whole range of cells. Let's say I apply it from A1:A10. Every cell within this group now has the same function.
Hooray! We did it, right? I applied a function to a range of cells each with their own output. But wait, if I then go back to the original cell and change its formula none of the other cells change with it. GRRRRR!!!!
There are a couple of fixes I've come up with but don't necessarily know how to implement. The first is to have every cell link back to the original cell and reference its function. This would be useful if I wanted to randomly scatter dependent cells about the document. The other would be much more useful in an orderly group where you know the exact dimensions by specifying in the original cell the size of the array you want to apply the function to.
With that said, let me hear your thoughts.
The closest I've come to an answer is to use FORMULA() which returns the formula used by a cell as text. Unfortunately all answers on evaluating the text resort to scripting. How strange! I thought something like this would be common. Might as well get to scripting.
Hold on, I may have spoke too soon. An array can be made with =MUNIT(), but it's only square. Drats!
Ok... I'm hoping the zebra stripes will eventually become its own answer unless someone else beats me to it. So a simple array can be made with ={1,2;3,4} where commas separate values by column and semicolons for values by row except to generate it you have to press Control+Shift+Enter (because reasons?). I'm thinking now that I'll need to have functions that can generate lists of values based on a single function for each row, and pray that it'll work. So, back to looking. (Wow this is taking forever)
The way I was hypothesizing can't even generate a 1x1, e.g., ={ROW()} returns Err:512 which is a formula overflow.
Alright, in summary so far I've narrowed down the two options,
1) link every cell to the original formula
2) populate an array with a single formula
each with their own incomplete answer,
a) use FORMULA() to return the formula of a cell as text
b) create a hypothetical array like so ={LIST_OF_VALUES()}
These both require a strange form of the nonexistent EVALUATE() function to 'function' correctly. Isn't that fun?
Google Sheets handles case b by allowing ={ROW()}Control+Shift+Enter to generate =ArrayFormula({ROW()}). Working with the general case of any sized array being filled with a single function doesn't exist in the world of spreadsheets it seems. That's very saddening because I can't think of a much better tool for what I want to do. Copy paste it is until I need to use macros.
Depending on your specific use case, creating a user-defined function may help:
use the Basic IDE to create your function;
apply it to any cells on any sheet;
modifying the Basic code will affect all cells where the function is used.
I've elaborated the steps in an answer on superuser.
Sure, you could write some complex code to update functions, but wouldn't the easy way be just to drag it to the same range of cells the same way you did before? It should properly overwrite the existing code in there, and if it doesn't, you can just as easily delete the outdated code and drag the new code in.
Probably the best approach is to simply drag the amended formula over the range of cells (as advised by OldBunny2800). This is less error prone and easier to maintain than a custom macro.
Another option would be to use an array function. Then you only have to edit the function once, and the same edit will be automatically applied to the whole range of cells in that array function.

Data mining with Weka

I am learning how to do data mining and I am using this data set from UCI's website.
http://archive.ics.uci.edu/ml/datasets/Forest+Fires
The problem I am encountering is how to deal with the area class. My understanding from the description is that I need to apply ln(x+1) to area using AddExpression.
Am I going in the correct direction with this? Or are there other filters I should investigate? Thank you.
I try to answer your question based on the little information you provide. And I haven't worked with the forest-fires data set, but by inspection I see that the classifier attribute "area" often has the value 0. Maybe you can't simply filter out these rows with Area = 0. Your dataset might become too small, or whatnot.
I think you are asked to perform regression of some attribute(s) against "log(area)" in order to linearize it. However,when you try to calculate the log of the Area, values such as log(0) are a problem. values between 0 and 1 might also be problematic.
So a common fix is to add 1 to the value of "Area". This introduces a systematic error, but it is small, and it removes all 0-values, and you can still derive useful models from your log(x+1)-transformed dataset.
And yes, in Weka you do this by "Preprocess"/ AddExpression(x+1). This creates a new attribute. Then you might remove the old area attribute.
Of course, in interpreting your model, you should be aware of the transformation. If you just want to find out what the significant independent attributes are in your linear regression model, I'd say the transformation does not matter. The data points are just shifted a little bit.

How to normalize sequence of numbers?

I am working user behavior project. Based on user interaction I have got some data. There is nice sequence which smoothly increases and decreases over the time. But there are little discrepancies, which are very bad. Please refer to graph below:
You can also find data here:
2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.36415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.77946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.60095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.84942 1.87731
1.89895 1.91676 1.92987
I would want to smooth out this sequence. The technique should be able to eliminate numbers with characteristic of X and Y, i.e. error in mono-increasing or mono-decreasing.
If not eliminate, technique should be able to shift them so that series is not affected by errors.
What I have tried and failed:
I tried to test difference between values. In some special cases it works, but for sequence as presented in this the distance between numbers is not such that I can cut out errors
I tried applying a counter, which is some X, then only change is accepted otherwise point is mapped to previous point only. Here I have great trouble deciding on value of X, because this is based on user-interaction, I am not really controller of it. If user interaction is such that its plot would be a zigzag pattern, I am ending up with 'no user movement data detected at all' situation.
Please share the techniques that you are aware of.
PS: Data made available in this example is a particular case. There is no typical pattern in which numbers are going to occure, but we expect some range to be continuous with all the examples. Solution I am seeking is generic.
I do not know how much effort you want to involve in this problem but if you want theoretical guaranties,
topological persistence seems well adapted to your problem imho.
Basically with that method, you can filtrate local maximum/minimum by fixing a scale
and there are theoritical proofs that says that if you sampling is
close from your function, then you extracts correct number of maximums with persistence.
You can see these slides (mainly pages 7-9 to get the idea) to get an idea of the method.
Basically, if you take your points as a landscape and imagine a watershed starting from maximum height and decreasing, you have some picks.
Every pick has a time where it is born which is the time where it becomes emerged and a time where it dies which is when it merges with an higher pick. Now a persistence diagram pictures a point for every pick where its x/y coordinates are its time of birth/death (by assumption the first pick does not die and is not shown).
If a pick is a global maximal, then it will be further from the diagonal in the persistence diagram than a local maximum pick. To remove local maximums you have to remove picks close to the diagonal. There are fours local maximums in your example as you can see with the persistence diagram of your data (thanks for providing the data btw) and two global ones (the first pick is not pictured in a persistence diagram):
If you noise your data like that :
You will still get a very decent persistence diagram that will allow you to filter local maximum as you want :
Please ask if you want more details or references.
Since you can not decide on a cut off frequency, and not even on the filter you want to use, I would implement several, and let the user set the parameters.
The first thing that I thought of is running average, and you can see that there are so many things to set, to get different outputs.