How to customise appearance of DAGs using dagitty - directed-acyclic-graphs

I am using dagitty online browser to draw DAGs for a project: http://www.dagitty.net/dags.html#
I cannot figure out how to customise the appearance of my DAGs to make them look nice, for inclusion in a paper. Code and images are provided at the end of the post.
I would like to be able to do any of the below:
A) Use superscripts instead of underscores
B) Change the size of the font
C) In the 'classic' style, the variable names cover the arrows if the arrow is going directly up, so you cannot see the direction of causality
D) Just have variable names representing each node, instead of boxes
In the paper explaining how to use the R package of this tool, there are many DAGs which have the above-mentioned properties, but the focus of the paper is to explain how to use DAGs to properly assess causality etc., with nothing about the appearance of the DAGs. I would be very grateful for some help on this, or for suggestions of other tools/packages that allow you to draw DAGs. I also fear this may be better placed on the maths/statistics stack, which will likely have more users of the dagitty tool. Mods, please move it there if appropriate. I posted here as it's purely a coding issue.
The following code can be input in the 'model code' section of the dagitty website to reproduce the two DAG images I have provided. By choosing 'classic' or 'SEM-like' under the Diagram style, you can get the two appearances.
dag {
T_0 [adjusted,pos="-1.575,0.494"]
X_0 [adjusted,pos="-1.574,0.280"]
Y [outcome,pos="-1.288,0.250"]
T_0 -> X_0
T_0 -> Y
X_0 -> Y
}
DAG1
DAG2
EDIT:
Using the dagitty package in R together with ggdag gives nicer aesthetics for the DAG image. Thank you for the response, Sebastian. However, it does not seem to respect the positioning co-ordinates that make lines curved. This is an issue in creating the DAG I want for my actual work. To continue with the example above, the following code gets a curved line between T_0 and Y:
dag {
T_0 [adjusted,pos="-1.575,0.494"]
X_0 [adjusted,pos="-1.574,0.280"]
Y [outcome,pos="-1.288,0.250"]
T_0 -> X_0
T_0 -> Y [pos="-1.400,0.500"]
X_0 -> Y
}
Producing this image:
But this does not happen when using dagitty and ggdag in R. The exact same code produces a DAG with straight lines. Is it possible to produce curved lines through specific co-ordinates in R with dagitty and ggdag?

Try the R package ggdag (see here), and also check out the manual here.

Is there a way to somewhat automate a Double Elimination bracket on google sheets?

I'm looking for some help in trying to automate a double elimination bracket that I created on Google Sheets. I won't pretend to know much about it. I'm mostly self-taught when it comes to Sheets and I know that it requires an if/then solution, but I think I've come to a roadblock and require assistance from people that actually know how to do it.
Basically I'm running a double elimination tournament. A player has to lose twice in order to be completely eliminated. I'd like to make it so that when I type W in a cell, it takes the data from the cell to the left of it (Player 1 in the screenshot) and puts it into the next round in the cell noted by the arrow. I'd also like to make it so that if I typed L instead, it would take the data from the cell to the left of it and put the name (Player 2 in the screenshot) into the losers bracket. "Player 1-16" are placeholder names, as I'll have real names in the lookup sheet I have hidden once it's ready.
Is any of this possible? Or am I being too ambitious at this point? I could do this manually for sure, but since I'm gonna have other people helping when I'm not around, I'd rather it be a simple W or L so that they don't mess around with other aspects of the sheet(s).
I would appreciate ANY and all assistance. I've attached a screenshot and a link to the sheet so that you can copy and mess around with it. Thank you in advance!
Double Elim Google Sheet
Double Elim Screenshot
EDIT 3/18 6:30PM EST: The furthest I've gotten through the use of IF(OR is shown in the screenshot below. If I put W in E4, it brings 'Player 1' to I6. If E4 is left blank or has L instead, I6 says 'FALSE'. This is where I'm having the most trouble. How do I continue from here, and how do I also take into account C5 & E5 for I6?
Double Elim Screenshot 2
Try this in C39:
=FILTER(C4:D5, E4:E5="L")

How to get y axis range in Stata

Suppose I am using some twoway graph command in Stata. Without any action on my part Stata will choose some reasonable values for the ranges of both y and x axes, based both upon the minimum and maximum y and x values in my data, but also upon some algorithm that decides when it would be prettier for the range to extend to a number like '0' instead of '0.0139'. Wonderful! Great.
Now suppose that after (or while) I draw my graph, I want to slap some very important text onto it, and I want to be choosy about precisely where the text appears. Having the minimum and maximum values of the displayed axes would be useful: how can I get these min and max numbers? (Either before or while calling the graph command.)
NB: I am not asking how to set the y or x axis ranges.
Since this issue has been a bit of a headache for me for quite some time, and I believe there is no good solution out there yet, I wanted to write up two ways in which I was able to solve a problem similar to the one described in the post. Specifically, I used them to solve the issue of gray shading for part of the graph.
1) Define a global macro in the code generating the axis labels. This is the less elegant way to do it, but it works well. Locate the tickset_g.class file in your ado path. The graph twoway command uses this to draw the axes of any graph. There, I defined a global macro in the draw program that takes the value of the omin and omax locals after they have been set to the minimum between the axis range and data range (the command that does this is local omin = min(.scale.min,omin), and analogously for the max), since the latter sometimes exceeds the former. You could also define the global further up in that code block to only get the axis extent. You can then access the axis range using the globals after the graph command (and use something like addplot to add to the previously drawn graph). Two caveats for this approach: using global macros is, as far as I understand, bad practice and can be dangerous, so I used names with the prefix userwritten, which I was sure wouldn't be used in any program. Also, depending on your organization's decisions, you may not have the administrator privileges needed to alter this file. However, it is the simpler way. If you prefer a more elegant approach along the lines of what Nick Cox suggested, then you can:
2) Use the undocumented gdi natscale command to define your own axis labels. The gdi commands are the internal commands that are used to generate what you see as graph output (cf. https://www.stata.com/meeting/dcconf09/dc09_radyakin.pdf). The tickset_g.class uses the gdi natscale command to generate the nice numbers of the axes. Basic documentation is available with help _natscale. Basically, you enter the minimum and maximum, e.g. from a summarize return, and a suggested number of steps, and the command returns a min, max, and delta to be used in the x|ylabel option (there are several possible ways, all rather straightforward once you have those numbers, so I won't spell them out for brevity). You'd have to adjust this approach if you use some scale transformation.
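To make the idea of "nice numbers" concrete, here is a rough Python sketch of a generic nice-scale routine that takes a minimum, a maximum, and a suggested number of steps and returns a min, max, and delta. It is not Stata's _natscale algorithm (which is not spelled out here), and the name nice_scale and the 1/2/5/10 rounding rule are assumptions made purely for illustration.

```python
import math

def nice_scale(lo, hi, steps=5):
    """Return (nice_min, nice_max, delta) covering the interval [lo, hi].

    Illustrative only: a common 1-2-5 rounding heuristic, not Stata's code.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    raw = (hi - lo) / steps                      # unrounded step size
    magnitude = 10 ** math.floor(math.log10(raw))
    fraction = raw / magnitude                   # lies in [1, 10)
    for nice in (1, 2, 5, 10):                   # smallest "nice" multiplier that fits
        if fraction <= nice:
            break
    delta = nice * magnitude
    nice_min = math.floor(lo / delta) * delta    # round the ends outward
    nice_max = math.ceil(hi / delta) * delta
    return nice_min, nice_max, delta

print(nice_scale(3, 97, 5))
```

The returned triple plays the same role as the min, max, and delta described above: the displayed axis would run from nice_min to nice_max in steps of delta.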
Hope this helps!
I like Nick's suggestion, but if you're really determined, it seems that you can find these values by inspecting the output after you set trace on. Here's some inefficient code that seems to do exactly what you want. Three notes:
1) When I import the log file I get this message:
Note: Unmatched quote while processing row XXXX; this can be due to a formatting problem in the file or because a quoted data element spans multiple lines. You should carefully inspect your data after importing. Consider using option bindquote(strict) if quoted data spans multiple lines or option bindquote(nobind) if quotes are not used for binding data.
2) Sometimes the data fall outside of the min and max range values that are chosen for the graph's axis labels (but you can easily test for this).
3) The log linesize is actually important to my code below because the key values must fall on the same line as the strings that I use to identify the helpful rows.
* start a log (critical step for my solution)
cap log close _all
set linesize 255
log using "log", replace text
* make up some data:
clear
set obs 3
gen xvar = rnormal(0,10)
gen yvar = rnormal(0,.01)
* turn trace on, run the -twoway- call, and then turn trace off
set trace on
twoway scatter yvar xvar
set trace off
cap log close _all
* now read the log file in and find the desired info
import delimited "log.log", clear
egen my_string = concat(v*)
keep if regexm(my_string,"forvalues yf") | regexm(my_string,"forvalues xf")
drop if regexm(my_string,"delta")
split my_string, parse("=") gen(new)
gen axis = "vertical" if regexm(my_string,"yf")
replace axis = "horizontal" if regexm(my_string,"xf")
keep axis new*
duplicates drop
loc my_regex = "(.*[0-9]+)\((.*[0-9]+)\)(.*[0-9]+)"
gen min = regexs(1) if regexm(new3,"`my_regex'")
gen delta = regexs(2) if regexm(new3,"`my_regex'")
gen max_temp= regexs(3) if regexm(new3,"`my_regex'")
* max does not exist yet; max_temp is the string variable that must become numeric
destring min max_temp delta, replace
gen max = min + delta* int((max_temp-min)/delta)
*here is the info you want:
list axis min delta max
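For readers who want to check the last step outside Stata, the parsing and the final max calculation can be sketched in Python. This assumes the input is already the min(delta)max piece that the Stata regex above targets; the example strings and the function name axis_ticks are made up for illustration and are not taken from a real trace log.

```python
import re
from decimal import Decimal

# Same shape as the Stata regex above: min(delta)max
NUMLIST = re.compile(r"(.*[0-9]+)\((.*[0-9]+)\)(.*[0-9]+)")

def axis_ticks(numlist):
    """Return (min, delta, max), where max is the last tick actually reached."""
    match = NUMLIST.fullmatch(numlist.strip())
    if match is None:
        raise ValueError(f"not a min(delta)max numlist: {numlist!r}")
    lo, delta, hi = (Decimal(part.strip()) for part in match.groups())
    # Whole steps that fit between min and the upper bound, truncated toward
    # zero as in the Stata line: gen max = min + delta*int((max_temp-min)/delta)
    steps = int((hi - lo) / delta)
    return lo, delta, lo + delta * steps

print(axis_ticks("-20(10)25"))     # hypothetical input
print(axis_ticks(" -.02(.01).02")) # hypothetical input
```

Decimal is used instead of float so that the division and truncation are exact for inputs such as .01.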

SPSS- how to make the histogram template refer to the y axis as percentage

I have an odd issue regarding the use of Chart Templates in SPSS (version 20), and any help will be appreciated.
I used the GUI to manually define a chart template for histograms. The definitions are simple:
1) set the x axis between 0 to 100.
2) set the y axis as percent and not as actual number of examples within each bin.
3) set the bin sizes to 5.
4) set the maximal value of the y axis to 20.
I saved the template using the File->Save ChartTemplate option after changing the definitions of one histogram.
Oddly, when I apply the template to a new histogram, only definitions 1, 3 and 4 take effect, while 2 is omitted. I searched for a solution and did not find any. This is extremely frustrating since I have to waste time and effort manually resetting the axis to the right definition on every new histogram I make (which is a lot :/ ).
There might be a way to hack the template code using notepad but I did not see any mention of the Y axis there.
Any help or comments would be much appreciated.
I can't say offhand how to set up a template to do any of those aspects, but here is an example using syntax to specify those four options.
SET SEED 10.
INPUT PROGRAM.
LOOP #i = 1 TO 500.
COMPUTE Var = RV.UNIFORM(0,90).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME Sim.
FORMATS Var (F3.0).
EXECUTE.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Var MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Var=col(source(s), name("Var"))
GUIDE: axis(dim(1), label("Var"), delta(5))
GUIDE: axis(dim(2), label("Percent in Bin"))
SCALE: linear(dim(1), min(0), max(100))
SCALE: linear(dim(2), max(20))
ELEMENT: interval(position(summary.percent.count(bin.rect(Var, binWidth(5)), base.all(acrossPanels()))))
END GPL.
And this is what the graph looks like for me (with my default chart template) in V25.

Why must back-edges be taken into account in Edmonds-Karp Maximum Flow?

I was trying to implement Edmonds-Karp in C++ for maximum flow, and I wrote it slightly differently:
Instead of going through all edges in residual graph, I only went through the edges that are present in the original graph, using the adjacency list.
I did not update any back-edges when updating the residual graph with min flow.
Interestingly, when I ran my code, it gave me correct results. So I went to Wikipedia's example, where it specifically show how a back-edge is being used. When I fed this graph to my code, I got the correct answer again.
I also checked the resultant flow matrix, and it was identical to Wikipedia's.
Can someone explain why we must add and update back-edges, and maybe give an example where they are critical?
Here's the code that I wrote (updated to include back edges):
Consider the following flow network
Suppose the first flow is s → u → v → t. (If you object that the BFS of Edmonds-Karp would never choose this, then augment the graph with some more vertices between s and v and between u and t.)
Without reversing flow u → v, it is impossible to obtain the optimal flow of 20.
Try out the following case:
int main() {
    // Digraph<int> is the class from the question (not shown here);
    // addEdge arguments are assumed to be (from, to, capacity).
    Digraph<int> g(8);
    g.addEdge(0,1,1);
    g.addEdge(1,2,1);
    g.addEdge(2,4,1);
    g.addEdge(0,3,1);
    g.addEdge(3,4,1);
    g.addEdge(4,7,1);
    g.addEdge(3,5,1);
    g.addEdge(5,6,1);
    g.addEdge(6,7,1);
    cout<<g.maxFlowEdmondsKarp(0,7);
    return 0;
}
Visualization:
Your program takes the shortest path 0-3-4-7 first and then has no chance of finding 0-1-2-4-7 and 0-3-5-6-7, because edges 3-4 and 4-7 are already saturated. You get 1, but 2 is the right answer.
Had you inserted the back-edges, you would find the following paths:
0-3-4-7
0-1-2-4-3(back-edge!)-5-6-7, getting the max flow 2.
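The same case can be checked with a small self-contained sketch. It is written in Python rather than with the question's Digraph class (which is not shown), and the function max_flow and its use_back_edges flag are hypothetical names introduced only to compare the two behaviours on this graph.

```python
from collections import deque

def max_flow(n, edges, s, t, use_back_edges=True):
    """BFS-based augmenting paths (Edmonds-Karp style) on n vertices.

    With use_back_edges=False, residual back-edges are neither traversed
    nor updated, mimicking the simplified implementation in the question.
    """
    cap = [[0] * n for _ in range(n)]        # residual capacities
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
        adj[u].append(v)
        if use_back_edges:
            adj[v].append(u)                 # allow walking the reverse direction
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:     # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return flow                      # no augmenting path left
        bottleneck = float("inf")
        v = t
        while v != s:                        # find the path's bottleneck
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                        # push flow along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            if use_back_edges:
                cap[v][u] += bottleneck      # lets a later path undo this flow
            v = u
        flow += bottleneck

EDGES = [(0, 1, 1), (1, 2, 1), (2, 4, 1), (0, 3, 1), (3, 4, 1),
         (4, 7, 1), (3, 5, 1), (5, 6, 1), (6, 7, 1)]

print(max_flow(8, EDGES, 0, 7, use_back_edges=False))  # 1
print(max_flow(8, EDGES, 0, 7, use_back_edges=True))   # 2
```

With back-edges enabled, the second augmenting path is 0-1-2-4-3-5-6-7: it cancels the unit of flow on 3-4 by using the residual edge 4-3.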

Prolog - Term replacement, Term alteration in workflow graphs

In this link (Meta Interpreter) I believe I have found a nifty way of solving a problem I have to tackle, but since my Prolog is very bad I'd first like to ask whether what I have in mind is even possible.
I want to transform certain parts of a workflow/graph depending on a set of rules. A graph basically consists of sequences (a->b) and split/joins, which are either parallel or conditional, i.e. two steps run in parallel in the workflow or a single branch is picked depending on a condition (the condition itself does not matter on this level) (parallel-split - (a && b) - parallel-join) etc. Now a graph usually has nodes and edges; by representing the graph as terms I want to get rid of the edges.
Furthermore each node has a partner attribute, specifying who will execute it.
I'll try to give a simple example what I want to achieve:
A node called A, executed by a partner X, connected with a node called B, executed by a partner Y.
A_X -> B_Y
seq((A,X),(B,Y))
If I detect a pattern like this, i.e. two steps in sequence with different partners, I want this to be replaced with:
A_X -> Send_(X-Y) -> Receive_(Y-X) -> B_Y // send step from X to Y and a receive step at Y waiting for something from X
seq((A,X), seq(send(X-Y), seq(receive(Y-X), B)))
If anyone could give me some pointers or help to come up with a solution I would be very thankful!
"A graph basically consists of sequences (a->b) and split/joins, which are either parallel or conditional, i.e. two steps run in parallel in the workflow or a single branch is picked depending on a condition"
This sounds an awful lot like an and/or graph. Prolog algorithms on these graphs are covered by Ivan Bratko in Prolog Programming for Artificial Intelligence, chapter 13. Even if your graphs aren't really and/or graphs, you may be able to adapt some of these algorithms to your task.
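As a language-neutral illustration of the rewrite rule from the question (two steps in sequence with different partners get a send and a receive step inserted between them), here is a minimal Python sketch. The tuple encoding, the tag names, and the function rewrite are assumptions made for this example; unlike the term written in the question, the sketch keeps the partner on the second step.

```python
def rewrite(term):
    """Insert send/receive steps between sequential steps with different partners.

    Terms are tuples: ("step", name, partner), ("send", from_, to),
    ("receive", at, from_), or a composite such as ("seq", left, right).
    """
    tag = term[0]
    if tag in ("step", "send", "receive"):
        return term                                   # leaves stay as they are
    children = [rewrite(child) for child in term[1:]] # rewrite bottom-up
    if tag == "seq" and len(children) == 2:
        left, right = children
        if left[0] == "step" and right[0] == "step" and left[2] != right[2]:
            x, y = left[2], right[2]
            return ("seq", left,
                    ("seq", ("send", x, y),
                     ("seq", ("receive", y, x), right)))
    return (tag, *children)

workflow = ("seq", ("step", "a", "x"), ("step", "b", "y"))
print(rewrite(workflow))
```

In Prolog the same rule would be a clause that pattern-matches on seq((A,X),(B,Y)) with X different from Y, plus clauses that recurse into the other term shapes.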