I am writing a research paper in R Markdown and creating a summary statistics table with the stargazer package, but I get the following error when I include the argument notes = c("All variables are defined in Appendix A.", "All continuous variables are winsorized at 1% and 99%.", "The pctl(25(75) corresponds to 25% (75%) percentile.") in stargazer:
! File ended while scanning use of \multicolumn.
<inserted text>
\par
<*> sample_articles.tex
Below is the table portion of my .tex file for the above problem:
\begin{table}[!htbp] \centering
\caption{Summary Stat of Variables}
\label{}
\tiny
\begin{tabular}{@{\extracolsep{-5pt}}lcccccc}
\\[-1.8ex]\hline
\hline \\[-1.8ex]
Statistic & \multicolumn{1}{c}{N} & \multicolumn{1}{c}{Mean} & \multicolumn{1}{c}{St. Dev.} & \multicolumn{1}{c}{Pctl(25)} & \multicolumn{1}{c}{Median} & \multicolumn{1}{c}{Pctl(75)} \\
\hline \\[-1.8ex]
DEGREE & 19,114 & 67.7 & 15.7 & 56.3 & 68.7 & 79.7 \\
EIGENVECTOR & 19,114 & 61.6 & 20.1 & 48.7 & 64 & 77 \\
BETWEENNESS & 19,114 & 67.8 & 23.7 & 53.0 & 72.8 & 87.0 \\
CLOSENESS & 19,114 & 61.2 & 21.9 & 46.1 & 63 & 78 \\
OVERALLCENTRALITY & 19,114 & 67.4 & 21.6 & 51.8 & 71 & 85 \\
LNASSETS & 19,114 & 6.9 & 2.2 & 5.5 & 7.0 & 8.4 \\
LEVERAGE & 19,114 & 0.5 & 0.3 & 0.4 & 0.5 & 0.7 \\
INVREC & 19,114 & 0.2 & 0.2 & 0.1 & 0.2 & 0.3 \\
LOSS & 19,114 & 0.3 & 0.4 & 0 & 0 & 1 \\
ROA & 19,114 & 1.1 & 22.3 & $-$0.01 & 4.9 & 10.6 \\
ZSCORE & 19,114 & 0.9 & 0.9 & 0 & 1 & 2 \\
MERGER & 19,114 & 0.4 & 0.5 & 0 & 0 & 1 \\
MTB & 19,114 & 3.0 & 4.7 & 1.3 & 2.1 & 3.6 \\
FOREIGN & 19,114 & 0.5 & 0.5 & 0 & 0 & 1 \\
EXTRAORDINARY & 19,114 & 0.01 & 0.1 & 0 & 0 & 0 \\
SEGMENT & 19,114 & 2.1 & 0.8 & 1.4 & 2.0 & 2.6 \\
SPECIALIZED & 19,114 & 0.3 & 0.5 & 0 & 0 & 1 \\
MATERIALWEAKNESS & 19,114 & 0.05 & 0.2 & 0 & 0 & 0 \\
RESTATEMENT & 19,114 & 0.1 & 0.3 & 0 & 0 & 0 \\
BIGN & 19,114 & 0.8 & 0.4 & 1 & 1 & 1 \\
GOINGCONCERN & 19,114 & 0.02 & 0.1 & 0 & 0 & 0 \\
CALENDARYEAR & 19,114 & 0.8 & 0.4 & 1 & 1 & 1 \\
LNNONAUDFEES & 19,114 & 11.9 & 1.8 & 10.7 & 12.0 & 13.2 \\
LNAUDFEES & 19,114 & 14.0 & 1.3 & 13.2 & 14.0 & 14.8 \\
AUDTURNOVER & 19,114 & 0.1 & 0.2 & 0 & 0 & 0 \\
RESTRUCTURE & 19,114 & $-$0.002 & 0.1 & $-$0.001 & 0.0 & 0.0 \\
LITIGATE & 19,114 & 0.3 & 0.4 & 0 & 0 & 1 \\
AGE & 19,114 & 3.0 & 0.7 & 2.5 & 3.0 & 3.6 \\
AUDITORTENURE & 19,114 & 8.2 & 4.4 & 5 & 8 & 12 \\
AUDITLAG & 19,114 & 8.0 & 1.1 & 7.4 & 7.7 & 8.6 \\
AUDFEES (ml) & 19,114 & 2.7 & 5.7 & 0.5 & 1.2 & 2.7 \\
NAUDFEES (ml) & 19,114 & 0.7 & 2.1 & 0.05 & 0.2 & 0.5 \\
\hline \\[-1.8ex]
\multicolumn{7}{l}{All variables are defined in Appendix A.} \\
\multicolumn{7}{l}{All continuous variables are winsorized at level 1% and 99%.} \\
\multicolumn{7}{l}{The pctl(25 (75)) corresponds to 25% (75%) percentile.} \\
\end{tabular}
\end{table}
However, if I take out the notes argument, it works fine.
Below is my R code in R Markdown, without the notes argument in stargazer:
sumstat_label <- c(
"DEGREE",
"EIGENVECTOR",
"BETWEENNESS",
"CLOSENESS",
"OVERALLCENTRALITY",
"LNASSETS",
"LEVERAGE",
"INVREC",
"LOSS",
"ROA",
"ZSCORE",
"MERGER",
"MTB",
"FOREIGN",
"EXTRAORDINARY",
"SEGMENT",
"SPECIALIZED",
"MATERIALWEAKNESS",
"RESTATEMENT",
"BIGN",
"GOINGCONCERN",
"CALENDARYEAR",
"LNNONAUDFEES",
"LNAUDFEES",
"AUDTURNOVER",
"RESTRUCTURE",
"LITIGATE",
"AGE",
"AUDITORTENURE",
"AUDITLAG",
"AUDFEES (ml)",
"NAUDFEES (ml)")
note_label <- c("All variables are defined in Appendix A.",
"All continuous variables are winsorized at 1% and 99%.",
"The pctl(25(75) corresponds to 25% (75%) percentile.")
stargazer(as.data.frame(sum_stat[c(
"DEGREE",
"EIGENVECTOR",
"BETWEENNESS",
"CLOSENESS",
"OVERALLCENTRALITY",
"LNASSETS",
"LEVERAGE",
"INVREC",
"LOSS",
"ROA",
"ZSCORE",
"MERGER",
"MTB",
"FOREIGN",
"EXTRAORDINARY",
"SEGMENT",
"SPECIALIZED",
"MATERIALWEAKNESS",
"RESTATEMENT",
"BIGN",
"GOINGCONCERN",
"CALENDARYEAR",
"LNNONAUDFEES",
"LNAUDFEES",
"AUDTURNOVER",
"RESTRUCTURE",
"LITIGATE",
"AGE",
"AUDITORTENURE",
"AUDITLAG",
"AUDFEES",
"NAUDFEES")]),
summary.stat = c("n", "mean", "sd", "p25", "median", "p75"),
column.sep.width = "-5pt",
title= "Summary Stat of Variables", type = "latex",
digits= 1,
header = FALSE,
notes.align = "l",
font.size = "small",
single.row = T,
no.space = T,
covariate.labels = sumstat_label
)
Does anybody have any idea how I can add the notes argument to the table with type = "latex" in stargazer? Thanks.
The issue is that the '%' symbol in your notes is passed to LaTeX unescaped. In LaTeX, '%' starts a comment, so everything after it on that line is ignored, including the closing brace of \multicolumn. That is why the file ends while LaTeX is still scanning \multicolumn. One solution is to replace the percent symbol with the word percent or percentile (as noted in the comment by the original author).
In some cases, though, a symbol is preferable (such as in labels that we want to keep short). In those cases, "escaping" the percent symbol will often solve the problem. Stargazer is usually pretty good at cleaning up or "sanitizing" special characters, but not always, as in this case with the notes argument (note_label here). Some possible solutions:
Try using the word percent instead of the symbol % and see if everything knits.
Try a backslash before the percent sign (\%), which tells LaTeX to treat it as a regular character and not a special character.
Inside an R string the backslash itself has to be escaped, so in R / R Markdown code this is usually written with two backslashes (i.e., "\\%"). Try that if the other options do not work.
Try manually sanitizing text strings that contain characters like underscores, percent signs, carets, etc. with a function like Hmisc::latexTranslate(). See more here: Function to sanitize strings for LaTeX compilation?
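The manual sanitizing idea can be sketched as follows. This is an illustration in Python rather than R, and the names LATEX_ESCAPES and escape_latex are made up for this example; in the R workflow above, Hmisc::latexTranslate() or a hand-written "\\%" would play this role. The mapping covers the usual LaTeX text-mode special characters and is an assumption about what needs escaping, not a description of what stargazer does internally.

```python
# Hypothetical helper: escape LaTeX special characters in plain-text notes.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "%": r"\%",
    "&": r"\&",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}

def escape_latex(text):
    # Translate one character at a time so that backslashes introduced
    # by an earlier replacement are never escaped a second time.
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)

notes = [
    "All variables are defined in Appendix A.",
    "All continuous variables are winsorized at 1% and 99%.",
    "The pctl(25(75) corresponds to 25% (75%) percentile.",
]
escaped_notes = [escape_latex(note) for note in notes]
for note in escaped_notes:
    print(note)
```

With the percent signs escaped, each note stays on one logical line in the .tex file and the closing brace of \multicolumn is no longer commented out.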
I am trying to write a regex to parse out seven match objects: four numbers and three operators.
Individual lines in the file look like this:
[ 9] -21 - ( 12) - ( -5) + ( -26) = ______
The number in brackets is the line number, which will be ignored. I want the four integer values (including the '-' if it is a negative integer), which in this case are -21, 12, -5 and -26. I also want the operators, which are -, - and +.
I will then take those values (match objects) and actually compute the answer:
-21 - 12 - -5 + -26 = -54
I have this:
[\s+0-9](-?[0-9]+)
In Pythex it grabs the [ 9], but it then also grabs every integer in separate match objects (four additional match objects). I don't know why it does that.
If I add a ? to the end, [\s+0-9](-?[0-9]+)?, thinking it will only grab the first integer, it doesn't. I get seventeen matches?
I am trying to say, via the regex: grab the line number and its brackets (that part works), then grab the first integer including sign, then the operator, then the next integer including sign, then the next operator, etc.
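A short Python check may help show why the pattern matches every integer. Square brackets in a regex define a character class, so [\s+0-9] does not mean "a bracketed line number": it means one character that is whitespace, a plus sign, or a digit. Any integer preceded by a space therefore qualifies. The variable names below are only for illustration.

```python
import re

line = "[ 9] -21 - ( 12) - ( -5) + ( -26) = ______"

# [\s+0-9] is a character class: exactly ONE character that is
# whitespace, "+", or a digit. It is not a literal "[ 9]".
attempt = re.compile(r"[\s+0-9](-?[0-9]+)")

# Collect the captured group of every non-overlapping match.
groups = [m.group(1) for m in attempt.finditer(line)]
print(groups)  # ['9', '-21', '12', '-5', '-26']
```

The first match is the 9 from the line number (the space before it satisfies the class), and the other four are the integers of the expression. To match literal brackets they have to be escaped as \[ and \].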
It appears that I have failed to explain myself clearly.
The file has hundreds of lines. Here is a five line sample:
[ 1] 19 - ( 1) - ( 4) + ( 28) = ______
[ 2] -18 + ( 8) - ( 16) - ( 2) = ______
[ 3] -8 + ( 17) - ( 15) + ( -29) = ______
[ 4] -31 - ( -12) - ( -5) + ( -26) = ______
[ 5] -15 - ( 12) - ( 14) - ( 31) = ______
The operators are only '-' or '+', but any combination of them may appear in the three operator positions of a line. The integers will all be from -99 to 99, but that shouldn't matter if the regex works. The goal (as I see it) is to extract seven match objects, four integers and three operators, and then evaluate the expression exactly as it appears. The number in brackets is just the line number and plays no role in the computation.
Good luck with the regex. If you just need the result:
import re
s = "[ 9] -21 - ( 12) - ( -5) + ( -26) = ______"
s = s[s.find("]")+1:s.find("=")]  # cut away line nr and = ...
# weak attempt to prevent python code injection; "-" comes first in the
# class so it is a literal and not a range (as "+-0" would be)
if not re.sub(r"[-+0-9() ]*", "", s):
    print(eval(s))
else:
    print("wonky chars inside, only numbers, +, - , space and () allowed.")
Output:
-54
Make sure to read up on eval() and have a look at:
https://opensourcehacker.com/2014/10/29/safe-evaluation-of-math-expressions-in-pure-python/
https://softwareengineering.stackexchange.com/questions/311507/why-are-eval-like-features-considered-evil-in-contrast-to-other-possibly-harmfu/311510
https://www.kevinlondon.com/2015/07/26/dangerous-python-functions.html
Example for hundreds of lines:
import random
import re

def calcIt(line):
    # keep only the expression between the line number and the "="
    s = line[line.find("]")+1:line.find("=")]
    # "-" comes first in the class so it is a literal, not a range
    if not re.sub(r"[-+0-9() ]*", "", s):
        return eval(s)
    else:
        print(line + " has wonky chars inside, only numbers, +, - , space and () allowed.")
        return None

# generate 1000 test lines in the same layout as the file
random.seed(42)
pattern = "[ {}] -{} - ( {}) - ( -{}) + ( -{}) = "
for n in range(1000):
    nums = [n]
    nums.extend([random.randint(0,100), random.randint(-100,100), random.randint(-100,100),
                 random.randint(-100,100)])
    c = pattern.format(*nums)
    print(c, calcIt(c))
Ahh... I had a cup of coffee and sat down in front of Pythex again.
I figured out the correct regex:
[\s+0-9]\s+(-?[0-9]+)\s+([-|+])\s+\(\s+(-?[0-9]+)\)\s+([-|+])\s+\(\s+(-?[0-9]+)\)\s+([-|+])\s+\(\s+(-?[0-9]+)\)
Yields:
-21
-
12
-
-5
+
-26
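For completeness, here is a sketch of how the seven captured values could be turned into the answer in Python without eval(). The pattern is a variant of the one above, not the same regex: the line number is matched with escaped brackets (\[ and \]), the operator class is written [-+] because inside a character class | is a literal pipe rather than "or", and \s* is used so that the amount of spacing does not matter. The names LINE_RE, parse_line and evaluate are made up for this example.

```python
import re

LINE_RE = re.compile(
    r"\[\s*\d+\]"                       # line number in brackets, not captured
    r"\s*(-?\d+)"                       # first integer, optional sign
    r"\s*([-+])\s*\(\s*(-?\d+)\s*\)"    # operator and second integer
    r"\s*([-+])\s*\(\s*(-?\d+)\s*\)"    # operator and third integer
    r"\s*([-+])\s*\(\s*(-?\d+)\s*\)"    # operator and fourth integer
)

def parse_line(line):
    """Return the seven tokens as strings, or None if the line does not match."""
    m = LINE_RE.search(line)
    return list(m.groups()) if m else None

def evaluate(tokens):
    """Apply the operators left to right to the integers."""
    total = int(tokens[0])
    for op, num in zip(tokens[1::2], tokens[2::2]):
        total = total + int(num) if op == "+" else total - int(num)
    return total

sample = "[ 9] -21 - ( 12) - ( -5) + ( -26) = ______"
tokens = parse_line(sample)
print(tokens)            # ['-21', '-', '12', '-', '-5', '+', '-26']
print(evaluate(tokens))  # -54
```

Because only + and - occur, applying the operators from left to right gives the same result as normal precedence rules would.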
I have a text variable showing patient prescription that looks quite messy like this:
PatientRx
ACETAZOLAMIDE 250MG TABLET- 100
ADAPALENE + BENZOYL 0.1% + 2.5% GEL-..
ADRENALINE/EPIPEN 300MCG/0.3ML INJ..
ALENDRONATE + COLECA 70MG + 140MCG TA..
ALLOPURINOL 100MG TABLET- 100
ALUM HYDROX + MAG HY 250+120+120MG/5M..
AMILORIDE + HYDROCHL 5MG + 50MG HCL T..
While I haven't looked through all these values, some patterns may arise:
Often there is more than one drug, and the drugs are separated, for example, by a space or a forward slash.
Drugs are also separated by a plus sign. But a plus sign is also used between doses.
The use of spaces is very arbitrary, both at the beginning and in the middle of an entry.
How can I extract only the names of the drugs into new variables? New variables should look like this:
Newvar1 Newvar2
ACETAZOLAMIDE
ADAPALENE BENZOYL
ADRENALINE EPIPEN
ALENDRONATE COLECA
and so on.
Some would reach first for regular expressions, which you might indeed need for the full problem. In addition, note moss, which can be installed with ssc install moss.
But it seems easiest, given the information in the example here, which is all we have to go on, to look for the position of the first numeric digit 0 to 9 and then parse what goes before. I don't know whether drug names ever contain numeric digits.
clear
input str40 sandbox
" ACETAZOLAMIDE 250MG TABLET- 100"
"ADAPALENE + BENZOYL 0.1% + 2.5% GEL-"
" ADRENALINE/EPIPEN 300MCG/0.3ML INJ"
"ALENDRONATE + COLECA 70MG + 140MCG TA"
" ALLOPURINOL 100MG TABLET- 100"
"ALUM HYDROX + MAG HY 250+120+120MG/5M"
" AMILORIDE + HYDROCHL 5MG + 50MG HCL T"
end
gen wherenum = .
quietly forval j = 0/9 {
    replace wherenum = min(wherenum, strpos(sandbox, "`j'")) if strpos(sandbox, "`j'")
}
gen drug = substr(sandbox, 1, wherenum - 1)
split drug, parse(+ /)
l drug?, sep(0)
+---------------------------+
| drug1 drug2 |
|---------------------------|
1. | ACETAZOLAMIDE |
2. | ADAPALENE BENZOYL |
3. | ADRENALINE EPIPEN |
4. | ALENDRONATE COLECA |
5. | ALLOPURINOL |
6. | ALUM HYDROX MAG HY |
7. | AMILORIDE HYDROCHL |
+---------------------------+
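The same idea (cut each entry at the first digit, then split what comes before on + or /) can be sketched in Python for comparison. The function name drug_names is made up for this example, and it relies on the same assumption as the Stata code: that drug names never contain digits.

```python
import re

def drug_names(entry):
    """Return the drug names that appear before the first digit of an entry."""
    first_digit = re.search(r"\d", entry)
    # Keep the whole entry when no digit is found.
    prefix = entry[:first_digit.start()] if first_digit else entry
    # Split on "+" or "/", then drop surrounding spaces and empty pieces.
    return [part.strip() for part in re.split(r"[+/]", prefix) if part.strip()]

entries = [
    " ACETAZOLAMIDE 250MG TABLET- 100",
    "ADAPALENE + BENZOYL 0.1% + 2.5% GEL-",
    " ADRENALINE/EPIPEN 300MCG/0.3ML INJ",
    "ALUM HYDROX + MAG HY 250+120+120MG/5M",
]
for entry in entries:
    print(drug_names(entry))
```

Multi-word names such as ALUM HYDROX survive because the split is only on + and /, not on spaces.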
I found the following script on taxonomies from asdfree (asdfree original script). The current script merges all specialties into a single column, so it ignores the hierarchy of the specialties.
The following code gives you an idea of how there are really multiple levels
library(downloader)
tf <- tempfile()
download("https://raw.githubusercontent.com/ajdamico/asdfree/master/National%20Plan%20and%20Provider%20Enumeration%20System/taxonomy%20id%20table.txt", tf)
z <- readLines(tf)
hmt <- gregexpr("\t", z)
l <- unlist(lapply(hmt, function(x) length(x[x > 0])))
specialty_groups <- pre[l == 1]
specialty_individual <- pre[l == 2]
The issue is that Allergy & Immunology (in the first row) is misplaced, and it should really go in the last column.
6 2 Allergy & Immunology 207K00000X Allopathic & Osteopathic Physicians <NA>
7 3 Allergy 207KA0200X Allopathic & Osteopathic Physicians Allergy & Immunology
8 3 Clinical & Laboratory Immunology 207KI0005X Allopathic & Osteopathic Physicians Allergy & Immunology
9 2 Anesthesiology 207L00000X Allopathic & Osteopathic Physicians <NA>
In other words, the data should really look something like this
LEVEL_1 LEVEL_2 LEVEL_3 TAXONOMY
Allopathic & Osteopathic Physicians Allergy & Immunology 207K00000X
Allopathic & Osteopathic Physicians Allergy & Immunology Allergy 207KA0200X
Allopathic & Osteopathic Physicians Allergy & Immunology Clinical & Laboratory Immunology 207KI0005X
How can I achieve this with regex in R?
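One way to think about the target layout, independent of regex, is to walk through the rows in order and remember the most recent name seen at each level. The sketch below is in Python rather than R and rests on assumptions: the input is taken to be a list of (level, name, code) tuples, where level 1 is the group, level 2 the specialty and level 3 the sub-specialty, and group rows carry no code. That input shape and the function name flatten_taxonomy are hypothetical, chosen to mirror the rows shown above rather than the actual file format.

```python
def flatten_taxonomy(records):
    """Turn ordered (level, name, code) records into LEVEL_1..LEVEL_3 + code rows."""
    current = {}   # most recent name seen at each level
    rows = []
    for level, name, code in records:
        current[level] = name
        # Forget deeper levels left over from a previous branch.
        for deeper in [k for k in current if k > level]:
            del current[deeper]
        if code is not None:
            rows.append([current.get(1, ""), current.get(2, ""),
                         current.get(3, ""), code])
    return rows

# Hypothetical input mirroring the rows shown above.
records = [
    (1, "Allopathic & Osteopathic Physicians", None),
    (2, "Allergy & Immunology", "207K00000X"),
    (3, "Allergy", "207KA0200X"),
    (3, "Clinical & Laboratory Immunology", "207KI0005X"),
    (2, "Anesthesiology", "207L00000X"),
]
for row in flatten_taxonomy(records):
    print(row)
```

The same carry-forward logic could be written in R once each line has been assigned a level, for example from the tab counts computed above.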
I am trying to replace Gene1, Gene2, Gene3 and Gene4 with x[1], x[2], x[3] and x[4]. I was able to get the opening bracket, but I do not know how to add the closing one.
######code
install.packages("BoolNet")
library(BoolNet)
n<-generateRandomNKNetwork(4,3,readableFunctions="canonical")
n$interactions$Gene1$expression
func=list()
gfunc=list()
for (i in 1:4){
  func[[i]]<-noquote(n$interactions[[paste0("Gene",i)]]$expression)
  gfunc[[i]]<-gsub("Gene", "x[", func[[i]])
}
##########################
############output###########
func
[[1]]
[1] (!Gene1 & Gene4 & !Gene3) | (!Gene1 & Gene4 & Gene3) | (Gene1 & !Gene4 & !Gene3) | (Gene1 & Gene4 & Gene3)
[[2]]
[1] (!Gene2 & !Gene3 & !Gene4) | (!Gene2 & !Gene3 & Gene4) | (!Gene2 & Gene3 & !Gene4)
[[3]]
[1] (!Gene2 & !Gene3 & !Gene1) | (!Gene2 & Gene3 & !Gene1) | (!Gene2 & Gene3 & Gene1) | (Gene2 & Gene3 & !Gene1) | (Gene2 & Gene3 & Gene1)
[[4]]
[1] (!Gene3 & Gene2 & !Gene4) | (!Gene3 & Gene2 & Gene4) | (Gene3 & !Gene2 & !Gene4) | (Gene3 & Gene2 & Gene4)
gfunc
[[1]]
[1] (!x[1 & x[4 & !x[3) | (!x[1 & x[4 & x[3) | (x[1 & !x[4 & !x[3) | (x[1 & x[4 & x[3)
[[2]]
[1] (!x[2 & !x[3 & !x[4) | (!x[2 & !x[3 & x[4) | (!x[2 & x[3 & !x[4)
[[3]]
[1] (!x[2 & !x[3 & !x[1) | (!x[2 & x[3 & !x[1) | (!x[2 & x[3 & x[1) | (x[2 & x[3 & !x[1) | (x[2 & x[3 & x[1)
[[4]]
[1] (!x[3 & x[2 & !x[4) | (!x[3 & x[2 & x[4) | (x[3 & !x[2 & !x[4) | (x[3 & x[2 & x[4)
This is what is requested, although I admit I'm not sure what the purpose is:
for (i in 1:4){
  func[[i]]<-noquote(n$interactions[[paste0("Gene",i)]]$expression)
  gfunc[[i]]<-gsub("(Gene)([[:digit:]])", "x[\\2]", func[[i]])
}
> gfunc
[[1]]
[1] (!x[1] & x[2] & !x[4]) | (x[1] & !x[2] & x[4]) | (x[1] & x[2] & !x[4])
[[2]]
[1] (!x[4] & !x[2] & !x[1]) | (!x[4] & !x[2] & x[1]) | (x[4] & !x[2] & x[1])
[[3]]
[1] (!x[2] & !x[3] & x[4]) | (!x[2] & x[3] & !x[4]) | (x[2] & !x[3] & !x[4]) | (x[2] & !x[3] & x[4])
[[4]]
[1] (!x[2] & !x[3] & x[1]) | (!x[2] & x[3] & x[1]) | (x[2] & x[3] & !x[1])
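The technique here is a capture group plus a backreference: the digits after Gene are captured and reinserted between the brackets. As a side illustration, the same substitution in Python looks like this. It uses \d+ instead of a single digit, so it would also handle names such as Gene12, which the single [[:digit:]] in the R pattern would only partly match. The helper name to_indexed is made up for this example.

```python
import re

def to_indexed(expression):
    # Capture the digits after "Gene" and reinsert them inside x[...].
    return re.sub(r"Gene(\d+)", r"x[\1]", expression)

expr = "(!Gene1 & Gene4 & !Gene3) | (Gene1 & Gene4 & Gene3)"
print(to_indexed(expr))  # (!x[1] & x[4] & !x[3]) | (x[1] & x[4] & x[3])
```

In R, the equivalent change would be to write ([[:digit:]]+) in the pattern.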