I am currently writing a manuscript using Bookdown and the GitBook format.
My first and last chapters (Introduction and Conclusion) are unnumbered, whereas the 2nd and 3rd chapters are numbered.
However, when rendering the book and navigating throughout the different pages, the unnumbered chapters are never displayed as “active” in the Table of Contents (left-side bar).
Instead, the 1st numbered chapter (named "Chapter1" in the following code) is displayed as active (highlighted) when the current page is either the "Introduction" or "Conclusion".
When navigating to any numbered chapter ("Chapter1" or "Chapter2"), they are correctly displayed as active, as expected.
This can be verified by inspecting the elements in the TOC: the numbered chapters have the chapter class, and the 1st numbered chapter has both chapter and active while an unnumbered chapter is being read. Unnumbered chapters have neither the chapter nor the active class.
I have tried adding the "chapter" class by using {- .chapter}, to no effect. Numbering all chapters does make the correct one display as active, but that is not optimal (since I would like to use unnumbered chapters).
How can I make Bookdown correctly display the correct chapter as active, without making all of them numbered?
Here is a minimal example. The result is the same whether I put all the code in index.Rmd or split it across separate files (index.Rmd, 01-chap1.Rmd, 02-chap2.Rmd, ...).
Also, note that using {-} or {.unnumbered} results in the same behaviour.
index.Rmd
---
title: "A Minimal Book Example"
author: "John Doe"
date: "`r Sys.Date()`"
site: bookdown::bookdown_site
output:
  bookdown::gitbook: default
---
# Introduction {-}
Lorem ipsum
# Chapter1
## Section 1
# Chapter 2
## Section 1
# Conclusion {.unnumbered}
I also include an image to demonstrate the problem: we can clearly see that the current page is the "Introduction" chapter, however the highlighted ("active") chapter is "Chapter1".
Thank you in advance!
Try using the split_by option of bookdown::gitbook to split the book by chapter.
In my experience it works as long as the chapters are not on the same HTML page.
I would like to remove the DOI from the bibliographic references in my markdown script. Is there a way I can do this?
Here is my markdown file:
---
title: "my paper"
author: "name"
date: \today
header-includes:
output:
  pdf_document:
    number_sections: yes
    toc: yes
    keep_tex: yes
    fig_caption: yes
  word_document:
    toc: yes
latex_engine: xelatex
indent: yes
bibliography: library.bib
references:
link-citations: yes
linkcolor: blue
hyperfootnotes: yes
---
I would like to remove the DOI from this reference @Wallace2005
# Bibliography {-}
::: {#refs}
:::
The output of this file is the following:
And here is the .bib file
@article{Wallace2005,
abstract = {Constantly evolving, and with far-reaching implications, European Union policy-making is of central importance to the politics of the European Union. From defining the processes, institutions and modes through which policy-making operates, the text moves on to situate individual policies within these modes, detail their content, and analyse how they are implemented, navigating policy in all its complexities. The first part of the text examines processes, institutions, and the theoretical and analytical underpinnings of policy-making, while the second part considers a wide range of policy areas, from economics to the environment, and security to the single market. Throughout the text, theoretical approaches sit side by side with the reality of key events in the EU, including enlargement, the ratification of the Lisbon Treaty, and the financial crisis and resulting euro area crisis, exploring what determines how policies are made and implemented. In the final part, the editors consider trends in EU policy-making and look at the challenges facing the EU. Exploring the link between the modes and mechanisms of EU policy-making and its implementation at national level, Policy-Making in the Europe Union helps students to engage with the key issues related to policy. Written by experts, for students and scholars alike, this is the most authoritative and in-depth guide to policy in the European Union.},
author = {Wallace, Helen and Wallace, William and Pollack, Mark A},
doi = {10.1177/0010414013516917},
file = {:Users/aguasti/Desktop/Mendely Organized Library/Wallace, Wallace, Pollack/Wallace, Wallace, Pollack - 2005 - Policy-Making in the European Union.pdf:pdf},
isbn = {0199689679},
issn = {0010-4140},
pages = {574},
pmid = {130137987},
title = {{Policy-Making in the European Union}},
url = {https://books.google.com/books?id=w6SbBQAAQBAJ&pgis=1},
year = {2005}
}
If anyone knows how I could remove the DOI from the bibliographic reference I would be extremely grateful
I am assuming that you want to have this done on the fly while knitting the PDF.
The way the references are rendered is controlled by the applied citation styles.
So, one way would be to change the citation style in the YAML header to a style that does not include the DOI (note that for the PDF output you would need to add the natbib line).
bibliography: library.bib
citation_package: natbib
csl: somethingelse.csl
Alternatively, if you have to stick to a certain style, you could [modify the CSL file](https://www.zotero.org/support/dev/citation_styles/style_editing_step-by-step).
Example for elsevier-harvard.csl
You could just comment out the relevant line in the CSL file:
<if variable="DOI">
  <!-- <text variable="DOI" prefix="https://doi.org/"/> -->
</if>
Save this under a new name (e.g., elsevier-harvard_mod.csl)
and then re-run your example (shortened here):
---
title: "my paper"
author: "name"
date: \today
output: pdf_document
bibliography: library.bib
citation_package: natbib
csl: elsevier-harvard_mod.csl
---
I would like to remove the DOI from this reference @Wallace2005
# Bibliography {-}
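If editing a CSL file is not an option, another route is to strip the DOI fields from the .bib file before knitting. The following is only a minimal sketch in base R: strip_doi is a hypothetical helper (not part of rmarkdown or pandoc), and it assumes each doi field sits on a single line, as in the entry shown in the question. The output file name library_nodoi.bib is also an assumption.

```r
# Hypothetical helper: drop single-line `doi = {...}` fields from BibTeX lines.
# Assumes one field per line; multi-line doi fields are not handled.
strip_doi <- function(lines) {
  lines[!grepl("^\\s*doi\\s*=", lines, ignore.case = TRUE, perl = TRUE)]
}

# A shortened copy of the entry from the question, one field per line.
bib <- c(
  "@article{Wallace2005,",
  "author = {Wallace, Helen and Wallace, William and Pollack, Mark A},",
  "doi = {10.1177/0010414013516917},",
  "title = {{Policy-Making in the European Union}},",
  "year = {2005}",
  "}"
)
clean <- strip_doi(bib)
print(clean)

# With real files, this could run in a setup chunk before the bibliography is used:
# writeLines(strip_doi(readLines("library.bib")), "library_nodoi.bib")
```

The YAML header would then point to the cleaned file (bibliography: library_nodoi.bib). Whether this is enough depends on the style, since some styles also print the url field.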
As the title says, I use Rmarkdown to write a document.
I use the following text at the top of the .Rmd document:
---
title: "Title"
author: "Me"
date: "September 10, 2018"
output:
  pdf_document: default
  html_document: default
bibliography: bibliography.bib
---
And then I use the following code in my bibliography.bib document, which, according to the document properties is a bibtex file:
@article{Brooks98,
author={ Brooks, S. P. and Gelman, A.},
title={Interface foundation of america general methods for monitoring convergence of iterative simulations general methods for monitoring convergence of iterative simulations},
year={1998},
journal={Journal of Computational and Graphical Statistics},
volume=7,
issue=4,
pages=434-455
}
I expect to get
Brooks, S. P. and Gelman, A. 1998
but instead I get
Brooks, S. P., and A. Gelman. 1998
My question is, what causes this and how do I solve the problem?
You have to change your citation style. One simple solution would be to use bibtex together with natbib and apalike:
---
title: "Title"
author: "Me"
date: "September 10, 2018"
output:
  pdf_document:
    citation_package: natbib
  html_document: default
bibliography: bibliography.bib
biblio-style: apalike
---
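(Note that you will have to write pages={434-455} in the .bib entry for this to work, since BibTeX only accepts values without braces or quotes when they are plain numbers.)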
If there are other aspects of the citation style that do not fit, you can have a look at this answer for ways of finding other styles. Another alternative would be biblatex.
The default, which I am less familiar with, is to use pandoc-citeproc, which uses CSL files to define the style. See here for resources about additional CSL styles: https://citationstyles.org/
Here I learned how to insert citations in the middle of a text and generate a full bibliography at the end of the document. I wonder whether it is possible to have an output like this using citation keys:
Bla bla bla.
Watson, J. D., & Crick, F. H. (1953). Molecular structure of nucleic acids. Nature, 171(4356), 737-738.
Bla bla bla.
In-text full references/citations in RMarkdown using Bibtex-package.
Solution using bibtex-package proposed by Samuel-Rosa.
An example with citations for packages:
Example .bib file
knitr::write_bib("base", "example.bib")
Read your .bib file into R.
refs <- bibtex::read.bib("example.bib")
Using in-line chunks, select the entry of interest, e.g. R-base, print the entry, and "capture" it as character for RMarkdown output.
Example full reference for R:
> `r capture.output(print(refs["R-base"]))`
Output:
Example full reference for R:
R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. <URL: https://www.R-project.org/>.
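A variant that avoids capture.output: bibentry objects (which is what read.bib returns) have a format method in the utils package, and format(x, style = "text") returns the formatted reference as a character string. This is a minimal sketch, assuming the entry is built by hand with utils::bibentry instead of being read from a file. The key Watson1953 is made up for illustration, and the field values are taken from the reference quoted in the question.

```r
# Build a bibentry by hand (key "Watson1953" is a made-up example key).
wc <- utils::bibentry(
  bibtype = "Article",
  key     = "Watson1953",
  author  = c(utils::person("J. D.", "Watson"), utils::person("F. H.", "Crick")),
  title   = "Molecular structure of nucleic acids",
  journal = "Nature",
  year    = "1953",
  volume  = "171",
  number  = "4356",
  pages   = "737--738"
)

# format() returns the reference as a character string, so no
# capture.output(print(...)) is needed.
ref_text <- format(wc, style = "text")
cat(ref_text, "\n")
```

In an .Rmd file the string could then be placed inline with `r ref_text`. The layout of the reference follows R's built-in text style, not APA.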
As a relative novice in R and programming, my first ever question in this forum is about regex pattern matching, specifically line breaks. First some background. I am trying to perform some preprocessing on a corpus of texts using R before processing them further on the NLP platform GATE. I convert the original pdf files to text as follows (the text files, unfortunately, go into the same folder):
dest <- "./MyFolderWithPDFfiles"
myfiles <- list.files(path = dest, pattern = "pdf", full.names = TRUE)
lapply(myfiles, function(i) system(paste('"C:/Program Files (x86)/xpdfbin-win-3.04/bin64/pdftotext.exe"', paste0('"', i, '"')), wait = FALSE))
Then, having loaded the tm package and physically(!) moved the text files to another folder, I create a corpus:
TextFiles <- "./MyFolderWithTXTfiles"
EU <- Corpus(DirSource(TextFiles))
I then want to perform a series of custom transformations to clean the texts. I succeeded in replacing a simple string as follows:
ReplaceText <- content_transformer(function(x, from, to) gsub(from, to, x, perl = TRUE))
EU2 <- tm_map(EU, ReplaceText, "Table of contents", "TOC")
However, a pattern that is a 1-3 digit page number followed by two line breaks and a page break is causing me problems. I want to replace it with a blank space:
EU2 <- tm_map(EU, ReplaceText, "[0-9]{1,3}\n\n\f", " ")
The ([0-9]{1,3}) and \f alone match. The line breaks don't. If I copy text from one of the original .txt files into the RegExr online tool and test the expression "[0-9]{1,3}\n\n\f", it matches. So the line breaks do exist in the original .txt file.
But when I view one of the .txt files as read into the EU corpus in R, there appear to be no line breaks even though the lines are obviously breaking before the margin, e.g.
[3] "PROGRESS TOWARDS ACCESSION"
[4] "1"
[5] ""
[6] "\fTable of contents"
Seeing this, I tried other patterns, e.g. to detect one or more blank space ("[0-9]{1,3}\s*\f"), but no patterns worked.
So my questions are:
Am I converting and reading the files into R correctly? If so, what has happened to the line breaks?
If the absence of line breaks is normal, how can I pattern match the character on line 5? Is that not a blank space?
(A tangential concern:) When converting the pdf files, is there code that will put them directly in a new folder?
Apologies for extending this, but how can one print or inspect only a few lines of the text object? The tm commands and head(EU) print the entire object, each a very long text.
I know my problem(s) must appear simple and perhaps stupid, but one has to start somewhere and extensive searching has not revealed a source that explains comprehensively how to use RegExes to modify text objects in R. I am so frustrated and hope someone here will take pity and can help me.
Thanks for any advice you can offer.
Brigitte
p.s. I think it's not possible to upload attachments in this forum, therefore, here is a link to one of the original PDF documents: http://ec.europa.eu/enlargement/archives/pdf/key_documents/1998/czech_en.pdf
Because the doc is long, I created a snippet of the first 3 pages of the TXT doc, read it into the R corpus ('EU') and printed it to the console and this is it:
dput(EU[[2]])
structure(list(content = c("REGULAR REPORT", "FROM THE COMMISSION ON",
"CZECH REPUBLIC'S", "PROGRESS TOWARDS ACCESSION ***********************",
"1", "", "\fTable of contents", "A. Introduction", "a) Preface The Context of the Progress Report",
"b) Relations between the European Union and the Czech Republic The enhanced Pre-Accession Strategy Recent developments in bilateral relations",
"B. Criteria for membership", "1. Political criteria", "1.1. Democracy and the Rule of Law Parliament The Executive The judicial system Anti-Corruption measures",
"1.2. Human Rights and the Protection of Minorities Civil and Political Rights Economic, Social and Cultural Rights Minority Rights and the Protection of Minorities",
"1.3. General evaluation", "2. Economic criteria", "2.1. Introduction 2.2. Economic developments since the Commission published its Opinion",
"Macroeconomic developments Structural reforms 2.3. Assessment in terms of the Copenhagen criteria The existence of a functioning market economy The capacity to cope with competitive pressure and market forces 2.4. General evaluation",
"3. Ability to assume the obligations of Membership", "3.1. Internal Market without frontiers General framework The Four Freedoms Competition",
"3.2. Innovation Information Society Education, Training and Youth Research and Technological Development Telecommunications Audio-visual",
"3.3. Economic and Fiscal Affairs Economic and Monetary Union",
"2", "", "\fTaxation Statistics "), meta = structure(list(author = character(0),
datetimestamp = structure(list(sec = 50.1142621040344, min = 33L,
hour = 15L, mday = 3L, mon = 10L, year = 114L, wday = 1L,
yday = 306L, isdst = 0L), .Names = c("sec", "min", "hour",
"mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt",
"POSIXt"), tzone = "GMT"), description = character(0), heading = character(0),
id = "CZ1998ProgressSnippet.txt", language = "en", origin = character(0)), .Names = c("author",
"datetimestamp", "description", "heading", "id", "language",
"origin"), class = "TextDocumentMeta")), .Names = c("content",
"meta"), class = c("PlainTextDocument", "TextDocument"))
Yes, working with text in R is not always a smooth experience! But you can get a lot done quickly with some effort (maybe too much effort!)
If you could share one of your PDF files or the output of dput(EU), that might help to identify exactly how to capture your page numbers with regex. That would also add a reproducible example to your question, which is an important thing to have in questions here so that people can test their answers and make sure they work for your specific problem.
There is no need to put the PDF and text files in separate folders. Instead, you can use a pattern like so:
EU <- Corpus(DirSource(pattern = "\\.txt$"))
This will read only the text files and ignore the PDF files.
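If separate folders are still wanted, pdftotext accepts an optional second argument naming the output text file, so the conversion can write straight into another directory. A minimal sketch of building those output paths: txt_path is a hypothetical helper, and the folder names and executable path are the ones from the question. The conversion call itself is left commented out because it needs pdftotext installed.

```r
# Hypothetical helper: map each PDF path to a .txt path inside out_dir.
txt_path <- function(pdf, out_dir) {
  file.path(out_dir, sub("\\.pdf$", ".txt", basename(pdf), ignore.case = TRUE))
}

pdfs <- c("./MyFolderWithPDFfiles/czech_en.pdf")
outs <- txt_path(pdfs, "./MyFolderWithTXTfiles")
print(outs)

# Conversion step (assumes pdftotext lives at the path given in the question):
# exe <- "C:/Program Files (x86)/xpdfbin-win-3.04/bin64/pdftotext.exe"
# for (i in seq_along(pdfs)) system2(exe, shQuote(c(pdfs[i], outs[i])))
```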
There is no 'snippet view' method in tm, which is annoying. I often just use names(EU) and EU[[1]] for quick looks.
UPDATE
With the data you've just added, I'd suggest a slightly tangential approach. Do the regex work before passing the data to the tm package formats, like so:
# get the PDF
download.file("http://ec.europa.eu/enlargement/archives/pdf/key_documents/1998/czech_en.pdf", "my_pdf.pdf", method = "wget")
# get the file name of the PDF
myfiles <- list.files(path = getwd(), pattern = "\\.pdf$", full.names = TRUE)
# convert to text (note that my pdftotext is in a different location from yours);
# wait = TRUE so that the text file exists before it is read below
lapply(myfiles, function(i) system(paste('"C:/Program Files/xpdf/bin64/pdftotext.exe"', paste0('"', i, '"')), wait = TRUE))
# read plain text into R
x1 <- readLines("my_pdf.txt")
# make into a single string
x2 <- paste(x1, collapse = " ")
# do some regex...
x3 <- gsub("Table of contents", "TOC", x2)
# page number, then the spaces left by the collapsed blank line, then a form feed
x4 <- gsub("[0-9]{1,3} +\f", "", x3)
# convert to corpus for text mining operations
x5 <- Corpus(VectorSource(x4))
With the snippet of data you provided using dput, the output from this method is
inspect(x5)
<<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>>
[[1]]
<<PlainTextDocument (metadata: 7)>>
REGULAR REPORT FROM THE COMMISSION ON CZECH REPUBLIC'S PROGRESS TOWARDS ACCESSION *********************** TOC A. Introduction a) Preface The Context of the Progress Report b) Relations between the European Union and the Czech Republic The enhanced Pre-Accession Strategy Recent developments in bilateral relations B. Criteria for membership 1. Political criteria 1.1. Democracy and the Rule of Law Parliament The Executive The judicial system Anti-Corruption measures 1.2. Human Rights and the Protection of Minorities Civil and Political Rights Economic, Social and Cultural Rights Minority Rights and the Protection of Minorities 1.3. General evaluation 2. Economic criteria 2.1. Introduction 2.2. Economic developments since the Commission published its Opinion Macroeconomic developments Structural reforms 2.3. Assessment in terms of the Copenhagen criteria The existence of a functioning market economy The capacity to cope with competitive pressure and market forces 2.4. General evaluation 3. Ability to assume the obligations of Membership 3.1. Internal Market without frontiers General framework The Four Freedoms Competition 3.2. Innovation Information Society Education, Training and Youth Research and Technological Development Telecommunications Audio-visual 3.3. Economic and Fiscal Affairs Economic and Monetary Union Taxation Statistics
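To come back to the question of where the line breaks went: judging from the dput output, the document content is a character vector with one element per line. The "\n" characters were therefore consumed when the file was split into lines, and the element at position 5 is an empty string, not a blank space. Below is a minimal sketch on a few lines copied from the snippet. It assumes that joining the lines with "\n" instead of a space is acceptable, in which case the pattern from the question matches again.

```r
# A few lines as they appear in the dput output (one element per line).
x1 <- c("PROGRESS TOWARDS ACCESSION", "1", "", "\fTable of contents", "A. Introduction")

# No element contains a newline, so a pattern with "\n" cannot match any of them.
has_newline <- any(grepl("\n", x1, fixed = TRUE))

# Re-join with "\n" to restore the line breaks, then apply the original pattern.
x2 <- paste(x1, collapse = "\n")
x3 <- gsub("[0-9]{1,3}\n\n\f", " ", x2)
print(x3)
```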