How to insert a BibTeX reference into an R code chunk?

I'd like to write something to generate a figure based on data for a few hundred references that I have in a BibTeX file.
I will be discussing some particular references in the body of the document.
I'd like to be able to make a plot using an R code chunk that inserts the numeric reference into the plot itself and keeps it consistent with the numeric citations that I have in the rest of the document.
I can insert a bibliographic reference into the body of the text using [@citation_name], which is then converted to the numeric citation.
Unfortunately, when I try to insert a bibliographic reference into a code chunk, as in the following, it just prints out "[@citation_name]" literally.
```{r}
plot(c(1, 1))
text(1.5, 1, "[@citation_name]")  # prints the literal string, not the citation number
```
Is there a good way to do this? Thanks!
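For what it's worth, one workaround (not a pandoc feature; a minimal sketch assuming you are willing to manage the numbering yourself in R rather than relying on pandoc's counter): keep the key-to-number mapping in an R helper and use it both in the plot and, via inline R expressions, in the body text, so the two stay consistent by construction. The cite_num helper below is hypothetical.
```
# Hypothetical helper: assigns numbers to citation keys in order of first use
cite_num <- local({
  keys <- character(0)
  function(key) {
    if (!key %in% keys) keys <<- c(keys, key)  # register a new key
    match(key, keys)                           # return its stable number
  }
})

plot(c(1, 1))
text(1.5, 1, sprintf("[%d]", cite_num("citation_name")))
```
In the body, the same citation would then be written as an inline R call, `r cite_num("citation_name")`, instead of [@citation_name], which trades away pandoc's citation processing for guaranteed consistency with the plot.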

Related

In LibreOffice Calc, which formula will check if a keyword (or part of it) is contained in a cell in a row and copy the entire content of that cell?

I am learning how to use formulas in spreadsheets; I use LibreOffice.
I need to sort out data in a rather huge, messy spreadsheet.
Each column contains mixed data, and the sheet is large: dozens of columns and thousands of rows. If the spreadsheet contains no errors, each cell in a row either contains a different keyword or is empty; there should never be two cells in the same row containing the same keyword.
The problem is to rearrange the data into a new spreadsheet in which each cell marked with a given keyword stays in the same row but is moved into the one column dedicated to that keyword.
(Screenshot: the spreadsheet with the mixed-up cells to be sorted.)
(Screenshot: how the data should look after sorting.)
A formula that can be used to extract the matching data from a cell is the following:
=IF(SEARCH("Text1";B2;1);B2;0)
The formula can be dragged down so that each copy checks the cell next to it, and the matches it finds are correct.
The problem is that where I expect 0, #VALUE! is printed instead.
The logic is very simple: if the cell contains the keyword, or any other text that includes the keyword, the result is the full content of that cell; otherwise the result is 0.
Here comes the first question: why do I get #VALUE! for those cells that do not contain the keyword? I expected to get 0 instead, just as indicated in the formula.
I tried leaving this field empty and also putting the 0 in quotes; the actual result is always the same, #VALUE!...
However, this formula only extracts the information contained in one column, so the process must be repeated for every other column.
To avoid creating a helper column for each column in the spreadsheet, processing each column one by one, and, more importantly, then having to merge all the results into a single column containing only cells with a given keyword, I thought of extending the same formula to parse each successive cell in the row, as follows:
=IF(SEARCH("text";B2;1);B2;IF(SEARCH("text";C2;1);C2;IF(SEARCH("text";D2;1);D2;0)))
The logic is again simple, and in one go it should output a column collecting every cell in the row that contains the keyword: check whether the first cell in the row contains the word using the SEARCH function; if it does, the result is the content of that cell, otherwise perform the next test; the next test is the same for the next cell, and so on until the last test; if no test succeeds, print 0 (but we get #VALUE!; OK, I could live with that...).
In theory this should work for any number of cells, but in practice it does not work at all; in fact it only works for the first IF test and the first cell named in the formula.
WHY?
The result of the extended formula, which should parse N cells in sequence, is the same as that of the simple formula that parses only one cell.
Finally, how do I resolve this problem using IF and SEARCH?
Is there any better approach to solving this kind of problem and sorting out data in huge spreadsheets like this one?
Thank you for any hint and help.
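For what it's worth, the usual explanation: SEARCH does not return FALSE when the text is not found, it returns the #VALUE! error, and IF propagates that error instead of taking the else branch. That is also why only the first IF in the chained version ever runs. Wrapping each SEARCH in ISNUMBER turns found/not-found into TRUE/FALSE, so the 0 branch (and each subsequent test) is actually reached:
=IF(ISNUMBER(SEARCH("Text1";B2));B2;0)
=IF(ISNUMBER(SEARCH("text";B2));B2;IF(ISNUMBER(SEARCH("text";C2));C2;IF(ISNUMBER(SEARCH("text";D2));D2;0)))
Alternatively, wrapping each SEARCH in IFERROR(...;0) achieves the same, since IF treats 0 as FALSE.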

Default code folding by individual chunk in rmarkdown

I am writing up a lesson in HTML using rmarkdown to demonstrate how to implement analytic methods in R, and because of this the document has a lot of code that is needed to understand those methods, but also a lot of code that is used only for generating plots and figures. I would like to show the first sort of code by default, and leave the plotting code available for students to view but hidden by default.
I know that rmarkdown has recently added support for code folding by setting the code_folding argument of html_document to either show or hide. However, this leaves all code chunks either unfolded or folded by default. Is there any way to indicate whether individual code chunks should be shown or folded by default while still allowing code folding?
Thank you!
I arrived here wondering the same thing. This is not a perfect solution, but I write the code twice: once in regular markdown so it displays (note there is no {r} after the three backticks), and a second time in a code chunk so it runs.
Example:
This runs, but doesn't display the actual code (it stays folded by default):
```{r}
5 * 5
```
This results in both the code and its execution being displayed:
```
5 * 5
```
```{r}
5 * 5
```
David Fong provided a perfect solution for this in their answer: https://stackoverflow.com/a/56657730/9727624
To override the state, use {r class.source = "fold-hide"} if the YAML setting is show, and {r class.source = "fold-show"} if the setting is hide.
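As a concrete sketch, assuming the document should default to folded code, the YAML header sets the global state:
```
---
output:
  html_document:
    code_folding: hide
---
```
and any individual chunk that should start unfolded then opens with {r class.source = "fold-show"} in its header.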

Openpyxl: Formulas getting removed when saving file

I'm using openpyxl to edit an Excel file that contains some formulas in certain cells. When I populate the cells from a text file, I expect the formulas to work and give me my desired output. But what I observe is that the formulas get removed and the cells are left blank.
I had the same problem when saving the file with openpyxl: formulas removed.
But I noticed that some intermediate formulas were still there.
After some tests, it appears that, in my case, all formulas displaying a blank result (nothing) are wiped when the save occurs, unlike the formulas with visible output in the cell, which are preserved.
Example (removed on save because of the blank result):
=IF((SUM(P3:P5))=0;"";(SUM(Q3:Q5))/(SUM(P3:P5)))
Example (preserved on save):
=IF((SUM(P3:P5))=0;"?";(SUM(Q3:Q5))/(SUM(P3:P5)))
For my example I'm using openpyxl 2.0.3 on Windows. The open and save calls are:
self._book = load_workbook("myfile.xlsx", data_only=False)  # data_only=False loads formulas rather than cached values
self._book.save("myfile.xlsx")
openpyxl does not currently support reading of formulas; i.e., if you read your file and write it back, all formulas are removed. There is an active feature request on Bitbucket, though.

Library for data storage and analysis

So, I have this program that collects a bunch of interesting data. I want to have a library that I can use to sort this data into columns and rows (or similar), save it to a file, and then use some other program (like OpenOffice Spreadsheet, or MATLAB since I own it, or maybe some other spreadsheet/database grapher that I don't know of) to analyse and graph the data however I want. I prefer this library to be open source, but it's not really a requirement.
OK, so my mistake: you wanted a writer. Writing a CSV is simple, and apparently reading one into MATLAB is simple too.
http://www.mathworks.com.au/help/techdoc/ref/csvread.html
A CSV has a simple structure: each row is separated by a newline, and each column is separated by a comma.
0,10,15,12
4,7,0,3
So all you really need to do is grab your data, separate it into rows, then write out a line for each row with the columns separated by commas.
If you need a code example I can edit again, but this shouldn't be too difficult.
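Purely as an illustrative sketch (the question doesn't name a language, so R is an assumption here): collect the data into a data frame and write it without headers, since MATLAB's csvread expects a purely numeric file.
# Illustrative: two rows of collected data, one column per measurement
results <- data.frame(t1 = c(0, 4), t2 = c(10, 7), t3 = c(15, 0), t4 = c(12, 3))
# Plain numeric CSV: comma-separated, no header, no row names
write.table(results, "results.csv", sep = ",",
            row.names = FALSE, col.names = FALSE)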

Search a list of terms on this website, and do not stop even if some of the terms are missing

I am trying to use the RCurl package to get data from the GeneCards database:
http://www-bimas.cit.nih.gov/cards//
I read a wonderful solution in a previously posted question:
How can I use R (Rcurl/XML packages ?!) to scrape this webpage?
However, my problem is different, in that I need further support from experts. Instead of extracting all the links from the webpage, I have a list of ~1000 genes in mind. They are in the form of gene symbols (some of the gene symbols can be found on the webpage, some of them are new to the database). Here is part of my list of genes:
TP53
SOD1
EGFR
C2d
AKT2
NFKB1
C2d is not in the database, so when I do the search manually I see:
"Sorry, there is no GeneCard for C2d".
When I use the solution posted in the previous question for my analysis:
How can I use R (Rcurl/XML packages ?!) to scrape this webpage?
(1) I first read in the list.
(2) I then use the get_structs function from the previous solution to substitute each gene symbol in the list into the following URL:
http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=genesymbol
(3) I scrape the information that I need for each gene in the list, using the get_data_url function from the previous answer.
It works for TP53, SOD1, and EGFR, but when the search comes to C2d, the process stops.
As I have ~1000 genes, I am sure some of them are missing from the webpage.
How can I automatically get a modified gene list telling me which of the ~1000 genes are missing, so that I can use the same approach as in the previous question to get all the data I need based on the new list of genes that DO exist on the webpage?
Or is there any method to ask R to skip the missing items, continue scraping to the end of the list, and mark the missing items in the final results?
To facilitate the discussion, I have made a pseudo input file, using the scripts from the previous question, for the same webpage they used:
u <- c("Aero_pern", "Ppate", "didnotexist", "Sbico")
library(RCurl)
base_url <- "http://gtrnadb.ucsc.edu/"
base_html <- getURLContent(base_url)[[1]]
links <- strsplit(base_html, "a href=")[[1]]
get_structs <- function(u) {
  # Build the per-genome structures page URL and fetch it
  struct_url <- paste(base_url, u, "/", u, "-structs.html", sep = "")
  raw_data <- getURLContent(struct_url)
  # Split the page on <PRE> blocks; the entries start at the third block
  s_split1 <- strsplit(raw_data, "<PRE>")[[1]]
  all_data <- s_split1[seq(3, length(s_split1))]
  # parse_genomes() is defined in the linked question
  data_list <- lapply(all_data, parse_genomes)
  for (d in seq_along(data_list)) {
    data_list[[d]] <- append(data_list[[d]], u)  # tag each record with its genome name
  }
  return(data_list)
}
I guess the problem can be solved by modifying the get_structs script above, or perhaps the ifelse function may help, but I cannot figure out how to modify it further. Please comment.
You can enclose your function call inside a try() so that the process won't break if you get errors. Usually this will let you loop over the problematic cases, returning an error message instead of breaking your process. For example:
dat <- list()
for (i in seq_along(u)) {
  dat[[i]] <- try(get_structs(u[i]))  # on failure, stores a "try-error" object instead of stopping
}
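To then mark which genes were missing, as asked above, one possible follow-up (a sketch, assuming the only errors come from genes absent in the database) is to test each element of dat for the "try-error" class:
failed <- sapply(dat, inherits, "try-error")  # TRUE where get_structs() errored
u[failed]       # the gene symbols that could not be fetched
dat[!failed]    # the scraped data for the genes that do exist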