I have done a lot of analysis and I want to share the output in slides, a poster, a written report, etc. I could replicate the analysis in each of these documents, but I'd like it to be coordinated (to reduce errors), so that if I make a change in one of the R Markdown files (say the parent R Markdown), the chunk I have updated is also updated in the slides, poster, and so on.
Is this possible and if so how?
You can use read_chunk() to do this.
Put the code into a file, with a marker at the top of each chunk of code
## ---- myChunk
rnd <- rnorm(10)
In the Rmd file, load the chunk with
knitr::read_chunk("myCode.R")
and run the chunk with
```{r myChunk}
```
The chunk in the Rmd file should have no contents.
The same chunk can be used in multiple Rmd files.
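For example, a minimal sketch (the file, chunk, and object names are just examples): suppose the shared code lives in myCode.R:

```r
## ---- load-data
dat <- read.csv("analysis_data.csv")  # hypothetical data file

## ---- plot-data
plot(dat$x, dat$y)
```

Then the report, the slides, and the poster can each contain:

````markdown
```{r setup, include=FALSE}
knitr::read_chunk("myCode.R")
```

```{r load-data}
```

```{r plot-data}
```
````

Each empty, named chunk is filled with the corresponding code from myCode.R at render time, so editing myCode.R updates every document that reads it.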
An even better solution is to use the drake package. drake runs all the code and caches the results, keeping track of when changes to code or data require analyses to be rerun. The objects in the cache can be read into the markdown file with readd() or loadd().
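A minimal sketch of that approach (the plan and target names are just examples):

```r
library(drake)

# Each target is rebuilt only when its code or upstream dependencies change.
plan <- drake_plan(
  raw   = read.csv("analysis_data.csv"),   # hypothetical data file
  model = lm(y ~ x, data = raw),
  fit   = summary(model)
)

make(plan)  # runs the plan and caches the results
```

Any of the Rmd files can then pull cached objects with `readd(fit)` or `loadd(model)` instead of recomputing them.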
Is it possible to concatenate 1000 CSV files that have headers into one file with no duplicated headers, directly in Google Cloud Storage? I could easily do this by downloading the files onto my local hard drive, but I would prefer to do it natively in Cloud Storage.
They all have the same columns and a header row.
I wrote an article about handling CSV files with BigQuery. To avoid several output files, and if the volume is less than 1 GB, the recommended way is the following (a sketch with the bq command-line tool follows the list):
Create a temporary table in BigQuery from all your CSV files.
Use the Export API (not the export function).
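A minimal sketch of those two steps with the bq command-line tool (the dataset, table, and bucket names are just examples; this assumes the result stays under 1 GB so the extract produces a single file):

```bash
# 1. Load all the CSVs into a temporary table, skipping each file's header row
bq load --source_format=CSV --autodetect --skip_leading_rows=1 \
    tmp_dataset.merged_csv "gs://my-bucket/exports/*.csv"

# 2. Export the table back to Cloud Storage as a single CSV
bq extract --destination_format=CSV \
    tmp_dataset.merged_csv gs://my-bucket/merged/all_rows.csv
```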
Let me know if you need more guidance.
The problem with most solutions is that you still end up with a large number of split files, where you then have to strip the headers, join them, and so on.
Any method of avoiding multiple files also tends to be quite a lot of extra work.
It becomes a real hassle, especially when BigQuery spits out 3500 split gzipped CSV files.
I needed a simple method for achieving this that could be automated from a batch file.
I therefore wrote CSV Merge (sorry, Windows only) to solve exactly this problem.
https://github.com/tcwicks/DataUtilities
Download latest release, unzip and use.
I also wrote an article with a scenario and usage examples:
https://medium.com/#TCWicks/merge-multiple-csv-flat-files-exported-from-bigquery-redshift-etc-d10aa0a36826
Hope it is of use to someone.
P.S. I recommend tab-delimited over CSV as it tends to have fewer data issues.
I have a system consisting of parameters in Access, which are read by an R script, which then starts an Rmarkdown report. In Rmarkdown, a Stata script is built, which reads a data file and creates a graph specified by the Access parameters. To get the Stata graph into the report, I have to store it as a PNG file and link to this file in the Rmarkdown code. Finally, the report is rendered as a Word file (using knitr and Pandoc).
In the present setup, I have several places in the report where a graph can be called for. I can create a single PNG file for each of these places, I know the filenames (controlled by the Access parameters), and I link to each file using the standard syntax ![](path/to/filename.png). This works properly.
The next development step is that in each place, I need to create an unknown and varying number of PNG files (up to ca. 20 files). I will do this in Stata. The problem is to link to a varying number of files in the Rmd code. I haven't found a way to do this, and need advice on how.
I have some ideas for a solution, but I cannot find the commands or syntax to implement them. I have read the Introduction to Rmarkdown from Rstudio.com, and the Rmarkdown Reference Guide (5 pages) from the same source. I am rather new to both R and Rmarkdown, so I might have overlooked or not understood that there is a solution.
Is it possible to set up a loop or branch (e.g. "if", "for" or "while") in Rmarkdown? Then I could loop over the current number of files, or branch around unused file links.
Can I fetch all files in a certain directory, e.g. by making a link containing wildcards in the filename? Or is there another way of achieving this?
Is there a way of having links to files that do not exist in the present run, without crashing the program? Then I could set up enough links to cover all foreseeable cases.
Or, does anyone have other suggestions?
Sure, you could use a loop like
```{r, results="asis"}
files <- list.files(path = '/path/to/your/pngdirectory/',
                    pattern = '\\.png', full.names = TRUE)
for (f in files) cat(paste0('![](', f, ')\n'))
```
If you want to filter for certain PNG files, you can extend the pattern argument to a more sophisticated regular expression. For example, if I only want PNG files containing '2017-07-11' in their name, I would do
list.files(path = '/Users/martin/Dropbox/Screenshots',
           pattern = '.*2017-07-11.*\\.png', full.names = TRUE)
where .* matches any sequence of characters.
I'm planning to use R Markdown for reporting health checkup data in my company. What I'm concerned about is whether the HTML file produced by R Markdown contains the raw data (such as patient names) and whether it is accessible via that file in any way. If that is the case, is there any way to avoid it?
This is difficult to answer without an example of your data (which I realise will be hard to share, given your question).
Basically, there's nothing in the HTML version of R Markdown that would necessarily embed the data if you don't ask it to. If all you're planning to do is summarise and ignore (or anonymise) individual cases, then R Markdown should work fine.
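For instance, a minimal sketch, assuming a hypothetical patient-level data frame `checkups` with `department` and `bmi` columns: only the aggregated table ends up in the rendered HTML.

```{r, echo=FALSE, message=FALSE}
library(dplyr)

# Summarise to group level; the patient-level rows are never printed,
# so they are not embedded in the HTML output.
checkups %>%
  group_by(department) %>%
  summarise(n = n(), mean_bmi = mean(bmi, na.rm = TRUE)) %>%
  knitr::kable()
```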
For more information, take a look at the help page on HTML documents in the R Markdown documentation.
I hope this helps you.
I am using rmarkdown with Shiny to generate Word file reports, developing on RStudio Server. On executing the Shiny application, it halts due to an error in one of the R Markdown files.
The error says...
Quitting from lines 11-486 (/home/KS127/dev/shiny_apps/pashiny/inst/shiny/dataframe_source.Rmd)
Quitting from lines NA-486 (/home/KS127/dev/shiny_apps/pashiny/inst/shiny/dataframe_source.Rmd)
It provides line numbers that are not useful for identifying the root cause. Adding print statements is also not useful: because I am generating a Word file report, I won't be able to see the print output unless the complete .Rmd executes successfully.
I tried changing the RStudio setting from "chunk output inline" to "chunk output in console" as mentioned here as well, but it is of no use.
Is there any way to print the .Rmd file's output to the console, or is there any other way to debug the .Rmd file?
In addition to my comment above, Abhinandan, I've recently stumbled across a new package, called testrmd.
Although it is new, it seems to work with a number of different test packages and provides a useful front-end for Rmarkdown documents. (I'm certainly going to use it.)
You might want to check it out. Here's the link: https://github.com/ropenscilabs/testrmd.
I hope this helps you.
See "My .Rmd file becomes very lengthy. Is it possible to split it and source() its smaller portions from the main .Rmd?"
That's what I do.
Split your code chunks into separate files and include them one by one, for example with knitr child documents (sketched below).
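A minimal sketch of one way to do this, using knitr's child chunk option (the file names are just examples): the parent .Rmd contains only empty chunks that pull in the smaller files.

````markdown
```{r, child="01-data-prep.Rmd"}
```

```{r, child="02-models.Rmd"}
```
````

Each child file is itself a normal .Rmd fragment, so it can be edited and previewed on its own.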
I'm a great fan of R markdown, finding it even easier than weaving LaTeX for quick project documentation (less than 15 pages). However, I also have to support sometimes other Statistics packages (SPSS, Stata + SAS) and was wondering for equivalent solutions for these.
To some extent this might go back to using some kind of original noweb code plus a markdown file, compiled over the command line. I guess calling the other packages from R is another option.
I have had a look at this example by John Muschelli: http://rpubs.com/muschellij2/3888 and it looks as though he knitted Stata code into an R markdown file.
Can someone provide specific examples of how this can be done in Stata, SAS or SPSS?
I do know of SASweave and StatWeave (the latter is apparently broken???), but think that a markdown solution would be far more advantageous in our case.
Stata has its own SMCL for annotation of logs, the M standing for mark-up. The main reason for a different language is that SMCL has to be created and interpreted line by line in situations where no end of document is in sight, namely within interactive sessions. This is created by Stata automatically as annotation when you ask for it and can be stipulated by users or programmers as a way of tuning Stata's display choices.
The possible connection to your question is that SMCL can be translated to HTML, which opens various doors. So, something that is easy in Stata is to do some work, keep a log file in SMCL, and then translate the log file to HTML. You would not get anything really nice without further work, but the further work is easy and amounts to doing what you would have done anyway, just in your favourite text editor or word processor rather than within Stata.
This is made easier by log2html which Stata users can install using ssc inst log2html. It exploits a feature undocumented in Stata.
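A minimal sketch of that workflow (the log name and the analysis commands are just examples):

```stata
* one-time installation of the user-written command
ssc install log2html

* do the work while keeping an SMCL log
log using myanalysis, smcl replace
sysuse auto, clear
summarize price mpg
regress price mpg
log close

* translate the SMCL log to HTML (produces myanalysis.html)
log2html myanalysis
```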
Stata's help files can also be translated to HTML in the same way (but consider copyright issues if doing this with official help files; it's fair play with your own help files).
John Muschelli pointed me to this Stata program:
https://github.com/amarder/stata-tutorial/blob/master/knitr.do
It parses a .domd file which contains markdown and Stata code and produces a .md file with executed Stata code. The name of the file to be parsed is at the end of the knitr.do file.
More specifically:
Download the knitr.do file from https://github.com/amarder/stata-tutorial/blob/master/knitr.do
Download the clustered-standard-errors.domd file from https://github.com/amarder/stata-tutorial/blob/master/clustered-standard-errors.domd
Save them both in some directory.
Modify the last line of knitr.do to reflect the complete path of its directory (e.g. D:\Desktop\knit_example\clustered-standard-errors.domd).
Run knitr.do to get your markdown (.md) file (and an intermediate .md1 file).
Note that knitr.do contains the programs that do the work and a line (the last one):
knit "whatever-file.domd"
that calls the program.
So you basically write a .domd file [that of step (2) is only an example] containing Markdown syntax and Stata commands, run knitr.do adjusting the file name, and get a Markdown file with executed Stata commands.
There are several caveats:
Only one-liner Stata commands are allowed. A loop, for example, won't work.
".domd" can't be part of the file name.
If there is an error with a Stata command, the user gets no return code.
File handles need to be manually closed if user hits the Break button when the program is running or if there is a Stata command error.
I'm not sure if this is what you want, but if you're looking to create .html files in SAS that contain statistical reports within them, then you can use the Output Delivery System (ODS).
Example syntax is:
ods html file='pathofdirectory\filename.html' <additional options>;
proc print... (SAS code that generates output)
proc means...
proc freq...
proc gchart...
proc gplot...
...
ods html close;
SPSS (and SAS, I presume) has some overhead from the need to write everything to disk, which makes compiling everything in one fell swoop less appealing. Similar to what Yick mentioned, SPSS has an output system that one can use to write automated reports to begin with and export to HTML, PDF, or Word. It isn't the easiest thing to make look nice, but it is possible, and additions to ease automated editing (mainly via Python scripts) are being rolled out on a regular basis.
Basically the automated reports I write now using SPSS and R have html shells. The code then just updates or inserts the needed tables and graphs. They are entirely self-contained, reproducible, and run on weekly or monthly timers without human intervention. They just don't have inline code blocks exactly defining how the tables are produced (you would have to trace the code slightly further back to figure it out - but that isn't too onerous IMO).
Because SPSS allows you to run SPSS code from the Python command prompt, you could theoretically knit a document with Python code calling SPSS (a sketch follows). I'm not quite sure I see the advantage of this over having more segmented code in separate places, though. Do you really want to read 100 lines of SPSS code that begins with an SQL query, does some transformations, and produces a table and a graph? Wouldn't you rather see the table and graph, and then, if interested in the nitty-gritty, go back to DataPrep.sps to see how all of the data is prepared, and then Table1.sps, Figure1.sps, etc. to see exactly how each was produced?
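For what it's worth, a minimal sketch of that approach (only runnable inside SPSS's bundled Python, where the spss module is available; the file and variable names are made up for illustration):

```python
# Drive SPSS syntax from Python; spss.Submit() runs a block of SPSS commands.
import spss

spss.Submit("""
GET FILE='checkups.sav'.
FREQUENCIES VARIABLES=department.
""")
```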