I'm a great fan of R Markdown, finding it even easier than weaving LaTeX for quick project documentation (less than 15 pages). However, I sometimes also have to support other statistics packages (SPSS, Stata and SAS) and was wondering about equivalent solutions for those.
To some extent this might go back to using some kind of original Noweb code plus a markdown file, compiled over the command line. I guess calling the other packages from R is another option.
I have had a look at this example by John Muschelli: http://rpubs.com/muschellij2/3888 and it looks as though he knitted Stata code into an R markdown file.
Can someone provide specific examples of how this can be done in Stata, SAS or SPSS?
I do know of SASweave and StatWeave (the latter is apparently broken???), but think that a markdown solution would be far more advantageous in our case.
Stata has its own SMCL for annotating logs, the M standing for mark-up. The main reason for a distinct language is that SMCL has to be created and interpreted line by line in situations where no end of document is in sight, namely within interactive sessions. Stata creates it automatically as annotation when you ask for it, and users or programmers can write it themselves as a way of tuning Stata's display choices.
The possible connection to your question is that SMCL can be translated to HTML, which opens various doors. So, something that is easy in Stata is to do some work, keep a log file in SMCL, and then translate the log file to HTML. You would not get anything really nice without further work, but the further work is easy and amounts to doing what you would have done anyway, just in your favourite text editor or word processor rather than within Stata.
This is made easier by log2html which Stata users can install using ssc inst log2html. It exploits a feature undocumented in Stata.
Stata's help files can also be translated to HTML in the same way (but consider copyright issues if doing this with official help files; it's fair play with your own help files).
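Concretely, a minimal version of that workflow might look like this (the file and log names are placeholders):

* keep a log in SMCL while you work
log using mywork, smcl replace
sysuse auto, clear
summarize price mpg
log close

* install log2html once from SSC, then translate the log to HTML
ssc install log2html
log2html mywork, replace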
John Muschelli pointed me to this Stata program:
https://github.com/amarder/stata-tutorial/blob/master/knitr.do
It parses a .domd file which contains markdown and Stata code and produces a .md file with executed Stata code. The name of the file to be parsed is at the end of the knitr.do file.
More specifically:
Download the knitr.do file from https://github.com/amarder/stata-tutorial/blob/master/knitr.do
Download the clustered-standard-errors.domd file from https://github.com/amarder/stata-tutorial/blob/master/clustered-standard-errors.domd
Save them both in some directory.
Modify the last line of knitr.do to reflect the complete path of the .domd file (e.g. D:\Desktop\knit_example\clustered-standard-errors.domd).
Run knitr.do to get your markdown (.md) file (and an intermediate .md1 file).
Note that knitr.do contains the programs that do the work and a line (the last one):
knit "whatever-file.domd"
that calls the program.
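So, putting steps (2) and (4) together, the last line in this example would become:
knit "D:\Desktop\knit_example\clustered-standard-errors.domd"
with the path adjusted to wherever you saved the .domd file.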
So you basically write a .domd file [that of step (2) is only an example] containing Markdown syntax and Stata commands, run knitr.do adjusting the file name, and get a Markdown file with executed Stata commands.
There are several caveats:
Only one-liner Stata commands are allowed. A loop, for example, won't work.
".domd" can't be part of the file name.
If there is an error with a Stata command, the user gets no return code.
File handles need to be closed manually if the user hits the Break button while the program is running or if a Stata command raises an error.
I'm not sure if this is what you want, but if you're looking to create .html files in SAS that contain statistical reports within them, then you can use the Output Delivery System (ODS).
Example syntax is:
ods html file='pathofdirectory\filename.html' <additional options>;
proc print... (SAS code that generates output)
proc means...
proc freq...
proc gchart...
proc gplot...
...
ods html close;
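A self-contained toy example against the built-in sashelp.class dataset might look like this (the output path is a placeholder):

ods html file='C:\reports\class_report.html';
title 'Summary of sashelp.class';
proc means data=sashelp.class mean min max;
  var height weight;
run;
proc freq data=sashelp.class;
  tables sex;
run;
ods html close;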
SPSS (and SAS, I presume) have some overhead from the need to write everything to disk, which makes compilation in one fell swoop less appealing. Similar to what Yick mentioned, SPSS has an output system to which one can write automated reports to begin with, and export to HTML or PDF or Word. It isn't the easiest thing to make look nice, but it is possible, and additions to ease automated editing (mainly via Python scripts) are being rolled out on a regular basis.
Basically the automated reports I write now using SPSS and R have html shells. The code then just updates or inserts the needed tables and graphs. They are entirely self-contained, reproducible, and run on weekly or monthly timers without human intervention. They just don't have inline code blocks exactly defining how the tables are produced (you would have to trace the code slightly further back to figure it out - but that isn't too onerous IMO).
Because SPSS allows you to run SPSS code from the Python command prompt, you could theoretically knit a document with Python code calling SPSS. I'm not quite sure I see the advantage of this over having more segmented code in separate places, though. Do you really want to read 100 lines of SPSS code that begins with an SQL query, does some transformations, and produces a table and a graph? Wouldn't you rather see the table and graph, and then, if interested in the nitty gritty, go back to see DataPrep.sps that prepares all of the data, then see Table1.sps and Figure1.sps etc. to see how each was exactly produced?
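A minimal sketch of that idea, assuming the SPSS Python integration plug-in is installed (file paths are placeholders):

# submit SPSS syntax from Python and export the resulting output to HTML
import spss

spss.Submit(r"""
GET FILE='C:\data\survey.sav'.
FREQUENCIES VARIABLES=age gender.
OUTPUT EXPORT /HTML DOCUMENTFILE='C:\reports\tables.htm'.
""")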
I am from Slovakia; I wouldn't be surprised if most of you haven't heard of it.
However, that causes me trouble when it comes to reports. We need 3 (soon 4) language versions of each report: Slovak is the main language, then Polish and English.
Since Pentaho supports neither Polish nor Slovak, it is a real pain for me to keep these localized.
What I do is:
Create report in Slovak language
Write down all phrases from report
Send phrases to one of our partners to translate
Create its copy in either pl/en directory
Open it in Report Designer and edit every phrase accordingly
Save as another language version
As you can imagine, the process is very time consuming and error prone. Plus, every time I add a new parameter to a report or change its data source (which is a BeanShell script), I need to do it in 3 separate files. As a result, the language mutations are usually out of date, way behind the main language version.
I have tried to automate it with OneSky and wrote a Python script that runs in 2 stages:
Stage 1 (extract and upload):
Change the *.prpt file suffix to *.zip
Extract phrases from files: ~/datadefinition.xml, ~/layout.xml, ~/styles.xml, ~/datasources/inline-ds.xml
Put those phrases into *.po file
Upload the *.po file to OneSky
Stage 2 (download and import):
Change the *.prpt file suffix to *.zip
Download translated *.po file from OneSky
Run through the ~/datadefinition.xml, ~/layout.xml, ~/styles.xml and ~/datasources/inline-ds.xml files and replace the original phrases with the translated ones
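For reference, the extraction part of stage 1 looks roughly like this in Python (a sketch only; the XML structure inside a .prpt, and which text nodes count as phrases, are assumptions):

# pull candidate phrases out of a .prpt (which is itself a zip archive,
# so with zipfile no renaming to *.zip is strictly needed)
import zipfile
import xml.etree.ElementTree as ET

PARTS = ["datadefinition.xml", "layout.xml", "styles.xml",
         "datasources/inline-ds.xml"]

def extract_phrases(prpt_path):
    phrases = set()
    with zipfile.ZipFile(prpt_path) as z:
        for part in PARTS:
            root = ET.fromstring(z.read(part))
            for elem in root.iter():
                # keep every non-empty text node as a candidate phrase
                if elem.text and elem.text.strip():
                    phrases.add(elem.text.strip())
    return phrases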
While this approach works fine, it does not translate everything; the process still has flaws. I need to go through it every time I make even the slightest change to a report's data source or fix a small mistake. Even a small fix in the SQL code has to be made in 3 files, which of course increases the chance of a mistake.
So, I was wondering: how are you solving the issue of translating your reports?
I will share the very simple method we follow.
1) Create a properties file in key-value format for each language, containing the resource labels (for static values).
2) Put it into the resources folder (report-designer/resources/).
3) Based on a parameter you can specify which properties file to select, and you can put keys into the value fields so the report knows which value to display in which language.
4) If you need to translate the data coming from the database, you have to design a data warehouse and specify all the mappings so it can fetch the data accordingly.
5) For converting dates, currency symbols or number formats, you can use built-in functions that handle all these things; I am using MySQL, and MySQL has translation functions for all of that.
It is difficult to explain the entire process here, but if you can get an idea from this, it may be useful to you.
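As a hypothetical illustration of steps (1)-(3): one properties file per language, plain key-value pairs (the file names, keys and Slovak strings here are made up):

# report-designer/resources/translations_en.properties
report.title=Monthly Sales Report
param.region=Region

# report-designer/resources/translations_sk.properties
report.title=Mesačný prehľad predaja
param.region=Región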
I am using R Markdown with Shiny to generate Word reports, developing in RStudio Server. On executing the Shiny application, it halts due to an error in one of the R Markdown files.
The error says...
Quitting from lines 11-486 (/home/KS127/dev/shiny_apps/pashiny/inst/shiny/dataframe_source.Rmd)
Quitting from lines NA-486 (/home/KS127/dev/shiny_apps/pashiny/inst/shiny/dataframe_source.Rmd)
It provides line numbers that are not useful for identifying the root cause. Adding print statements does not help either: since I am generating a Word report, I cannot see any print output unless the complete .Rmd executes successfully.
I tried changing the RStudio setting from chunk output inline to chunk output in console, as mentioned here as well, but it is of no use.
Is there any way to print the .Rmd file print statements or the output to the console or is there any way to debug the .Rmd file?
In addition to my comment above, Abhinandan, I've recently stumbled across a new package, called testrmd.
Although it is new, it seems to work with a number of different test packages and provides a useful front-end for Rmarkdown documents. (I'm certainly going to use it.)
You might want to check it out. Here's the link: https://github.com/ropenscilabs/testrmd.
I hope this helps you.
See: My .Rmd file becomes very lengthy. Is it possible to split it and source() its smaller portions from the main .Rmd?
That's what I do:
Split your code chunks into separate files and include them one by one, as in the sketch below.
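A minimal sketch using knitr's child chunks (file names are placeholders). The main .Rmd then contains little more than:

```{r child="01-data-prep.Rmd"}
```

```{r child="02-tables.Rmd"}
```

Each child file holds the chunks and prose for one section, and knitr stitches them together when the main file is knitted.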
Suppose I have a database with file names and I would like add file modification dates and times to this database. Is it possible to do it in Stata in a straightforward way?
I can think of two non-straightforward ways:
1) Writing a plugin in C or Java.
2) Using the dir command, capturing the output in a log file, and then importing that log file back.
But is there a less cumbersome solution?
There does not seem to be either a Stata or a Mata function that is of any help. I realize that I can easily do it in any scripting language and then import the results into Stata but I would like to know if there is a purely Stata solution (for portability reasons).
I think you can do that using the shell capabilities of Stata.
See here:
http://www.stata.com/help.cgi?shell
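For example, on a system with GNU ls, something along these lines should work (a rough sketch; the file pattern and names are placeholders):

* write file names and modification dates to a text file via the shell
shell ls -l --time-style=long-iso *.dta > filedates.txt
* read the listing back into Stata
import delimited using filedates.txt, delimiters(" ", collapse) clear

On Windows you would use dir via shell (or !) instead, and some cleanup of the imported columns will still be needed.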
In the enhanced editor, the coloring might give you a hint. However, on the mainframe I don't believe there is anything in the editor that will help you.
I use
OPTIONS OBS=0 noreplace;
The OBS=0 option specifies that 0 observations are read in from the input dataset, and NOREPLACE tells SAS not to overwrite an existing SAS dataset with one of the same name. If you are creating a new dataset, it will be created with all the attributes but with 0 observations. (Be sure to reset the options when no more syntax errors are found, e.g. OPTIONS OBS=max replace;.)
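In practice the cycle looks like this:

options obs=0 noreplace;   /* syntax-check mode: read no data, overwrite nothing */
data new;                  /* step compiles and runs against 0 observations */
  set sashelp.class;
run;
options obs=max replace;   /* restore normal processing once the code is clean */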
I'd be interested in any other techniques.
Thanks
Explanation about options came from here.
I use the cancel option on the run statement. It will check the syntax of the data step then terminate it without actually executing it. It's the data step analog to the noexec option in proc sql.
data something;
<stuff here>
run cancel;
Lots more details in this SUGI pdf
I write all of my code on my PC, with SAS on my PC and the enhanced, color-coded editor. I then use SAS/CONNECT to process it on the mainframe. If the datasets are on DASD, I use SAS/CONNECT and Enterprise Guide to run the code directly on the mainframe (no JCL!). If there is a data tape involved, and it therefore must be a batch run, I use SAS/CONNECT and the SAS ftp engine to send the code to the mainframe batch queue. I use the SAS email engine to email me back my output and my log. I put an ODS sandwich around my code to have the mainframe generate a Word document for output. I use a PROC DOWNLOAD to download the output to my server so I can open it in Word.
This advice is language agnostic.
I would argue that a preferable technique for catching syntax (and logic) errors is to perform a close read (or inspection) of your own code (which should catch the majority of syntax errors), followed by unit tests on small datasets (which will catch any remaining syntax errors, as well as many logic errors if your tests are well-designed).
I agree there's some worth to syntax checking in isolation, but to read and understand your code thoroughly enough before the first compile so that you know it will compile is a good ideal to strive for. Steve McConnell touches on this idea in Code Complete (see page 827 of the 2nd Edition).
P.S. You mentioned syntax highlighting in your original post; there are other editors (such as VIM) that will perform syntax highlighting on SAS files.
I am creating a text-based game using C++ for a school project. The game works by allowing the user to pick a choice from a list of options in each scene, similar to how the games hosted by Choice of Games work. As a result I have a large amount of text that must be displayed in my game, but I am unsure of the proper conventions when working with large amounts of text in a program. Should I simply make use of std::cout and write the text directly into the code, or should I write the text into files and use std::ifstream to read it?
My only major concern regarding the use of files to hold the text is that each choice the user makes results in a different paragraph being displayed, and as a result I believe I would need to create a text file for each paragraph, which seems like it would lead to more issues (such as using the wrong file name, or a typo in my code leading the game to read from the wrong file) than writing the text straight into the code would. If there is a way to read particular sections of a text file, that would be useful to know; however, I am currently unaware of any such method. I am new to C++, though, and certain there is plenty I have yet to learn, so I would not be surprised if such a method did exist.
Any help is greatly appreciated, be it anything from simply telling me if I should enter text into my code or into files, to telling me if there is a way to read text from specific sections of a text file. And once again, I am very grateful for any help you can provide.
Please don't put displayed text into code. That's an antipattern. You have to recompile your game for every minor text change like fixing typos, and for major changes like translating into other languages.
Convention for most programming languages is to put all the displayed text into (a few) resource files or properties files as key-value pairs, where the code only references the key of the paragraph to be displayed and the value will be loaded from that external file. (Usually once during startup.) No need to use one file per paragraph, but the kv pairs have to be parsed. There'll be utilities for you to reuse.
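A minimal sketch of such a loader in C++, assuming a made-up one-pair-per-line key=value format (not any particular engine's standard):

#include <fstream>
#include <map>
#include <string>

// Load "key=value" pairs from a resource file into a lookup table.
std::map<std::string, std::string> loadStrings(const std::string& path) {
    std::map<std::string, std::string> table;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::size_t eq = line.find('=');
        if (eq != std::string::npos)
            table[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return table;
}

The game code then displays, say, table.at("intro.cave") instead of a hard-coded string (the key is hypothetical), and switching languages is just a matter of loading a different file.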
I recommend using external files. It makes changing the content much easier and doesn't require recompiling the entire program for a simple typo.
You can use one file and just separate each paragraph with a blank line. Grabbing "all text between blank lines" at that point is trivial.
If the choices cause the displayed paragraphs to jump around the file, you can give them IDs and load them on the fly by searching linearly through the file for a given ID.
--EDIT--
As per the request here is an algorithm or two:
Algorithm 1:
Give each paragraph an ID, usually a simple number on the line immediately above the paragraph.
Separate each number-paragraph pair by blank lines.
Parse the file line-by-line looking for a "line" that contains only a number.
Once you have found the number you are looking for, all lines until the next blank line are the content of that paragraph.
Display to user.
Algorithm 2 (recommended):
Use XML to store your paragraphs and their IDs.
Use TinyXML2 to parse the file: http://www.grinninglizard.com/tinyxml2/index.html
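For Algorithm 1, a minimal C++ sketch might look like this (the file name and ID scheme are the assumptions described above):

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Load the paragraph with the given numeric ID from a plain-text file.
// Assumed format: an ID alone on one line, the paragraph on the following
// lines, entries separated by blank lines.
std::string loadParagraph(const std::string& path, int id) {
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        int candidate;
        char extra;
        // An ID line parses as a bare number with nothing after it.
        if (iss >> candidate && !(iss >> extra) && candidate == id) {
            std::string paragraph;
            while (std::getline(in, line) && !line.empty()) {
                if (!paragraph.empty()) paragraph += '\n';
                paragraph += line;
            }
            return paragraph;
        }
    }
    return "";  // ID not found
}

int main() {
    std::cout << loadParagraph("story.txt", 2) << '\n';
}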
If you do not plan to translate your game to other languages, the choice is yours; both approaches have their pros and cons:
text in source: easy to write, text is near the place where it is used.
text in resource files: easier to remove duplicate strings, forces a better structure of text data.
If you can simply imagine that your application might ever be translated, then you should put all text in resource files. You can even find frameworks that will assist you with translations, such as GNU gettext, and there are others; for example, Qt has its own translation tools.
Storing text in the program files is not a good coding practice. This would result in unnecessary code bloat (it's not even code) and the need to recompile if you need to change the text.
A simple solution would be to create a text file with careful formatting like line numbers or whitespace that would allow you to pull out the desired text.
A more elegant solution would be to put the necessary text in xml or json files, and read them into your program when necessary. This would be a great choice.