When R Markdown outputs a chunk result, it inserts a ## before the result.
I would like to know how I can change the default result format, specifically how to replace the double hash with another symbol (so that beginners don't confuse it with code comments).
You can change this within a setup chunk, e.g.
```{r setup, include = FALSE}
knitr::opts_chunk$set(comment = "#>")
```
will prefix results with #> instead of ##.
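The same option can also be set for a single chunk rather than globally; a minimal sketch:
```{r, comment = "#>"}
1 + 1
```
which renders the result as `#> [1] 2` instead of `## [1] 2`.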
I was wondering what the correct way is to get the C/C++ source code of any secondary (as distinguished from Primitive/Internal) function in R.
Related questions are here, here, here and here; mine is different, which is why I used "secondary" in my question.
For example, for the read.table() function, in the R console I get:
> ?read.table
read.table package:utils R Documentation
Data Input
Description:
Reads a file in table format and creates a data frame from it,
with cases corresponding to lines and variables to fields in the
file.
Usage:
read.table(file, header = FALSE, sep = "", quote = "\"'",
......
Or
> getAnywhere(read.table)
A single object matching ‘read.table’ was found
It was found in the following places
package:utils
namespace:utils
with value
function (file, header = FALSE, sep = "", quote = "\"'", dec = ".",
......
attr(data, "row.names") <- row.names
data
}
<bytecode: 0x560ff88edd40>
<environment: namespace:utils>
Searching the web, I found:
https://svn.r-project.org/R/trunk/src/library/utils/src/utils.c
https://svn.r-project.org/R/trunk/src/library/utils/src/utils.h
How can I get the C/C++ source code of the read.table function instead of the R code, if that is a reasonable request?
The searchable R source code at https://github.com/wch/r-source is really useful for this:
First we can look for the read.table definition.
The actual data reading is done by the scan function, which in the end uses
.Internal(scan(file, what, nmax, sep, dec, quote, skip, nlines,
[...]
Now scan is mapped to do_scan.
So here you are: The underlying C implementation for read.table can be found in src/main/scan.c, starting with the function do_scan.
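If you want to trace the jump from R to C yourself, a minimal sketch: the name inside the .Internal() call is what to look up in src/main/names.c of the R sources, where it is mapped to its C function (here, do_scan).
```r
# Deparse the body of scan and pull out the .Internal() call; the name it
# wraps is the entry to grep for in src/main/names.c.
grep(".Internal", deparse(utils::scan), value = TRUE)
```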
I'm trying to import some publicly available life outcomes data using the code below:
require(gdata)
# Source SIMD12 data zone level data
simd.sg.xls <- read.xls(xls = "http://www.gov.scot/Resource/0044/00447385.xls",
                        sheet = "Quick Lookup", verbose = TRUE)
Naturally, the imported data frame doesn't look good.
I would like to amend my column names using the code below:
# Clean column names
names(simd.sg.xls) <- make.names(names = as.character(simd.sg.xls[1,]),
                                 unique = TRUE, allow_ = TRUE)
But it produces rather unpleasant results:
> names(simd.sg.xls)
[1] "X1" "X1.1" "X771" "X354" "X229" "X74" "X67" "X33" "X19" "X1.2"
[11] "X6" "X1.3" "X8" "X7" "X7.1" "X6506" "X21" "X1.4" "X6158" "X6506.1"
[21] "X6506.2" "X6506.3" "X6263" "X6506.4" "X6468" "X1010" "X815" "X99" "X58" "X65"
[31] "X60" "X6506.5" "X21.1" "X1.5" "X6173" "X5842" "X6506.6" "X6506.7" "X6263.1" "X6506.8"
[41] "X6481" "X883" "X728" "X112" "X69" "X56" "X54" "X6506.9" "X21.2" "X1.6"
[51] "X6143" "X5651" "X6506.10" "X6506.11" "X6263.2" "X6506.12" "X6480" "X777" "X647" "X434"
[61] "X518" "X246" "X436" "X6506.13" "X21.3" "X1.7" "X6136" "X5677" "X6506.14" "X6506.15"
[71] "X6263.3" "X6506.16" "X660" "X567" "X480" "X557" "X261" "X456"
My question is whether there is a way to neatly force the values from the first row into the column names. As I'm processing a lot of data, I'm looking for a solution that is easily reproducible. I can accept a lot of mangling of the actual strings to get syntactically valid names, but ideally I would avoid fiddling with elaborate regular expressions, as I'm often reading files like the one linked here and don't want to be forced to adjust the rules for each single import.
It looks like the problem is that the header is on the second line, not the first. You could include a skip=1 argument, but a more general way of dealing with this using read.xls seems to be the pattern and header arguments, which force the first line matching the pattern string to be treated as the header. Your code becomes:
require(gdata)
# Source SIMD12 data zone level data
simd.sg.xls <- read.xls(xls = "http://www.gov.scot/Resource/0044/00447385.xls",
                        sheet = "Quick Lookup", verbose = TRUE,
                        pattern = "DATAZONE", header = TRUE)
UPDATE
I don't get the warning messages you do when I execute the code. The messages refer to an issue with locale. The locale settings on my system are:
Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
Yours are probably different. Locale data could be OS dependent. I'm using Windows 8.1. Also I'm using Strawberry Perl; you appear to be using something else. So there are some possible reasons for the discrepancy in warning messages, but nothing more specific.
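If you want to rule locale out as the cause, you could try temporarily matching my settings; a hedged sketch (the locale name below is Windows-specific and may not exist on your system):
```r
# Remember and then override just the character-handling category, which is
# the one locale warnings usually concern.
old <- Sys.getlocale("LC_CTYPE")
Sys.setlocale("LC_CTYPE", "English_United States.1252")
# ... re-run the read.xls() call here ...
Sys.setlocale("LC_CTYPE", old)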
On the second question in your comment: to read the entire file and convert a particular row (in this case, row 2) to column names, you could use the following code:
simd.sg.xls <- read.xls(xls = "http://www.gov.scot/Resource/0044/00447385.xls",
                        sheet = "Quick Lookup", verbose = TRUE,
                        header = FALSE, stringsAsFactors = FALSE)
names(simd.sg.xls) <- make.names(names = simd.sg.xls[2,],
                                 unique = TRUE, allow_ = TRUE)
simd.sg.xls <- simd.sg.xls[-(1:2),]
All data will be of character type, so you'll need to convert to factor and numeric as necessary.
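If you'd rather not convert each column by hand, a minimal sketch using type.convert() on every column:
```r
# Re-guess each column's type from its character representation;
# as.is = TRUE keeps strings as character rather than factor.
simd.sg.xls[] <- lapply(simd.sg.xls, type.convert, as.is = TRUE)
```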
I have a directory with about 12k csv files, named in the format YYYY-MM-DD<TICK>.csv. The <TICK> refers to the ticker of a stock, e.g. MSFT, GS, QQQ etc. There are 500 tickers in total, of various lengths.
My aim is to merge all the csv files for a particular ticker and save the result as a zoo object in an individual RData file in a separate directory.
To automate this I've managed to do the csv manipulation, set up as a function which takes a ticker as input and does all the data modification. But I'm stuck at the file-listing stage, passing a pattern that matches the ticker being processed. I'm unable to make the pattern to be matched depend on the ticker.
Below is the function I've tried to make work; it doesn't work:
csvlist2zoo <- function(symbol){
  csvlist <- list.files(path = "D:/dataset/",
                        pattern = paste("'.*?", symbol, ".csv'", sep = ""),
                        full.names = TRUE)
}
This works, but I can't make it work inside the function:
csvlist2zoo <- function(symbol){
  csvlist <- list.files(path = "D:/dataset/",
                        pattern = ".*?ibm.csv",
                        full.names = TRUE)
}
I searched on SO; there are similar questions, but none exactly meet my requirement. If I missed something, please point me in the right direction. Still fighting with regex.
OS: Win8 64bit, R version 3.1.0 (if needed)
Try:
csvlist2zoo <- function(symbol){
  list.files(pattern = paste0('\\d{4}-\\d{2}-\\d{2}', symbol, ".csv"))
}
csvlist2zoo("QQQ")
#[1] "2002-12-19QQQ.csv" "2008-01-25QQQ.csv"
csvlist2zoo("GS")
#[1] "2005-05-18GS.csv"
I created some files in the working directory (Linux):
v1 <- c("2001-05-17MSFT.csv", "2005-05-18GS.csv", "2002-12-19QQQ.csv", "2008-01-25QQQ.csv")
lapply(v1, function(x) write.csv(1:3, file=x))
Update
Using paste
csvlist2zoo <- function(symbol){
  list.files(pattern = paste('\\d{4}-\\d{2}-\\d{2}', symbol, ".csv", sep = ""))
}
csvlist2zoo("QQQ")
#[1] "2002-12-19QQQ.csv" "2008-01-25QQQ.csv"
I am trying to replace the > and < characters in strings with R:
datanames<-names(data)
datanames
## [1] BbMx>2.5 BbAv>2.5 BbMx<2.5 BbAv<2.5
datanames<-gsub("[>]","gt",datanames)
datanames<-gsub("[<]","lt",datanames)
datanames<-gsub("[.]","",datanames)
datanames
## [1] BbMx25 BbAv25 BbMx251 BbAv251
What am I doing wrong?
UPDATE: For some strange reason R doesn't read the same characters from the csv. Namely, in my csv opened with LibreOffice I see
"BbMx>2.5" "BbAv>2.5" "BbMx<2.5" "BbAv<2.5"
but once R reads the csv it turns these strings into
"BbMx.2.5" "BbAv.2.5" "BbMx.2.5.1" "BbAv.2.5.1"
If you just do
x <- c("BbMx>2.5","BbAv>2.5","BbMx<2.5","BbAv<2.5")
x <- gsub("[>]","gt",x)
x <- gsub("[<]","lt",x)
x <- gsub("[.]","",x)
You should get
"BbMxgt25" "BbAvgt25" "BbMxlt25" "BbAvlt25"
as expected. The problem is that the input from names(data) isn't what you think it is.
R has rules about valid column names in data.frames. R will run make.names on those values to attempt to make unique, valid names. This involves replacing non-alphanumeric characters with periods and adding suffixes to ensure uniqueness.
To disable the auto-renaming, you can set check.names=FALSE in the read.table/read.csv call and do the renaming yourself.
So if you have
x<-c("BbMx>2.5", "BbAv>2.5", "BbMx<2.5","BbAv<2.5" )
Then
make.names(x, unique=T)
# [1] "BbMx.2.5" "BbAv.2.5" "BbMx.2.5.1" "BbAv.2.5.1"
So ultimately this had nothing to do with gsub. This was really about how R transforms raw data into data.frames.
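Putting the check.names advice into code, a minimal sketch (the file name is hypothetical):
```r
# check.names = FALSE keeps the raw headers, so > and < survive the import
data <- read.csv("odds.csv", check.names = FALSE)
names(data) <- gsub(">", "gt", gsub("<", "lt", names(data)))
```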
I know @MrFlick has provided an answer already, but just to comment on how you are calling gsub: the < and > characters have no special meaning in a regular expression, so you do not need to place them inside a character class [ ]; you can use them as literals.
And you can cascade your gsub functions together here.
datanames <- gsub('>', 'gt', gsub('<', 'lt', gsub('\\.', '', datanames)))
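A table-driven variant, in case more symbols need replacing later; fixed = TRUE makes each pattern a literal, so the dot needs no escaping:
```r
repl <- c(">" = "gt", "<" = "lt", "." = "")
for (ch in names(repl)) {
  datanames <- gsub(ch, repl[[ch]], datanames, fixed = TRUE)
}
```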
I've been following the tutorial on how to use mallet in R to create topic models. My text file has one sentence per line; it looks like this and has about 50 sentences:
Thank you again and have a good day :).
This is an apple.
This is awesome!
LOL!
i need 2.
.
.
.
This is my code:
Sys.setenv(NOAWT=TRUE)
#setup the workspace
# Set working directory
dir <- "/Users/jxn"
Dir <- "~/Desktop/Chat/malletR/text" # adjust to suit
require(mallet)
documents1 <- mallet.read.dir(Dir)
View(documents1)
stoplist1<-mallet.read.dir("~/Desktop/Chat/malletR/stoplists")
View(stoplist1)
mallet.instances <- mallet.import(documents1$id, documents1$text, "~/Desktop/Chat/malletR/stoplists/en.txt", token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")
Everything works except for this last line of the code. I keep getting this error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.lang.NoSuchMethodException: No suitable method for the given parameters
According to the package documentation, this is how the function call should look:
mallet.instances <- mallet.import(documents$id, documents$text, "en.txt",
                                  token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")
I believe it has something to do with the token.regexp argument, as documents1 <- mallet.read.dir(Dir) works just fine, which means that the first 3 arguments supplied to mallet.import were correct.
This is a link to the git repo with the tutorial I was following:
https://github.com/shawngraham/R/blob/master/topicmodel.R
Any help would be much appreciated.
Thanks,
J
I suspect the problem is with your text file. I have encountered the same error and resolved it by using the as.character() function as follows:
mallet.instances <- mallet.import(as.character(documents$id),
                                  as.character(documents$text),
                                  "en.txt",
                                  FALSE,  # preserve.case
                                  token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")
Are you sure you converted the id field to character as well? It is easy to overlook that advice and leave it as an integer.
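A quick way to check both columns before calling mallet.import, a sketch:
```r
str(documents1$id)    # should be chr, not int or factor
str(documents1$text)  # likewise chr
```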
Also there is a typo in the code sample: the backslashes have to be escaped:
token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}"
This usually occurs because the HTML text editor eats one of the backslashes.
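You can see the problem directly in R, since the single-backslash form is not even a valid string literal; a minimal sketch:
```r
# "\p{L}" would raise an "unrecognized escape" error at parse time; the
# doubled backslash is what puts a literal \p{L} into the string.
token.regexp <- "\\p{L}[\\p{L}\\p{P}]+\\p{L}"
cat(token.regexp)  # prints \p{L}[\p{L}\p{P}]+\p{L}
```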