In R, how can I insert a TRUE / FALSE column if strings in columns ARE / ARE NOT in alphabetical order?

Sample data:
df <- data.frame(noun1 = c("cat","dog"), noun2 = c("apple", "tree"))
  noun1 noun2
1   cat apple
2   dog  tree
How can I make a new column df$alpha that would read FALSE in row 1 and TRUE in row 2?
Thank you!

I think you can just apply is.unsorted() to each row. apply() already hands each row to the function as a character vector, so the unlist() is probably not needed, but it does no harm.
df <- data.frame(noun1 = c("cat","dog"), noun2 = c("apple", "tree"))
df$alpha <- apply(df,1,function(x) !is.unsorted(unlist(x)))
I found is.unsorted() via apropos("sort").
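For just two columns, a vectorised comparison may be enough. This is a minimal sketch, assuming that only noun1 and noun2 are compared and that ties count as sorted (which matches the default behaviour of is.unsorted()); string comparison follows the collation of the current locale.

```r
# stringsAsFactors = FALSE keeps the columns as character on older R versions
df <- data.frame(noun1 = c("cat", "dog"),
                 noun2 = c("apple", "tree"),
                 stringsAsFactors = FALSE)

# TRUE where noun1 sorts before (or equal to) noun2
df$alpha <- df$noun1 <= df$noun2
print(df)
```

Unlike the apply() version, this does not pick up the alpha column itself if the line is run a second time.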

Related

How to knit out table codes into table in R markdown

I am a beginner in R. I am having trouble knitting tables with code my professor wrote for the students. The table-formatting code is shown below, as I put it in my R Markdown file.
```{r, results="hide", message=FALSE, warning = FALSE, error = FALSE}
## my style latex summary of regression
jhp_report <- function(...){
output <- capture.output(stargazer(..., omit.stat=c("f", "ser")))
# The first three lines are the ones we want to remove...
output <- output[4:length(output)]
# cat out the results - this is essentially just what stargazer does too
cat(paste(output, collapse = "\n"), "\n")
}
```
After this, I tried printing this out with knitr.
```{r, message=FALSE, warning = FALSE, error = FALSE}
set.seed(1973)
N <- 100
x <- runif(N, 6, 20)
D <- rbinom(N, 1, .5)
y <- 1 + 0.5*x - .4*D + rnorm(N)
df.lm <- data.frame(y = y, x =x, D =D)
df.lm$D <- factor(df.lm$D, labels = c('Male', 'Female'))
##REGRESSION
reg.parallel <- lm(y ~ x + D, data = df.lm)
jhp_report(reg.parallel, title = "Result", label = "tab:D", dep.var.labels = "$y$")
```
As a result, instead of a table, the knitted document shows only the raw table code. I would like to know how to set up R Markdown so that it prints the table instead of the code.
I expected there to be some setup option that prints the table, but I couldn't find the right one. My class assignment also requires students to use this code. I did find other options like knitr::kable, but I would like to use the given code for this assignment.
Thank you in advance!
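One setting that commonly matters here, offered as a sketch rather than a confirmed fix: stargazer writes LaTeX source by default, and knitr shows chunk output verbatim unless the chunk sets results="asis". Assuming the document is knitted to PDF, that stargazer is already loaded, and that jhp_report and reg.parallel are defined as in the chunks above, the reporting call could go in a chunk of its own:

```{r, results="asis", echo=FALSE, message=FALSE, warning=FALSE}
# results="asis" passes the LaTeX printed by cat() straight through to the document
jhp_report(reg.parallel, title = "Result", label = "tab:D", dep.var.labels = "$y$")
```

This sketch covers PDF output only; HTML output would need a different stargazer output type and is not handled here.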

Simple Shiny selectInput not working with Intersect

Is there any reason this wouldn't work? I simply want to see which terms are found in the two selected columns. I figured intersect would do the job, but I'm not seeing results. If this looks alright, perhaps I have some other syntax error along the way? Do the inputs need to be in different sidebar panels?
selectInput("data1", "Choose your Input:", choices = colnames(data), selected = "PD.Risk.Factor"),
selectInput("data2", "Choose your Input:", choices = colnames(data), selected = "AD.Risk.Factor")),
Output:
p2 = intersect(x = input$data1, y = input$data2)
print(p2)
Welcome to SO! Please provide a reprex next time; it makes it much easier to help.
As for your problem: your snippet compares not the columns of your data frame but the strings returned by selectInput. What you want is to use these strings to retrieve the corresponding columns from the data.
library(shiny)
sample_dat <- data.frame(x = 1:10, y = 5:14, z = 9:18)
ui <- fluidPage(selectInput("col1", "Column 1:", names(sample_dat), "x"),
                selectInput("col2", "Column 2:", names(sample_dat), "y"),
                verbatimTextOutput("result"))
server <- function(input, output, session) {
  output$result <- renderPrint({
    # on_strings uses only the selected names; on_cols uses the column values
    list(on_strings = list(col1 = input$col1,
                           col2 = input$col2,
                           intersect = intersect(input$col1, input$col2)),
         on_cols = list(col1 = sample_dat[[input$col1]],
                        col2 = sample_dat[[input$col2]],
                        intersect = intersect(sample_dat[[input$col1]],
                                              sample_dat[[input$col2]])))
  })
}
shinyApp(ui, server)
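The difference can also be checked outside Shiny. A minimal sketch, assuming the same sample_dat as above; col1 and col2 are hypothetical stand-ins for input$col1 and input$col2:

```r
sample_dat <- data.frame(x = 1:10, y = 5:14, z = 9:18)

# Stand-ins for the strings that selectInput() would return
col1 <- "x"
col2 <- "y"

# Intersecting the names compares "x" with "y", so nothing is shared
on_strings <- intersect(col1, col2)

# Intersecting the columns compares the actual values
on_cols <- intersect(sample_dat[[col1]], sample_dat[[col2]])

print(on_strings)
print(on_cols)
```

Using [[ ]] with the string looks the column up by name, which is why it works with whatever the user selects.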

Applying Rcpp on a dataframe

I'm new to C++ and exploring faster computation in R through the Rcpp package. The actual dataframe contains over 2 million rows, and processing it is quite slow.
Existing Dataframes
Main Dataframe
df<-data.frame(z = c("a","b","c"), a = c(303,403,503), b = c(203,103,803), c = c(903,803,703))
Cost Dataframe
cost <- data.frame("103" = 4, "203" = 5, "303" = 6, "403" = 7, "503" = 8, "603" = 9, "703" = 10, "803" = 11, "903" = 12)
colnames(cost) <- c("103", "203", "303", "403", "503", "603", "703", "803", "903")
Steps
df contains z, a categorical variable with levels a, b and c. I did a merge from another dataframe to bring the columns a, b and c into df with their specific numbers.
First step would be to match each row in z with the column names (a,b or c) and create a new column called 'type' and copy the corresponding number.
So the first row would read,
df$z[1] = "a"
df$type[1]= 303
Now it must match df$type with column names in another dataframe called 'cost' and create df$cost. The cost dataframe contains column names as numbers e.g. "103", "203" etc.
For our example, df$cost[1] = 6, because df$type[1] = 303 matches cost$`303`[1] = 6.
The final dataframe should look like this (sample output):
df1 <- data.frame(z = c("a","b","c"), type = c("303", "103", "703"), cost = c(6,4,10))
A possible solution, not very elegant but does the job:
library(reshape2)
tmp <- cbind(cost,melt(df)) # create a unique data frame
row.idx <- which(tmp$z==tmp$variable) # row index of matching values
col.val <- match(as.character(tmp$value[row.idx]), names(tmp) ) # find corresponding values in the column names
# now put all together
df2 <- data.frame('z' = unique(df$z),
                  'type' = tmp$value[row.idx],
                  'cost' = as.numeric(tmp[1, col.val]))
the output:
> df2
z type cost
1 a 303 6
2 b 103 4
3 c 703 10
See if it works for your data.
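Before reaching for Rcpp, a vectorised base R lookup might already be fast enough for a few million rows. This is only a sketch under the assumptions of the sample data: every value of z names one of the columns a, b, c, and every resulting type has a matching column in cost. The names vals, idx and cost_vec are introduced here for illustration.

```r
df <- data.frame(z = c("a", "b", "c"),
                 a = c(303, 403, 503),
                 b = c(203, 103, 803),
                 c = c(903, 803, 703))
cost <- data.frame("103" = 4, "203" = 5, "303" = 6, "403" = 7, "503" = 8,
                   "603" = 9, "703" = 10, "803" = 11, "903" = 12,
                   check.names = FALSE)

# Numeric matrix of the candidate columns
vals <- as.matrix(df[, c("a", "b", "c")])

# (row, column) pairs: for row i, take the column named by df$z[i]
idx <- cbind(seq_len(nrow(df)), match(df$z, colnames(vals)))
df$type <- vals[idx]

# Treat the one-row cost table as a named vector and look up by name
cost_vec <- unlist(cost[1, ])
df$cost <- unname(cost_vec[as.character(df$type)])

print(df[, c("z", "type", "cost")])
```

Matrix indexing with a two-column index avoids both the row-wise loop and the reshaping step.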

R: Populate a new column in a dataframe based on matching one or several possible strings

Hypothetical dataframe:
strings  new column
mesh     1
foo      0
bar      0
tack     1
suture   1
I would like the new column to contain "1" if df$strings contains the strings "mesh", "tack", or "sutur". Otherwise it should display zero in the same row. I tried the following:
df$new_column <- ifelse(grepl("mesh" | "tack" | "sutur",
df$strings, ignore.case = T), "1", "0")
but got this error:
Error in "mesh" | "tack" :
operations are possible only for numeric, logical or complex types
Thanks in advance.
You want to use a single string in grep:
df$new_column <- ifelse(grepl("mesh|tack|sutur", df$strings, ignore.case = T),
"1", "0")
will work, but the following will be faster:
df$new_column <- +(grepl("mesh|tack|sutur", df$strings, ignore.case = T))
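This returns an integer vector of 0s and 1s.
We can also use %in%, but note that it only matches whole strings exactly (and case-sensitively), so the full values have to be listed, e.g. "suture" rather than "sutur":
df$new_column <- as.integer(df$strings %in% c("mesh", "tack", "suture"))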
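If the search terms live in a vector, the alternation pattern can be built rather than typed by hand. A minimal sketch using the sample data from the question; keywords and pattern are names introduced here, and the terms are assumed to contain no regex metacharacters:

```r
df <- data.frame(strings = c("mesh", "foo", "bar", "tack", "suture"),
                 stringsAsFactors = FALSE)

keywords <- c("mesh", "tack", "sutur")

# Collapse the terms into a single pattern: "mesh|tack|sutur"
pattern <- paste(keywords, collapse = "|")

# grepl() matches substrings, so "sutur" also flags "suture"
df$new_column <- as.integer(grepl(pattern, df$strings, ignore.case = TRUE))
print(df)
```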

How to add column to data.table with values from list based on regex

I have the following data.table:
        id fShort
1   432-12   1245
2  3242-12 453543
3   324-32  45543
4   322-34  45343
5  2324-34  13543
DT <- data.table(
  id=c("432-12", "3242-12", "324-32", "322-34", "2324-34"),
  fShort=c("1245", "453543", "45543", "45343", "13543"))
and the following list:
filenames <- list("3242-124342345.png", "432-124343.png", "135-13434.jpeg")
I would like to create a new column "fComplete" that holds the complete filename from the list. For this, the values of column "id" need to be matched against the filename list: if a filename starts with the "id" string, the complete filename should be returned. I use the following grep call
t <- grep("432-12","432-124343.png",value=T)
which returns the correct filename.
This is how the final table should look:
        id fShort          fComplete
1   432-12   1245     432-124343.png
2  3242-12 453543 3242-124342345.png
3   324-32  45543                 NA
4   322-34  45343                 NA
5  2324-34  13543                 NA
DT2 <- data.table(
  id=c("432-12", "3242-12", "324-32", "322-34", "2324-34"),
  fShort=c("1245", "453543", "45543", "45343", "13543"),
  fComplete = c("432-124343.png", "3242-124342345.png", NA, NA, NA))
I tried using apply and data.table approaches but I always get warnings like
argument 'pattern' has length > 1 and only the first element will be used
What is a simple approach to accomplish this?
Here's a data.table solution:
DT[ , fComplete := lapply(id, function(x) {
  m <- grep(x, filenames, value = TRUE)
  if (!length(m)) NA else m})]
        id fShort          fComplete
1:  432-12   1245     432-124343.png
2: 3242-12 453543 3242-124342345.png
3:  324-32  45543                 NA
4:  322-34  45343                 NA
5: 2324-34  13543                 NA
In my experience with similar functions, the regex functions sometimes return a list, so you have to account for that inside the apply call. I usually work through one example manually first.
Also, in my experience apply on its own does not always return something that fits straight into a data.frame; sometimes I had to use lapply and/or unlist and data.frame to reshape it.
Here is an answer. I am not familiar with data.table and I was having issues with the filenames being in a list, but with some transformations this works. I worked it out by looking at what apply was outputting and adding the [1] to get the piece I needed.
DT <- data.frame(
  id=c("432-12", "3242-12", "324-32", "322-34", "2324-34"),
  fShort=c("1245", "453543", "45543", "45343", "13543"))
filenames <- list("3242-124342345.png", "432-124343.png", "135-13434.jpeg")
filenames1 <- unlist(filenames)
# index of the first filename matching each id (NA if there is none)
x <- apply(DT[1], 1, function(x) grep(x, filenames1)[1])
DT$fComplete <- filenames1[x]
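Since the question asks for filenames that start with the id, a prefix test may be closer to the requirement than an unanchored grep(). This is a base R sketch, assuming a plain data.frame, that only the first matching filename is wanted, and R 3.3.0 or later for startsWith(); first_match is a helper name introduced here.

```r
DT <- data.frame(
  id = c("432-12", "3242-12", "324-32", "322-34", "2324-34"),
  fShort = c("1245", "453543", "45543", "45343", "13543"),
  stringsAsFactors = FALSE)
filenames <- unlist(list("3242-124342345.png", "432-124343.png", "135-13434.jpeg"))

# Return the first filename beginning with `id`, or NA if there is none
first_match <- function(id) {
  hit <- filenames[startsWith(filenames, id)]
  if (length(hit)) hit[1] else NA_character_
}

# vapply() guarantees a plain character column rather than a list column
DT$fComplete <- vapply(DT$id, first_match, character(1), USE.NAMES = FALSE)
print(DT)
```

startsWith() compares literal text, so ids containing regex metacharacters such as "." or "+" would not need escaping.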