R: fragment a list - list

I am getting tired of working with lists, and with my limited R skills I have not been able to solve this for a long time.
My list of multiple data frames looks like the following:
set.seed(456)
sn1 = paste( "X", c(1:4), sep= "")
onelist <- list (df1 <- data.frame(sn = sn1, var1 = runif(4)),
df2 <- data.frame(sn = sn1, var1 = runif(4)),
df3 <- data.frame(sn = sn1,var1 = runif(4)))
[[1]]
sn var1
1 X1 0.3852362
2 X2 0.3729459
3 X3 0.2179086
4 X4 0.7551050
[[2]]
sn var1
1 X1 0.8216811
2 X2 0.5989182
3 X3 0.6510336
4 X4 0.8431172
[[3]]
sn var1
1 X1 0.4532381
2 X2 0.7167571
3 X3 0.2912222
4 X4 0.1798831
I want to make a subset list in which only rows 2 and 3 are present.
srow <- c(2:3) # my real data has many more rows to select
newlist <- lapply(onelist, function(y) subset(y, row(y) == srow))
The newlist is empty....
> newlist
[[1]]
[1] sn var1
<0 rows> (or 0-length row.names)
[[2]]
[1] sn var1
<0 rows> (or 0-length row.names)
[[3]]
[1] sn var1
<0 rows> (or 0-length row.names)
Help please ....

Does this do it?
Note the trailing comma after the rows: the empty column argument is treated as missing (not NULL), which results in the extraction of all of the columns:
> lapply(onelist, "[", c(2,3),)
[[1]]
sn var1
2 X2 0.2105123
3 X3 0.7329553
[[2]]
sn var1
2 X2 0.33195997
3 X3 0.08243274
[[3]]
sn var1
2 X2 0.3852362
3 X3 0.3729459
You could have gotten your subset strategy to work with:
lapply(onelist, function(y) subset(y, rownames(y) %in% srow ))
Note that people often use "==" when they really should be using %in%.
?match
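A minimal sketch of why "==" fails here while %in% works, using a small stand-in data frame (the names df, by_equal and by_in are illustrative, not from the question). "==" compares element by element and recycles the shorter vector, whereas %in% tests membership:

```r
# Stand-in for one element of onelist; the values are arbitrary
df <- data.frame(sn = paste0("X", 1:4), var1 = c(0.1, 0.2, 0.3, 0.4))
srow <- 2:3

# `==` recycles srow to 2, 3, 2, 3 and compares position by position
by_equal <- seq_len(nrow(df)) == srow   # 1==2, 2==3, 3==2, 4==3
# `%in%` asks, for each row number, whether it occurs anywhere in srow
by_in <- seq_len(nrow(df)) %in% srow

by_equal     # FALSE FALSE FALSE FALSE
by_in        # FALSE  TRUE  TRUE FALSE
df[by_in, ]  # rows 2 and 3
```

With "==", a row is kept only if its number happens to line up with the recycled value at the same position, which never occurs for this input.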

I don't think the row function does what you think it does:
Returns a matrix of integers indicating their row number in a matrix-like object, or a factor indicating the row labels.
Looking at what it returns on the list you have
> row(onelist[[1]])
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 4 4
> row(onelist[[1]])==srow
[,1] [,2]
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] FALSE FALSE
[4,] FALSE FALSE
You are doing a simple subset of the data.frames, so you can just use
newlist <- lapply(onelist, function(y) y[srow,])
which gives
> newlist
[[1]]
sn var1
2 X2 0.2105123
3 X3 0.7329553
[[2]]
sn var1
2 X2 0.33195997
3 X3 0.08243274
[[3]]
sn var1
2 X2 0.3852362
3 X3 0.3729459

Replace ":" with " x " in `emmeans::joint_tests` output

I am bringing this question over from TeX Stack Exchange because it didn't get much attention there. The answer I got doesn't apply to tables longer than 3 rows. Please see how I could change my code. Thanks for your attention. https://tex.stackexchange.com/questions/594324/how-to-replace-all-with-or-x-in-an-anova-table
library(emmeans)
library(kableExtra)
neuralgia.glm <- glm(Pain ~ Treatment * Sex * Age, family=binomial, data = neuralgia)
model term df1 df2 F.ratio p.value
Treatment 2 Inf 0 1.0000
Sex 1 Inf 0 0.9964
Age 1 Inf 0 0.9941
Treatment:Sex 2 Inf 0 1.0000
Treatment:Age 2 Inf 0 1.0000
Sex:Age 1 Inf 0 0.9942
Treatment:Sex:Age 2 Inf 0 1.0000
I want to replace all the colons with x and add spaces around it for readability. That's what I tried with gsub(":", " x "). The print(neuralgia.glm, export = T) command keeps the p-value format as <0.0001 instead of 0 when knitted.
This code gave me just an x. Using sub or gsub does the same.
joint_tests(neuralgia.glm) %>%
print(, export = T) %>%
gsub(":", " x ") %>%
kable()
This code replaced the colons with " x " but it broke the table.
gsub( "\\:", " x ", print(neuralgia.glm, export = T)) %>%
kable()
Note that gsub() takes x = as its third argument, not the first, so piping the table into it wouldn't work.
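To see why the piped call returns just " x ", here is a minimal sketch of what the pipe expands to for a single string. The variable names term, piped and fixed are illustrative only:

```r
term <- "Treatment:Sex"

# What `term %>% gsub(":", " x ")` expands to: term lands in the
# `pattern` slot, ":" becomes the replacement and " x " the text searched
piped <- gsub(term, ":", " x ")

# Naming pattern and replacement leaves x as the only slot for the data
fixed <- gsub(pattern = ":", replacement = " x ", x = term)

piped  # " x "  (nothing matched, so the searched text comes back unchanged)
fixed  # "Treatment x Sex"
```

Naming the first two arguments is one way to keep a pipe usable, since the piped value then fills x.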
Solution:
library(emmeans)
neuralgia.glm <- glm(Pain ~ Treatment * Sex * Age, family=binomial, data = neuralgia)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
jt <- joint_tests(neuralgia.glm)
jt$`model term` <- gsub(":", " x ", jt$`model term`)
jt
#> model term df1 df2 F.ratio p.value
#> Treatment 2 Inf 0 1.0000
#> Sex 1 Inf 0 0.9970
#> Age 1 Inf 0 0.9951
#> Treatment x Sex 2 Inf 0 1.0000
#> Treatment x Age 2 Inf 0 1.0000
#> Sex x Age 1 Inf 0 0.9952
#> Treatment x Sex x Age 2 Inf 0 1.0000
Created on 2021-06-07 by the reprex package (v2.0.0)

Replace certain spaces to tabs - delimiters

I have a one-column data.frame where some spaces should be delimiters and some are just spaces.
#input data
dat <- data.frame(x=c("A 2 2 textA1 textA2 Z1",
"B 4 1 textX1 textX2 textX3 Z2",
"C 3 5 textA1 Z3"))
# x
# 1 A 2 2 textA1 textA2 Z1
# 2 B 4 1 textX1 textX2 textX3 Z2
# 3 C 3 5 textA1 Z3
I need to convert it to a 5-column data.frame:
#expected output
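# delimiters are written as \t escapes so the tabs survive copy and paste
output <- read.table(text=paste(
"A\t2\t2\ttextA1 textA2\tZ1",
"B\t4\t1\ttextX1 textX2 textX3\tZ2",
"C\t3\t5\ttextA1\tZ3", sep="\n"), sep="\t")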
# V1 V2 V3 V4 V5
# 1 A 2 2 textA1 textA2 Z1
# 2 B 4 1 textX1 textX2 textX3 Z2
# 3 C 3 5 textA1 Z3
Essentially, need to change 1st, 2nd, 3rd, and the last space to a tab (or any other delimiter if it makes it easier to code).
Playing with regex is not giving anything useful yet...
Note1: In real data I have to replace 1st, 2nd, 3rd,...,19th and the last spaces to tabs.
Note2: There is no pattern in V4, text can be anything.
Note3: Last column is one word text with variable length.
Try
v1 <- gsub("^([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+", '\\1,\\2,\\3,', dat$x)
read.table(text=sub(' +(?=[^ ]+$)', ',', v1, perl=TRUE), sep=",")
# V1 V2 V3 V4 V5
#1 A 2 2 textA1 textA2 Z1
#2 B 4 1 textX1 textX2 textX3 Z2
#3 C 3 5 textA1 Z3
Or an option inspired by @Tensibai's post
n <- 3
fpat <- function(n){
paste0('^((?:\\w+ ){', n,'})([\\w ]+)\\s+(\\w+)$')
}
read.table(text=gsub(fpat(n), "\\1'\\2' \\3", dat$x, perl=TRUE))
# V1 V2 V3 V4 V5
#1 A 2 2 textA1 textA2 Z1
#2 B 4 1 textX1 textX2 textX3 Z2
#3 C 3 5 textA1 Z3
For more columns,
n <- 19
v1 <- "A 24 34343 212 zea4 2323 12343 111 dsds 134d 153xd 153xe 153de 153dd dd dees eese tees3 zee2 2353 23335 23353 ddfe 3133"
read.table(text=gsub(fpat(n), "\\1'\\2' \\3", v1, perl=TRUE), sep='')
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
#1 A 24 34343 212 zea4 2323 12343 111 dsds 134d 153xd 153xe 153de 153dd dd
# V16 V17 V18 V19 V20 V21
#1 dees eese tees3 zee2 2353 23335 23353 ddfe 3133
With a variable number of columns:
library(stringr)
cols <- 3
m <- str_match(dat$x, paste0("((?:\\w+ ){" , cols , "})([\\w ]+) (\\w+)"))
t <- paste0(gsub(" ", "\t", m[,2]), m[,3], "\t", m[,4])
> read.table(text=t,sep="\t")
V1 V2 V3 V4 V5
1 A 2 2 textA1 textA2 Z1
2 B 4 1 textX1 textX2 textX3 Z2
3 C 3 5 textA1 Z3
Change cols to the number of leading columns you want before the free text.
For the regex:
((?:\\w+ ){3}) captures the 3 repetitions {3} of the non-capturing group (?:\w+ ), which matches at least one alphanumeric character \w+ followed by a space.
([\\w ]+) (\\w+) captures the free text made of alphanumeric characters or spaces [\w ]+, followed by a space, and then captures the last word with \w+.
Once that's done, paste the 3 parts returned by str_match, taking care to replace the spaces in the first group m[,2] with tabs.
m[,1] is the whole match, so it's unused here.
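The three capture groups can be inspected with base R as well. This is a sketch under the assumption that regexec() accepts perl = TRUE (R 3.3.0 or later); the names x, pattern, parts and rebuilt are illustrative:

```r
x <- "B 4 1 textX1 textX2 textX3 Z2"
pattern <- "^((?:\\w+ ){3})([\\w ]+) (\\w+)$"

# Element 1 is the whole match, elements 2 to 4 are the capture groups
parts <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]]
parts[2]  # "B 4 1 "               leading fields, each followed by a space
parts[3]  # "textX1 textX2 textX3" free text, spaces kept
parts[4]  # "Z2"                   last word

# Only the spaces inside the first group become tabs
rebuilt <- paste0(gsub(" ", "\t", parts[2]), parts[3], "\t", parts[4])
rebuilt   # "B\t4\t1\ttextX1 textX2 textX3\tZ2"
```

The greedy [\w ]+ first swallows the last word too, then backtracks just far enough for the final space and (\w+) to match.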
Old answer:
A basic one matching based on a fixed number of fields:
> read.table(text=gsub("(\\w+) (\\w+) (\\w+) ([\\w ]+) (\\w+)$","\\1\t\\2\t\\3\t\\4\t\\5",dat$x,perl=TRUE),sep="\t")
V1 V2 V3 V4 V5
1 A 2 2 textA1 textA2 Z1
2 B 4 1 textX1 textX2 textX3 Z2
3 C 3 5 textA1 Z3
Add as many (\w+) as you wish at the front, and increase the number of back references (\1, \2, ...) accordingly.
Here is one twisted way to go that works whatever the number of "words" you have (and that works on your data). It relies on the free-text "words" having at least three alphanumeric characters while the neighbouring fields have fewer:
res <- gsub("\\w{3,}\\K\\t(?=\\w{3,})", " ", gsub(" ", "\t", dat$x), perl=T)
res
# [1] "A\t2\t2\ttextA1 textA2\tZ1" "B\t4\t1\ttextX1 textX2 textX3\tZ2" "C\t3\t5\ttextA1\tZ3"
read.table(text=res, sep="\t")
# V1 V2 V3 V4 V5
#1 A 2 2 textA1 textA2 Z1
#2 B 4 1 textX1 textX2 textX3 Z2
#3 C 3 5 textA1 Z3
EDIT: A completely different way to go, only based on the number of the spaces k you need to replace before the last one:
k <- 3 # in your example
res <- sapply(as.character(dat$x),
function(x, k){
pos_sp <- gregexpr(" ", x)[[1]]
x <- strsplit(x, "")[[1]]
if (length(pos_sp) > k+1) pos_sp <- pos_sp[c(1:k, length(pos_sp))]
x[pos_sp] <- "\t"
x <- paste(x, collapse="")
}, k=k)
read.table(text=res, sep="\t")
# V1 V2 V3 V4 V5
# 1 A 2 2 textA1 textA2 Z1
# 2 B 4 1 textX1 textX2 textX3 Z2
# 3 C 3 5 textA1 Z3

How to split character and numerical separately in R

I have a dataframe which looks like this:
df= data.frame(name= c("1Alex100.00","12Rina Faso92.31","113john00.00"))
And I want to split this into a data frame with 3 columns so that the output looks like:
name1 name2 name3
1 Alex 100.00
12 Rina Faso 92.31
113 john 00.00
I have tried stringr and grep() with limited success. The lack of a delimiter makes it a lot more difficult.
You could try
library(tidyr)
res <- extract(df, name, into=c('name1', 'name2', 'name3'),
'(\\d+)([^0-9]+)([0-9.]+)', convert=TRUE)
res
# name1 name2 name3
#1 1 Alex 100.00
#2 2 Rina Faso 92.31
#3 3 john 50.00
str(res)
# 'data.frame': 3 obs. of 3 variables:
# $ name1: int 1 2 3
# $ name2: Factor w/ 3 levels "Alex","john",..: 1 3 2
# $ name3: num 100 92.3 50
Update
Based on 'df' from @DavidArenburg's post
res <- extract(df, name, into=c('name1', 'name2', 'name3'),
'(\\d+)([^0-9]+)([0-9.]+)', convert=TRUE)
res
# name1 name2 name3
#1 121 Réunion 13.76
#2 2 Côte d'Ivoire 22.40
#3 3 john 50.00
Try with str_match from stringr:
library(stringr)
str_match(df$name, "^([0-9]*)([A-Za-z ]*)([0-9\\.]*)")
# [,1] [,2] [,3] [,4]
# [1,] "1Alex100.00" "1" "Alex" "100.00"
# [2,] "2Rina Faso92.31" "2" "Rina Faso" "92.31"
# [3,] "3john50.00" "3" "john" "50.00"
So as.data.frame(str_match(df$name, "^([0-9]*)([A-Za-z ]*)([0-9\\.]*)")[,-1]) should give you the desired result.
You could also do it like this.
> df <- data.frame(name= c("1Alex100.00","12Rina Faso92.31","113john00.00"))
> x <- do.call(rbind.data.frame, strsplit(as.character(df$name), "(?<=[A-Za-z])(?=\\d)|(?<=\\d)(?=[A-Za-z])", perl=T))
> colnames(x) <- c("name1", "name2", "name3")
> print(x, row.names=FALSE)
name1 name2 name3
1 Alex 100.00
12 Rina Faso 92.31
113 john 00.00
With base R it can be done too. It is a bit uglier, though it also works with special characters:
with(df, cbind(sub("\\D.*", "", name),
gsub("[0-9.]", "", name),
gsub(".*[A-Za-z]", "", name)))
# [,1] [,2] [,3]
# [1,] "1" "Alex" "100.00"
# [2,] "2" "Rina Faso" "92.31"
# [3,] "3" "john" "50.00"
An example on special characters
df = data.frame(name= c("121Réunion13.76","2Côte d'Ivoire22.40","3john50.00"))
with(df, cbind(sub("\\D.*", "", name),
gsub("[0-9.]", "", name),
gsub(".*[A-Za-z]", "", name)))
# [,1] [,2] [,3]
# [1,] "121" "Réunion" "13.76"
# [2,] "2" "Côte d'Ivoire" "22.40"
# [3,] "3" "john" "50.00"
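A small sketch of why special characters matter for the choice of pattern: the ASCII-only class [A-Za-z ] used in the str_match answer stops at an accented letter, while "anything that is not a digit" does not. The names x, ascii and nondigit are illustrative, and the \\D+ pattern is one possible choice, assuming regexec() accepts perl = TRUE (R 3.3.0 or later):

```r
x <- "121R\u00e9union13.76"   # "121Réunion13.76", written with an escape

# ASCII-only letter class: the name group stops at the accented character
ascii <- regmatches(x, regexec("^([0-9]*)([A-Za-z ]*)([0-9.]*)", x,
                               perl = TRUE))[[1]]
# Non-digit class: the accented name stays in one piece
nondigit <- regmatches(x, regexec("^([0-9]+)(\\D+)([0-9.]+)$", x,
                                  perl = TRUE))[[1]]

ascii[3]      # "R"
nondigit[-1]  # "121" "Réunion" "13.76"
```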
Base R solutions that are not ugly:
proto=data.frame(name1=numeric(),name2=character(),name3=numeric())
strcapture("(\\d+)(\\D+)(.*)",as.character(df$name),proto)
name1 name2 name3
1 1 Alex 100.00
2 12 Rina Faso 92.31
3 113 john 0.00
read.table(text=gsub("(\\d+)(\\D+)(.*)","\\1|\\2|\\3",df$name),sep="|")
V1 V2 V3
1 1 Alex 100.00
2 12 Rina Faso 92.31
3 113 john 0.00
You could use the package unglue:
df <- data.frame(name= c("1Alex100.00","12Rina Faso92.31","113john00.00"))
library(unglue)
unglue_unnest(df, name, "{name1}{name2=\\D+}{name3}", convert = TRUE)
#> name1 name2 name3
#> 1 1 Alex 100.00
#> 2 12 Rina Faso 92.31
#> 3 113 john 0.00

R list function

I have a problem applying a function to list elements. I have a list called "mylist", which looks like:
[[1]] station global
1 2
1 2
1 2
1 14
1 38
1 169
[[2]] station global
2 2
2 2
2 23
2 86
In each list, I need to set values of "global" less than or equal to 2 to NA.
I have used
dat.list <- lapply(mylist, `[[`, 'global')
to get only the global data.
Defining a function:
fct <- function(x) {
x[x <= 2] <- NA
}
and writing
lapply(dat.list, fct)
gives
[[1]] NA
[[2]] NA
What I would like to have is:
[[1]] station global
1 NA
1 NA
1 NA
1 14
1 38
1 169
[[2]] station global
2 NA
2 NA
2 23
2 86
I appreciate any help or a pointer in the right direction. Regards, Sisse
It would help if you posted a reproducible example. See here for advice on how to do this.
x will take on each element of the list in turn. Since those appear to be data.frames, treat x as a data.frame and return it at the end:
fct <- function(x) {
x$global[x$global <= 2] <- NA
x
}
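A runnable sketch of the fix, assuming mylist is a list of data frames shaped like the printout in the question (the construction of mylist below is a hypothetical stand-in). The original fct returned NA because a function returns its last expression, and an assignment evaluates to the assigned value:

```r
# Hypothetical stand-in for mylist, built from the values shown in the question
mylist <- list(
  data.frame(station = 1, global = c(2, 2, 2, 14, 38, 169)),
  data.frame(station = 2, global = c(2, 2, 23, 86))
)

fct <- function(x) {
  x$global[x$global <= 2] <- NA  # overwrite small values inside the data frame
  x                              # return the whole modified data frame
}

newlist <- lapply(mylist, fct)
newlist[[2]]
```

Applying fct to mylist directly, rather than to the extracted dat.list, keeps the station column alongside global.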

Combining (cbind) vectors of different length

I have several vectors of unequal length and I would like to cbind them. I've put the vectors into a list and I have tried to combine them using do.call(cbind, ...):
nm <- list(1:8, 3:8, 1:5)
do.call(cbind, nm)
# [,1] [,2] [,3]
# [1,] 1 3 1
# [2,] 2 4 2
# [3,] 3 5 3
# [4,] 4 6 4
# [5,] 5 7 5
# [6,] 6 8 1
# [7,] 7 3 2
# [8,] 8 4 3
# Warning message:
# In (function (..., deparse.level = 1) :
# number of rows of result is not a multiple of vector length (arg 2)
As expected, the number of rows in the resulting matrix is the length of the longest vector, and the values of the shorter vectors are recycled to make up for the length.
Instead I'd like to pad the shorter vectors with NA values to obtain the same length as the longest vector. I'd like the matrix to look like this:
# [,1] [,2] [,3]
# [1,] 1 3 1
# [2,] 2 4 2
# [3,] 3 5 3
# [4,] 4 6 4
# [5,] 5 7 5
# [6,] 6 8 NA
# [7,] 7 NA NA
# [8,] 8 NA NA
How can I go about doing this?
You can use indexing: if you index a position beyond the length of the object, it returns NA. This works for any arbitrary number of rows, defined here with foo:
nm <- list(1:8,3:8,1:5)
foo <- 8
sapply(nm, '[', 1:foo)
EDIT:
Or in one line using the largest vector as number of rows:
sapply(nm, '[', seq(max(sapply(nm,length))))
From R 3.2.0 you may use lengths ("get the length of each element of a list") instead of sapply(nm, length):
sapply(nm, '[', seq(max(lengths(nm))))
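A short sketch of the mechanism behind these one-liners: indexing past the end of a vector yields NA, and sapply then binds the equal-length results as columns. The names short and padded are illustrative, and lengths() assumes R 3.2.0 or later:

```r
nm <- list(1:8, 3:8, 1:5)

# Indexing past the end of a vector gives NA for the missing positions
short <- nm[[3]][1:8]   # 1 2 3 4 5 NA NA NA

# Apply the same indexing to every element; results are bound as columns
padded <- sapply(nm, `[`, seq_len(max(lengths(nm))))
dim(padded)             # 8 3
```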
You should fill vectors with NA before calling do.call.
nm <- list(1:8,3:8,1:5)
max_length <- max(unlist(lapply(nm,length)))
nm_filled <- lapply(nm,function(x) {ans <- rep(NA,length=max_length);
ans[1:length(x)]<- x;
return(ans)})
do.call(cbind,nm_filled)
This is a shorter version of Wojciech's solution.
nm <- list(1:8,3:8,1:5)
max_length <- max(sapply(nm,length))
sapply(nm, function(x){
c(x, rep(NA, max_length - length(x)))
})
Here is an option using stri_list2matrix from stringi
library(stringi)
out <- stri_list2matrix(nm)
class(out) <- 'numeric'
out
# [,1] [,2] [,3]
#[1,] 1 3 1
#[2,] 2 4 2
#[3,] 3 5 3
#[4,] 4 6 4
#[5,] 5 7 5
#[6,] 6 8 NA
#[7,] 7 NA NA
#[8,] 8 NA NA
Late to the party, but you could use cbind.fill from the rowr package with fill = NA
library(rowr)
do.call(cbind.fill, c(nm, fill = NA))
# object object object
#1 1 3 1
#2 2 4 2
#3 3 5 3
#4 4 6 4
#5 5 7 5
#6 6 8 NA
#7 7 NA NA
#8 8 NA NA
If you have a named list instead and want to maintain the headers you could use setNames
nm <- list(a = 1:8, b = 3:8, c = 1:5)
setNames(do.call(cbind.fill, c(nm, fill = NA)), names(nm))
# a b c
#1 1 3 1
#2 2 4 2
#3 3 5 3
#4 4 6 4
#5 5 7 5
#6 6 8 NA
#7 7 NA NA
#8 8 NA NA