Conditional replacement between two data.frames by colnames and variables in R

I am trying to replace the values of a data frame with the values of a variable from another data frame, matching on the column names of the first one. I found some code while searching, but the output is not what I expected.
data<-data.frame(ASV10=c(0,0,1,0,0),ASV78=c(1,0,0,0,0),ASV34=c(0,0,0,0,1))
data2<-data.frame(var=c("ASV78","ASV10","ASV34"),trat=c("A","B","C"),stringsAsFactors = FALSE)
I would like to change the numerical values of data as follows: a 0 becomes NA, and a value higher than 0 becomes the corresponding value of trat in data2.
First I changed all 0 values in data to NA
data[data == 0] <- NA
Then I sorted data by its names and data2 by var (which holds the same values as names(data))
data<-data[,sort(names(data))]
data2 <- data2[order(data2$var),]
Then I tried to replace the non-NA values of data with the values of data2$trat
data[match(colnames(data),data2$var)]<-ifelse(is.na(data),NA,data2$trat)
However I get
ASV10 ASV34 ASV78
1 <NA> <NA> C
2 <NA> <NA> <NA>
3 A <NA> <NA>
4 <NA> <NA> <NA>
5 <NA> B <NA>
It works partially: it respects the NA positions and replaces the non-zero values, but the values of data2$trat do not correspond between colnames(data) and data2$var. The expected output is:
ASV10 ASV34 ASV78
1 <NA> <NA> A
2 <NA> <NA> <NA>
3 B <NA> <NA>
4 <NA> <NA> <NA>
5 <NA> C <NA>
What could be wrong?
Thank you very much for your help

Finally I managed to get what I want with the following lines
w <- which(data>0,arr.ind=TRUE)
data[w] <- names(data)[w[,"col"]] ## this part works
data[data==0]<-NA
test<-data.frame(t(data));test$var<-rownames(test)
test2 <- merge(test, data2, by = "var", all.x = TRUE)
test3<-data.frame(lapply(test2[2:(ncol(test2)-1)], function(x)x<-ifelse(is.na(x), x, test2$trat)))
test3<-data.frame(test2$var,test3)
test3<-t(test3)
colnames(test3)<-test3[1,]
test3<-test3[-1,]
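A shorter route, sketched here under the assumption that every column name of data appears exactly once in data2$var: look up each column's trat with match() and apply ifelse() column by column. The original attempt misaligns because ifelse() sees is.na(data) as a 5 x 3 matrix and recycles the length-3 data2$trat down the columns, so the label a cell receives depends on its position rather than on its column. The names trat_by_col and result are illustrative.

```r
data <- data.frame(ASV10 = c(0, 0, 1, 0, 0),
                   ASV78 = c(1, 0, 0, 0, 0),
                   ASV34 = c(0, 0, 0, 0, 1))
data2 <- data.frame(var = c("ASV78", "ASV10", "ASV34"),
                    trat = c("A", "B", "C"),
                    stringsAsFactors = FALSE)

# One label per column of `data`, in column order (no sorting needed)
trat_by_col <- data2$trat[match(names(data), data2$var)]

# Per column: keep the label where the value is > 0, NA otherwise
result <- as.data.frame(
  Map(function(col, label) ifelse(col > 0, label, NA_character_),
      data, trat_by_col),
  stringsAsFactors = FALSE
)
result
```

Because the lookup is done by name, neither data nor data2 has to be sorted first.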

Splitting rows with uneven string length into columns in R using tidyr [duplicate]

This question already has answers here:
Split data frame string column into multiple columns
(16 answers)
Closed 6 years ago.
Edit: This was marked as a duplicate. It is not. The question here is not only about splitting a single column into multiple ones, as my separate code would have worked for that. The main point of my question is splitting the column when the row strings yield a varying number of output columns.
I'm trying to turn this:
data <- c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
"Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
"Place1-Place1-Place1-Place1-Place3-Place5",
"Place1-Place4-Place2-Place3-Place3-Place5-Place5",
"Place6-Place6",
"Place1-Place2-Place3-Place4")
Into this:
X1 X2 X3 X4 X5 X6 X7 X8
1 Place1 Place2 Place2 Place4 Place2 Place3 Place5
2 Place7 Place7 Place7 Place7 Place7 Place7 Place7 Place7
3 Place1 Place1 Place1 Place1 Place3 Place5
4 Place1 Place4 Place2 Place3 Place3 Place5 Place5
5 Place6 Place6
6 Place1 Place2 Place3 Place4
I tried to use tidyr's separate function using this code:
library(data.table)
data <- as.data.table(data)
data_table <- tidyr::separate(data,
data,
sep="-",
into = strsplit(data$data, "-"),
fill = "right")
Sadly I'm getting this error:
Warning message:
Too many values at 3 locations: 1, 2, 4
What do I need to change to make it work?
You need to specify the target columns correctly:
library(tidyr)
separate(DF, V1, paste0("X",1:8), sep="-")
which gives:
X1 X2 X3 X4 X5 X6 X7 X8
1 Place1 Place2 Place2 Place4 Place2 Place3 Place5 <NA>
2 Place7 Place7 Place7 Place7 Place7 Place7 Place7 Place7
3 Place1 Place1 Place1 Place1 Place3 Place5 <NA> <NA>
4 Place1 Place4 Place2 Place3 Place3 Place5 Place5 <NA>
5 Place6 Place6 <NA> <NA> <NA> <NA> <NA> <NA>
6 Place1 Place2 Place3 Place4 <NA> <NA> <NA> <NA>
If you don't know how many target columns you need beforehand, you can use:
> max(sapply(strsplit(as.character(DF$V1),'-'),length))
[1] 8
to extract the maximum number of parts (which is thus the number of columns you need).
Several other methods:
splitstackshape :
library(splitstackshape)
cSplit(DF, "V1", sep="-", direction = "wide")
stringi :
library(stringi)
as.data.frame(stri_list2matrix(stri_split_fixed(DF$V1, "-"), byrow = TRUE))
data.table :
library(data.table)
setDT(DF)[, paste0("v", 1:8) := tstrsplit(V1, "-")][, V1 := NULL][]
stringr :
library(stringr)
as.data.frame(str_split_fixed(DF$V1, "-",8))
which all give a similar result.
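A base R variant is also possible when none of those packages is available. This is a minimal sketch on a small made-up vector (demo, parts, n_cols and wide are illustrative names): split each string, pad every piece to the longest length with NA, and bind the pieces row-wise.

```r
demo <- data.frame(V1 = c("Place1-Place2-Place3", "Place6-Place6", "Place7"),
                   stringsAsFactors = FALSE)

# Split every string on the literal "-"
parts <- strsplit(demo$V1, "-", fixed = TRUE)

# The longest split decides how many columns are needed
n_cols <- max(lengths(parts))

# Extending a vector with `length<-` pads it with NA
padded <- lapply(parts, function(p) { length(p) <- n_cols; p })

wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide) <- paste0("X", seq_len(n_cols))
wide
```

Since n_cols is computed from the data, the number of target columns does not have to be known beforehand.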
Used data:
DF <- data.frame(V1=c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
"Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
"Place1-Place1-Place1-Place1-Place3-Place5",
"Place1-Place4-Place2-Place3-Place3-Place5-Place5",
"Place6-Place6",
"Place1-Place2-Place3-Place4"))

Matching and pasting with data frames in R

I'm having trouble with a match and paste problem. I have a data frame like
df
# X1 X2 X3 X4 X5 X6
#t1 <NA> <NA> AU 78 <NA> <NA>
#t2 dA AK <NA> <NA> 5 <NA>
#t3 ip <NA> <NA> <NA> <NA> <NA>
#t4 <NA> <NA> <NA> <NA> <NA> BA
I want it to look like this after operations,
newdf
# X1 X2 X3 X4 X5 X6
#v1 <NA> <NA> <NA> <NA> <NA> <NA>
#v2 AU78 <NA> <NA> <NA> <NA> <NA>
#v3 AK5 <NA> <NA> <NA> <NA> <NA>
#v4 <NA> <NA> <NA> <NA> <NA> BA
The process should first search for values that start with 'A' (df[1,3] and df[2,2] in this case), then paste each such value together with the number further to the right of it (there will always be exactly one number to the right of it). Also, to help, there will never be stray characters between a target element (like 'AK') and the number to the right of it; only NAs will separate them.
Those combined new values need to be moved to the first column, one row down from where they were. It does not matter if existing values in the first column are overwritten.
My pattern locator is,
pat.locate <- lapply(df, function(x) grep('^A', x))
un.pat <- unlist(pat.locate)
#X2 X3
# 2 1
This looked like a good start. From there,
df[un.pat, names(un.pat)]
# X2 X3
#t2 AK <NA>
#t1 <NA> AU
So the target values are found with their column and row indexes. But I need the values to the right of those indexes. To subset the entire rows,
full.row <- df[un.pat, ]
# X1 X2 X3 X4 X5 X6
#t2 dA AK <NA> <NA> 5 <NA>
#t1 <NA> <NA> AU 78 <NA> <NA>
I paste the non-NA values, but you can tell what's going to happen,
paste(full.row[!is.na(full.row)], collapse='')
#[1] "dAAKAU785"
To divide it up, an apply over the rows was used:
pasty <- function(x) paste(x[!is.na(x)], collapse='')
pasted.rows <- apply(full.row, 1, pasty)
# t2 t1
#"dAAK5" "AU78"
That still leaves the stray string at the beginning. If I found a good regex to tell it to cast that off I'd have,
good.regex
# t2 t1
# "AK5" "AU78"
I could then subset the whole data frame based on those indices with,
df[names(good.regex), 1] <- good.regex
df
# X1 X2 X3 X4 X5 X6
#t1 AU78 <NA> AU 78 <NA> <NA>
#t2 AK5 AK <NA> <NA> 5 <NA>
#t3 ip <NA> <NA> <NA> <NA> <NA>
#t4 <NA> <NA> <NA> <NA> <NA> BA
But I'm still left with having to move the pasted values down by one.
df[names(good.regex)+1, 1] <- good.regex
#Error in names(good.regex) + 1 : non-numeric argument to binary operator
We obviously can't add a numeric to a named-style subset. I feel like I'm missing some element early on that's leading me down a difficult path to a solution. A regex would have to be a sub out that uses the pattern match and a look-behind that I can't crack. I think I'm working myself into a corner that is unnecessary. Any help is appreciated.
Data
df <- structure(list(X1 = c(NA, "dA", "ip", NA), X2 = c(NA, "AK", NA,
NA), X3 = c("AU", NA, NA, NA), X4 = c("78", NA, NA, NA), X5 = c(NA,
"5", NA, NA), X6 = c(NA, NA, NA, "BA")), .Names = c("X1", "X2",
"X3", "X4", "X5", "X6"), row.names = c("t1", "t2", "t3", "t4"
), class = "data.frame")
newdf <- structure(list(X1 = structure(c(NA, 2L, 1L, NA), .Names = c("v1",
"v2", "v3", "v4"), .Label = c("AK5", "AU78"), class = "factor"),
X2 = structure(c(NA_integer_, NA_integer_, NA_integer_, NA_integer_
), .Names = c("v1", "v2", "v3", "v4"), .Label = character(0), class = "factor"),
X3 = structure(c(NA_integer_, NA_integer_, NA_integer_, NA_integer_
), .Names = c("v1", "v2", "v3", "v4"), .Label = character(0), class = "factor"),
X4 = structure(c(NA_integer_, NA_integer_, NA_integer_, NA_integer_
), .Names = c("v1", "v2", "v3", "v4"), .Label = character(0), class = "factor"),
X5 = structure(c(NA_integer_, NA_integer_, NA_integer_, NA_integer_
), .Names = c("v1", "v2", "v3", "v4"), .Label = character(0), class = "factor"),
X6 = structure(c(NA, NA, NA, 1L), .Names = c("v1", "v2",
"v3", "v4"), .Label = "BA", class = "factor")), .Names = c("X1",
"X2", "X3", "X4", "X5", "X6"), row.names = c("v1", "v2", "v3",
"v4"), class = "data.frame")
From what I understand of your example output, the point is to collapse an A* string and the number that follows it in the same row, then move this new value to the first column one row below, while "erasing" the original line (row 1 of newdf is filled with NA) but keeping non-matching lines intact if they are not affected by the move (row 4).
Your main problem was collapsing the full row instead of only its end.
## original data
df <- structure(list(X1 = c(NA, "dA", "ip", NA),
X2 = c(NA, "AK", NA, NA),
X3 = c("AU", NA, NA, NA),
X4 = c("78", NA, NA, NA),
X5 = c(NA, "5", NA, NA),
X6 = c(NA, NA, NA, "BA")),
.Names = c("X1", "X2", "X3", "X4", "X5", "X6"),
row.names = c("t1", "t2", "t3", "t4"), class = "data.frame")
df
X1 X2 X3 X4 X5 X6
t1 <NA> <NA> AU 78 <NA> <NA>
t2 dA AK <NA> <NA> 5 <NA>
t3 ip <NA> <NA> <NA> <NA> <NA>
t4 <NA> <NA> <NA> <NA> <NA> BA
The following function grabs the rows with the matching pattern but collapses only from the match to the end of the row, dropping the beginning. This avoids the problem with non-matching stray characters (the dA in your example):
locateAndPaste <- function(x){
if(TRUE %in% grepl('^A', df[x,])){
endRow <- df[x, grep('^A', df[x,]):length(df)]
pasted.rows <- paste(endRow[!is.na(endRow)], collapse='')
}
else{NA}
}
The else branch returns NA instead of raising an error when no match is found.
newEntity <- sapply(1:nrow(df), locateAndPaste)
# [1] "AU78" "AK5" NA NA
Two matches were found, in rows 1 and 2, and none in rows 3 and 4.
As you can see the collapsing part worked perfectly.
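The same idea can also be sketched as a function of the row values rather than of the row index, so that it does not depend on a global df. The name paste_from_match is hypothetical, and starting from the first match when a row has several is an assumption that goes beyond the original example.

```r
df <- data.frame(X1 = c(NA, "dA", "ip", NA),
                 X2 = c(NA, "AK", NA, NA),
                 X3 = c("AU", NA, NA, NA),
                 X4 = c("78", NA, NA, NA),
                 X5 = c(NA, "5", NA, NA),
                 X6 = c(NA, NA, NA, "BA"),
                 row.names = c("t1", "t2", "t3", "t4"),
                 stringsAsFactors = FALSE)

# Collapse everything from the first element matching `pattern` to the
# end of the row; return NA when the row has no match
paste_from_match <- function(row_values, pattern = "^A") {
  hits <- grep(pattern, row_values)
  if (length(hits) == 0) return(NA_character_)
  tail_values <- row_values[hits[1]:length(row_values)]
  paste(tail_values[!is.na(tail_values)], collapse = "")
}

# apply() hands each row to the function as a character vector
new_entity <- apply(df, 1, paste_from_match)
new_entity
```

The result keeps the row names of df, which makes it easy to see which row each pasted value came from.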
Your second problem was moving the value one row down, given that a number cannot be added to a character string. As I'm subsetting on the indexes rather than on the names, the problem is easily avoided:
(for completeness, I've added a note at the end of this post on converting those names to numeric)
## the newEntity element is already ordered according to the original row numbers
originalRowNumbers <- grep("^A", newEntity)
# [1] 1 2
From there, it's pretty straightforward:
newdf <- df ## all operations can be done on the original df,
## this copy is made only for the sake of the example.
## as per your example, "erase" the original lines where a matching pattern was found
## that will also prevent orphan lines if a no match have been found in the above line
newdf[originalRowNumbers, ] <- rep(NA, length(df))
## place the new entity in the first column one row below
newdf[originalRowNumbers+1, 1] <- newEntity[originalRowNumbers]
## fill the rest of this row with NA as per your example
newdf[originalRowNumbers+1, 2:length(df)] <- NA
newdf
X1 X2 X3 X4 X5 X6
t1 <NA> <NA> <NA> <NA> <NA> <NA>
t2 AU78 <NA> <NA> <NA> <NA> <NA>
t3 AK5 <NA> <NA> <NA> <NA> <NA>
t4 <NA> <NA> <NA> <NA> <NA> BA
However, if a match were found in the last row, an extra row would be added to newdf. To avoid that, shorten the initial selection:
newEntity <- sapply(1:(nrow(df)-1), locateAndPaste)
For completeness: in your example it is also possible to extract the number from the names of good.regex and use it for subsetting:
idx.good.regex <- as.numeric(gsub("t","", names(good.regex)))
# [1] 2 1
df[idx.good.regex+1, 1] <- good.regex
Note that this only works because good.regex is a character vector. An error would occur if good.regex were a data.frame.
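If the row names do not follow the "t" plus number scheme, a more general option is to look the names up with match() against rownames(df). This is a minimal sketch on a cut-down copy of the data; good.regex is typed in by hand here, since the question never actually produced it.

```r
df <- data.frame(X1 = c(NA, "dA", "ip", NA),
                 X2 = c(NA, "AK", NA, NA),
                 row.names = c("t1", "t2", "t3", "t4"),
                 stringsAsFactors = FALSE)

# Hand-written stand-in for the named vector described in the question
good.regex <- c(t2 = "AK5", t1 = "AU78")

# Turn row names into row positions, whatever the names look like
idx <- match(names(good.regex), rownames(df))

# Positions are numeric, so shifting one row down is plain arithmetic
df[idx + 1, 1] <- good.regex
df
```

As with the answer's approach, a match in the last row would point one past the end of the data frame, so that case still needs separate handling.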

Parsing irregular character strings for numbers and put into structured format using regular expressions in R

I have a vector of irregularly structured character data from which I want to find and extract particular numbers. For example, take this piece of a much larger dataset:
x <- c("2001 Tax # $25.19/Widget, 2002 Est Tax # $10.68/Widget; 2000 Est Int # $55.67/Widget",
"1999 Tax # $81.16/Widget",
"1998 Tax # $52.72/Widget; 2001 Est Int # $62.49/Widget",
"1994 Combined Tax/Int # $68.33/widget; 1993 Est Int # $159.67/Widget",
"1993 Combined Tax/Int # $38.33/widget; 1992 Est Int # $159.67/Widget",
"2006 Tax # $129.21/Widget, 1991 Est Tax # $58.19/Widget; 1991 Est Int # $30.95/Widget")
and so on. Reading the table for a larger vector shows that most of the entries are separated by semi-colons or commas, and that there are only a limited number of terms used -- the year, Tax, Int, Combined, Est -- with occasional variations in entries (like ";" versus ",", or "Widget" versus "widget").
I'd like to extract each of the numbers related to the terms above into a more structured data table, such as:
[id] [year] [number] [cat] [est]
row1 2001 25.19 Tax
row1 2002 10.68 Tax Est
row1 2000 55.67 Int Est
row2 1999 81.16 Tax
row3 1998 52.72 Tax
row3 2001 62.49 Int Est
....
or else maybe a more compact / sparse representation like:
[id] [1999tax] [2001tax] [2002esttax] [2000estint]
row1 0 25.19 10.68 55.67
row2 81.16 0 0 0
If that makes sense -- I ultimately need to put this into a regression model.
My first approach has been to write the following pseudocode:
split strings into list using strsplit() on ";" or ","
extract all years
operate on list elements using function that extracts numbers between "$" and "/"
return structured table columns
So far, I've only gotten this far:
pieces.of.x <- strsplit(x, "[;,]"); head(pieces.of.x)
which gives:
[[1]]
[1] "2001 Tax # $25.19/Widget" " 2002 Est Tax # $10.68/Widget" " 2000 Est Int # $55.67/Widget"
[[2]]
[1] "1999 Tax # $81.16/Widget"
[[3]]
[1] "1998 Tax # $52.72/Widget" " 2001 Est Int # $62.49/Widget"
[[4]]
[1] "1994 Combined Tax/Int # $68.33/widget" " 1993 Est Int # $159.67/Widget"
[[5]]
[1] "1993 Combined Tax/Int # $38.33/widget" " 1992 Est Int # $159.67/Widget"
[[6]]
[1] "2006 Tax # $129.21/Widget" " 1991 Est Tax # $58.19/Widget" " 1991 Est Int # $30.95/Widget"
Unfortunately, I don't know enough about lapply() and regular expressions ("regex") in R to write a procedure robust enough to extract the years, operate on each sub-vector of elements, and then return them.
Thanks in advance for reading.
The stringr package is pretty useful when dealing with strings, and I bet that someone could even make a single matcher to extract named capture group to get a similar solution...
[edit: missed the combined entries]
library(stringr)
library(data.table)
# Split the row entries
x <- strsplit(x, "[,;]")
# Generate the entry identifiers.
i <- 0
id <- unlist( sapply( x, function(r) rep(i<<-i+1, length(r) ) ) )
# Extract the desired values
x <- unlist( x, recursive = FALSE )
year.re <- "(^\\s?([[:digit:]]{4})\\s)"
value.re <- "[$]([[:digit:]]+[.][[:digit:]]{2})[/]"
object.re <- "[/]([[:alnum:]]+)$"
Cats<- c("Tax","Int","Combination")
x <- lapply( x, function(str) {
c( Year=str_extract( str, year.re),
Category=Cats[ grepl( "Tax", str)*1 + grepl( "Int", str)*2 ],
Estimate=grepl( "Est", str),
Value=str_match( str, value.re)[2],
Object=str_match( str, object.re)[2] )
})
# Create a data object.
data.table( ID=id, do.call(rbind,x), key=c("Year") )
## ID Year Category Estimate Value Object
## 1: 6 1991 Tax TRUE 58.19 Widget
## 2: 6 1991 Int TRUE 30.95 Widget
## 3: 5 1992 Int TRUE 159.67 Widget
## 4: 4 1993 Int TRUE 159.67 Widget
## 5: 5 1993 Combination FALSE 38.33 widget
## 6: 4 1994 Combination FALSE 68.33 widget
## 7: 3 1998 Tax FALSE 52.72 Widget
## 8: 2 1999 Tax FALSE 81.16 Widget
## 9: 1 2000 Int TRUE 55.67 Widget
## 10: 3 2001 Int TRUE 62.49 Widget
## 11: 1 2001 Tax FALSE 25.19 Widget
## 12: 1 2002 Tax TRUE 10.68 Widget
## 13: 6 2006 Tax FALSE 129.21 Widget
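The entry identifiers can also be generated without the `<<-` counter. A minimal sketch, assuming the same split on commas and semicolons (pieces is an illustrative name): repeat each row index as many times as that row has pieces.

```r
x <- c("2001 Tax # $25.19/Widget, 2002 Est Tax # $10.68/Widget; 2000 Est Int # $55.67/Widget",
       "1999 Tax # $81.16/Widget")

# One list element per input string, one piece per entry
pieces <- strsplit(x, "[,;]")

# Row index of the original string, repeated once per piece
id <- rep(seq_along(pieces), lengths(pieces))
id
```

This avoids modifying a variable outside the function passed to sapply().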
This is similar to one of the other answers and distinguishes between line numbers (your [id] column).
matches <- regmatches(x,gregexpr("[0-9]{4} [^#]+# \\$[0-9.]+",x))
lengths <- sapply(matches,length)
z <- unlist(matches)
z <- regmatches(z,regexec("([0-9]{4}) ([^#]+) # \\$([0-9.]+)",z))
df <- t(sapply(z,function(x)c(year=x[2], number=x[4], cat=x[3])))
df <- data.frame(id=rep(1:length(x),times=lengths),df, stringsAsFactors=FALSE)
df$est <- ifelse(grepl("Est",df$cat),"Est","")
df$cat <- regmatches(df$cat,regexpr("[^ /]+$",df$cat))
df
# id year number cat est
# 1 1 2001 25.19 Tax
# 2 1 2002 10.68 Tax Est
# 3 1 2000 55.67 Int Est
# 4 2 1999 81.16 Tax
# 5 3 1998 52.72 Tax
# 6 3 2001 62.49 Int Est
# 7 4 1994 68.33 Int
# 8 4 1993 159.67 Int Est
# 9 5 1993 38.33 Int
# 10 5 1992 159.67 Int Est
# 11 6 2006 129.21 Tax
# 12 6 1991 58.19 Tax Est
# 13 6 1991 30.95 Int Est
To create exactly the dataframe you are asking for, you can use a few tricks like strsplit, regular expressions, and rbind.
x <- unlist(strsplit(x, ',|;'))
bits <- regmatches(x,gregexpr('(\\d|\\.)+|(Tax|Int|Est)', x))
df <- do.call(rbind, lapply(bits, function(info) {
data.frame(year = info[[1]], number = tail(info, 1)[[1]],
cat = if ('Tax' %in% info) 'Tax' else 'Int',
est = if ('Est' %in% info) 'Est' else '')
}))
df$cat <- factor(df$cat); df$est <- factor(df$est);
which gives us
year number cat est
1 2001 25.19 Tax
2 2002 10.68 Tax Est
3 2000 55.67 Int Est
4 1999 81.16 Tax
5 1998 52.72 Tax
You can extract the numbers using:
regmatches(x,gregexpr('(\\d|\\.)+', x))
which yields
[[1]]
[1] "2001" "25.19" "2002" "10.68" "2000" "55.67"
[[2]]
[1] "1999" "81.16"
[[3]]
[1] "1998" "52.72" "2001" "62.49"
[[4]]
[1] "1994" "68.33" "1993" "159.67"
[[5]]
[1] "1993" "38.33" "1992" "159.67"
[[6]]
[1] "2006" "129.21" "1991" "58.19" "1991" "30.95"
However, if you can assume every year's info is separated by a , or ;, try this:
x <- unlist(strsplit(x, ',|;'))
nums <- regmatches(x,gregexpr('(\\d|\\.)+', x))
df <- data.frame(matrix(as.numeric(unlist(nums)), ncol = 2, byrow = TRUE))
colnames(df) <- c('Year', 'Number')
which looks like
Year Number
1 2001 25.19
2 2002 10.68
3 2000 55.67
4 1999 81.16
5 1998 52.72
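The answers above all produce the long layout. For the wide layout mentioned in the question, one possible sketch is to build a key per year/est/cat combination and cross-tabulate with xtabs(). The long data frame below is typed in from the first four rows of the results above; key, wide and wide_df are illustrative names.

```r
long <- data.frame(id = c(1, 1, 1, 2),
                   year = c(2001, 2002, 2000, 1999),
                   number = c(25.19, 10.68, 55.67, 81.16),
                   cat = c("Tax", "Tax", "Int", "Tax"),
                   est = c("", "Est", "Est", ""),
                   stringsAsFactors = FALSE)

# Column label such as "2002esttax", following the question's naming
long$key <- tolower(paste0(long$year, long$est, long$cat))

# Sum `number` per id/key cell; combinations that never occur become 0
wide <- xtabs(number ~ id + key, data = long)

# Plain data frame with one row per id, usable as model input
wide_df <- as.data.frame.matrix(wide)
wide_df
```

Note that a 0 here means "no entry for this id", which matches the layout in the question but cannot be told apart from a genuine value of zero.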

Creating A Dataframe From A Text Dataset

I have a dataset that has hundreds of thousands of fields. The following is a simplified dataset
dataSet <- c("Plnt SLoc Material Description L.T MRP Stat Auto MatSG PC PN Freq Qty CFreq CQty Cur.RPt New.RPt CurRepl NewRepl Updt Cost ServStock Unit OpenMatResb DFStorLocLevel",
"0231 0002 GB.C152260-00001 ASSY PISTON & SEAL/O-RING 44 PD X A A A 18 136 30 29 50 43 24.88 51.000 EA",
"0231 0002 WH.112734 MOTOR REDUCER, THREE-PHAS 41 PD X B B A 16 17 3 3 5 4 483.87 1.000 EA X",
"0231 0002 WH.920569 SPINDLE MOTOR MINI O 22 PD X A A A 69 85 15 9 25 13 680.91 21.000 EA",
"0231 0002 GB.C150583-00001 VALVE-AIR MDI 64 PD X A A A 16 113 50 35 80 52 19.96 116.000 EA",
"0231 0002 FG.124-0140 BEARING 32 PD X A A A 36 205 35 32 50 48 21.16 55.000 EA",
"0231 0002 WP.254997 BEARING,BALL .9843 X 2.04 52 PD X A A A 18 155 50 39 100 58 2.69 181.000 EA"
)
I would like to create a dataframe out of this dataSet for further calculation. The approach I am following is as follows:
I split the dataSet by space and then recombine it.
dataSetSplit <- strsplit(dataSet, "\\s+")
The header (which is the first line) splits correctly into 25 fields, as the str() function shows.
str(dataSetSplit)
I then intend to combine all the rows using the following script:
combinedData <- data.frame(do.call(rbind, dataSetSplit))
Please note that the script above does not work, because the split does not produce the same number of fields for every row.
For this approach to work, every row must split into exactly 25 fields.
If you think this is a sound approach, please let me know how to split the rows into 25 fields.
It is worth mentioning that I do not like the approach of splitting the data set with the function strsplit(). It is an extremely time consuming step if used with a large data set. Can you please recommend an alternate approach to create a data frame out of the supplied data?
By the looks of it, you have a header row that is actually helpful. You can easily use gregexpr to calculate your "widths" to use with read.fwf.
Here's how:
## Use gregexpr to find the position of consecutive runs of spaces
## This will tell you the starting position of each column
Widths <- gregexpr("\\s+", dataSet[1])[[1]]
## `read.fwf` doesn't need the starting position, but the width of
## each column. We can use `diff` to calculate this.
Widths <- c(Widths[1], diff(Widths))
## Since there are no spaces after the last column, we need to calculate
## a reasonable width for that column too. We can do this with `nchar`
## to find the widest row in the data. From this, subtract the `sum`
## of all the previous values.
Widths <- c(Widths, max(nchar(dataSet)) - sum(Widths))
Let's also extract the column names. We could do this in read.fwf, but it would require us to substitute the spaces in the first line with a "sep" character.
Names <- scan(what = "", text = dataSet[1])
Now, read in everything except the first line. You would use the actual file instead of textConnection, I would suppose.
read.fwf(textConnection(dataSet), widths=Widths, strip.white = TRUE,
skip = 1, col.names = Names)
# Plnt SLoc Material Description L.T MRP Stat Auto MatSG PC PN Freq Qty
# 1 231 2 GB.C152260-00001 ASSY PISTON & SEAL/O-RING 44 PD NA X A A A 18 136
# 2 231 2 WH.112734 MOTOR REDUCER, THREE-PHAS 41 PD NA X B B A 16 17
# 3 231 2 WH.920569 SPINDLE MOTOR MINI O 22 PD NA X A A A 69 85
# 4 231 2 GB.C150583-00001 VALVE-AIR MDI 64 PD NA X A A A 16 113
# 5 231 2 FG.124-0140 BEARING 32 PD NA X A A A 36 205
# 6 231 2 WP.254997 BEARING,BALL .9843 X 2.04 52 PD NA X A A A 18 155
# CFreq CQty Cur.RPt New.RPt CurRepl NewRepl Updt Cost ServStock Unit OpenMatResb
# 1 NA NA 30 29 50 43 NA 24.88 51 EA <NA>
# 2 NA NA 3 3 5 4 NA 483.87 1 EA X
# 3 NA NA 15 9 25 13 NA 680.91 21 EA <NA>
# 4 NA NA 50 35 80 52 NA 19.96 116 EA <NA>
# 5 NA NA 35 32 50 48 NA 21.16 55 EA <NA>
# 6 NA NA 50 39 100 58 NA 2.69 181 EA <NA>
# DFStorLocLevel
# 1 NA
# 2 NA
# 3 NA
# 4 NA
# 5 NA
# 6 NA
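Because whitespace in the sample above may not survive copy and paste, here is a small self-contained sketch of the same technique on made-up fixed-width lines. It assumes left-aligned columns and header names without spaces, and it takes the column starts from the start of each header name rather than from the runs of spaces; lines, starts and parsed are illustrative names.

```r
lines <- c("Plnt Description   Qty",
           "0231 MOTOR REDUCER 17",
           "0231 BEARING       205")

# Start position of each header name = start of each column
starts <- gregexpr("\\S+", lines[1])[[1]]

# Width = distance to the next start; the last column runs to the
# end of the longest line
widths <- diff(c(starts, max(nchar(lines)) + 1))

col_names <- scan(what = "", text = lines[1], quiet = TRUE)

con <- textConnection(lines[-1])
parsed <- read.fwf(con, widths = widths, strip.white = TRUE,
                   col.names = col_names,
                   colClasses = c("character", "character", "integer"))
close(con)
parsed
```

Setting colClasses keeps the leading zero of Plnt, which is lost when the column is read as a number, as in the output above.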
Many thanks to Ananda Mahto, who provided many pieces of this answer.
widthMinusFirst <- diff(gregexpr('(\\s[A-Z])+', dataSet[1])[[1]])
widthFirst <- gregexpr('\\s+', dataSet[1])[[1]][1]
Width <- c(widthFirst, widthMinusFirst)
Widths <- c(Width, max(nchar(dataSet)) - sum(Width))
columnNames <- scan(what = "", text = dataSet[1])
read.fwf(textConnection(dataSet[-1]), widths = Widths, strip.white = FALSE,
skip = 0, col.names = columnNames)

R list function

I have a problem applying a function to list elements. I have a list called "mylist", which looks like:
[[1]] station global
1 2
1 2
1 2
1 14
1 38
1 169
[[2]] station global
2 2
2 2
2 23
2 86
In each list, I need to set values of "global" less than or equal to 2 to NA.
I have used
dat.list <- lapply(mylist, `[[`, 'global')
to get only the global data.
Defining a function:
fct <- function(x) {
x[x <= 2] <- NA
}
and writing
lapply(dat.list, fct)
gives
[[1]] NA
[[2]] NA
What I would like to have is:
[[1]] station global
1 NA
1 NA
1 NA
1 14
1 38
1 169
[[2]] station global
2 NA
2 NA
2 23
2 86
I appreciate any help or a point in the right direction. Regards, Sisse
It would help if you posted a reproducible example. See here for advice on how to do this.
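Inside lapply, x takes on each element of the list in turn. Since those appear to be data.frames, treat x as a data.frame, and return x as the last expression so that the modified data frame is what comes back: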
fct <- function(x) {
x$global[x$global <= 2] <- NA
x
}
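Put together, a minimal runnable sketch looks like this. The list is rebuilt by hand from the printed output in the question, so its exact structure is an assumption. The original function returned NA because the value of an assignment is its right-hand side, and that was the last expression evaluated.

```r
# Stand-in for `mylist`, reconstructed from the question's printout
mylist <- list(
  data.frame(station = 1, global = c(2, 2, 2, 14, 38, 169)),
  data.frame(station = 2, global = c(2, 2, 23, 86))
)

fct <- function(x) {
  x$global[x$global <= 2] <- NA  # modify the local copy
  x                              # return the whole data frame
}

# Apply to the data frames themselves, not to the extracted columns
cleaned <- lapply(mylist, fct)
cleaned
```

Applying fct to mylist rather than to dat.list keeps the station column alongside the cleaned global values.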