How to add column to data.table with values from list based on regex

How to add column to data.table with values from list based on regex - regex

I have the following data.table:
id fShort
1 432-12 1245
2 3242-12 453543
3 324-32 45543
4 322-34 45343
5 2324-34 13543
DT <- data.table(
id=c("432-12", "3242-12", "324-32", "322-34", "2324-34"),
fShort=c("1245", "453543", "45543", "45343", "13543"))
and the following list:
filenames <- list("3242-124342345.png", "432-124343.png", "135-13434.jpeg")
I would like to create a new column "fComplete" that includes the complete filename from the list. For this the values of column "id" need to be matched with the filename-list. If the filename starts with the "id" string, the complete filename should be returned. I use the following regex
t <- grep("432-12","432-124343.png",value=T)
that return the correct filename.
This is how the final table should look like:
id fShort fComplete
1 432-12 1245 432-124343.png
2 3242-12 453543 3242-124342345.png
3 324-32 45543 NA
4 322-34 45343 NA
5 2324-34 13543 NA
DT2 <- data.table(
id=c("432-12", "3242-12", "324-32", "322-34", "2324-34"),
fshort=c("1245", "453543", "45543", "45343", "13543"),
fComplete = c("432-124343.png", "3242-124342345.png", NA, NA, NA))
I tried using apply and data.table approaches but I always get warnings like
argument 'pattern' has length > 1 and only the first element will be used
What is a simple approach to accomplish this?

Here's a data.table solution:
DT[ , fComplete := lapply(id, function(x) {
m <- grep(x, filenames, value = TRUE)
if (!length(m)) NA else m})]
id fShort fComplete
1: 432-12 1245 432-124343.png
2: 3242-12 453543 3242-124342345.png
3: 324-32 45543 NA
4: 322-34 45343 NA
5: 2324-34 13543 NA

In my experience with similar functions, sometimes the regex functions return a list, so you have to consider that in the apply - I usually do an example manually
Also apply will not always in y experience on its own return something that always works into a data.frame,sometimes I had to use lap ply, and or unlist and data.frame to modify it
Here is an answer - I am not familiar with data.tables and I was having issues with the filenames being in a list, but with some transformations this works. I worked it out by seeing what apply was outputting and adding the [1] to get the piece I needed
DT <- data.frame(
id=c("432-12", "3242-12", "324-32", "322-34", "2324-34"),
fShort=c("1245", "453543", "45543", "45343", "13543"))
filenames <- list("3242-124342345.png", "432-124343.png", "135-13434.jpeg")
filenames1 <- unlist(filenames)
x<-apply(DT[1],1,function(x) grep(x,filenames1)[1])
DT$fielname <- filenames1[x]

Related

R sets of coordinates extract from string

I'am trying to extract sets of coordinates from strings and change the format.
I have tried some of the stringr package and getting nowhere with the pattern extraction.
It's my first time dealing with regex and still is a little confusing to create a pattern.
There is a data frame with one column with one or more sets of coordinates.
The only pattern (the majority) separating Lat from Long is (-), and to separate one set of coordinates to another there is a (/)
Here is an example of some of the data:
ID Coordinates
1 3438-5150
2 3346-5108/3352-5120 East island, South port
3 West coast (284312 472254)
4 28.39.97-47.05.62/29.09.13-47.44.03
5 2843-4722/3359-5122(1H-2H-3H-4F)
Most of the data is in decimal degree, e.g. (id 1 is Lat 34.38 Lon 51.50), some others is in 00º00'00'', e.g. (id 4 is Lat 28º 39' 97'' Lon 47º 05' 62'')
I will need to make in a few steps
1 - Extract all coordinates sets creating a new row for each set of each record;
2 - Extract the text label of record to a new column, concatenating them;
3- Convert the coordinates from 00º00'00''(28.39.97) to 00.0000º (28.6769 - decimal dregree) so all coordinates are in the same format. I can easily convert if they are as numeric.
4 - Add dot (.) to separate the decimal degree values (from 3438 to 34.38) and add (-) to identify as (-34.38) south west hemisphere. All value must have (-) sign.
I'am trying to get something like this:
Step 1 and 2 - Extract coordinates sets and names
ID x y label
1 3438 5150
2 3346 5108 East island, South port
2 3352 5120 East island, South port
3 284312 472254 West coast
4 28.39.97 47.05.62
4 29.09.13 47.44.03
5 2843 4722 1H-2H-3H-4F
5 3359 5122 1H-2H-3H-4F
Step 3 - convert coordinates format to decimal degree (ID 4)
ID x y label
1 3438 5150
2 3346 5108 East island, South port
2 3352 5120 East island, South port
3 284312 472254 West coast
4 286769 471005
4 291536 470675
5 2843 4722 1H-2H-3H-4F
5 3359 5122 1H-2H-3H-4F
Step 4 - change display format
ID x y label
1 -34.38 -51.50
2 -33.46 -51.08 East island, South port
2 -33.52 -51.20 East island, South port
3 -28.43 -47.22 West coast
4 -28.6769 -47.1005
4 -29.1536 -47.0675
5 -28.43 -47.22 1H-2H-3H-4F
5 -33.59 -51.22 1H-2H-3H-4F
I have edit the question to better clarify my problems and change some of my needs. I realized that it was messy to understand.
So, has anyone worked with something similar?
Any other suggestion would be of great help.
Thank you again for the time to help.

Note: the first answers address the original asking of the question and the last answer addresses its current state. The data in data1 should be set appropriately for each solution.
The following should address your first question given the data you provided and the expected output (using dplyr and tidyr).
library(dplyr)
library(tidyr)
### Load Data
data1 <- structure(list(ID = 1:4, Coordinates = c("3438-5150", "3346-5108/3352-5120",
"2843-4722/3359-5122(1H-2H-3H-4F)", "28.39.97-47.05.62/29.09.13-47.44.03"
)), .Names = c("ID", "Coordinates"), class = "data.frame", row.names = c(NA,
-4L))
### This is a helper function to transform data that is like '1234'
### but should be '12.34', and leaves alone '12.34'.
### You may have to change this based on your use case.
div100 <- function(x) { return(ifelse(x > 100, x / 100, x)) }
### Remove items like "(...)" and change "12.34.56" to "12.34"
### Split into 4 columns and xform numeric value.
data1 %>%
mutate(Coordinates = gsub('\\([^)]+\\)', '', Coordinates),
Coordinates = gsub('(\\d+[.]\\d+)[.]\\d+', '\\1', Coordinates)) %>%
separate(Coordinates, c('x.1', 'y.1', 'x.2', 'y.2'), fill = 'right', sep = '[-/]', convert = TRUE) %>%
mutate_at(vars(matches('^[xy][.]')), div100) # xform columns x.N and y.N
## ID x.1 y.1 x.2 y.2
## 1 1 34.38 51.50 NA NA
## 2 2 33.46 51.08 33.52 51.20
## 3 3 28.43 47.22 33.59 51.22
## 4 4 28.39 47.05 29.09 47.44
The call to mutate modifies Coordinates twice to make substitutions easier.
Edit
A variation that uses another regex substitution instead of mutate_at.
data1 %>%
mutate(Coordinates = gsub('\\([^)]+\\)', '', Coordinates),
Coordinates = gsub('(\\d{2}[.]\\d{2})[.]\\d{2}', '\\1', Coordinates),
Coordinates = gsub('(\\d{2})(\\d{2})', '\\1.\\2', Coordinates)) %>%
separate(Coordinates, c('x.1', 'y.1', 'x.2', 'y.2'), fill = 'right', sep = '[-/]', convert = TRUE)
Edit 2: The following solution addresses the updated version of the question
The following solution does a number of transformations to transform the data. These are separate to make it a bit easier to think about (much easier relatively speaking).
library(dplyr)
library(tidyr)
data1 <- structure(list(ID = 1:5, Coordinates = c("3438-5150", "3346-5108/3352-5120 East island, South port",
"East coast (284312 472254)", "28.39.97-47.05.62/29.09.13-47.44.03",
"2843-4722/3359-5122(1H-2H-3H-4F)")), .Names = c("ID", "Coordinates"
), class = "data.frame", row.names = c(NA, -5L))
### Function for converting to numeric values and
### handles case of "12.34.56" (hours/min/sec)
hms_convert <- function(llval) {
nres <- rep(0, length(llval))
coord3_match_idx <- grepl('^\\d{2}[.]\\d{2}[.]\\d{2}$', llval)
nres[coord3_match_idx] <- sapply(str_split(llval[coord3_match_idx], '[.]', 3), function(x) { sum(as.numeric(x) / c(1,60,3600))})
nres[!coord3_match_idx] <- as.numeric(llval[!coord3_match_idx])
nres
}
### Each mutate works to transform the various data formats
### into a single format. The 'separate' commands then split
### the data into the appropriate columns. The action of each
### 'mutate' can be seen by progressively viewing the results
### (i.e. adding one 'mutate' command at a time).
data1 %>%
mutate(Coordinates_new = Coordinates) %>%
mutate(Coordinates_new = gsub('\\([^) ]+\\)', '', Coordinates_new)) %>%
mutate(Coordinates_new = gsub('(.*?)\\(((\\d{6})[ ](\\d{6}))\\).*', '\\3-\\4 \\1', Coordinates_new)) %>%
mutate(Coordinates_new = gsub('(\\d{2})(\\d{2})(\\d{2})', '\\1.\\2.\\3', Coordinates_new)) %>%
mutate(Coordinates_new = gsub('(\\S+)[\\s]+(.+)', '\\1|\\2', Coordinates_new, perl = TRUE)) %>%
separate(Coordinates_new, c('Coords', 'label'), fill = 'right', sep = '[|]', convert = TRUE) %>%
mutate(Coords = gsub('(\\d{2})(\\d{2})', '\\1.\\2', Coords)) %>%
separate(Coords, c('x.1', 'y.1', 'x.2', 'y.2'), fill = 'right', sep = '[-/]', convert = TRUE) %>%
mutate_at(vars(matches('^[xy][.]')), hms_convert) %>%
mutate_at(vars(matches('^[xy][.]')), function(x) ifelse(!is.na(x), -x, x))
## ID Coordinates x.1 y.1 x.2 y.2 label
## 1 1 3438-5150 -34.38000 -51.50000 NA NA <NA>
## 2 2 3346-5108/3352-5120 East island, South port -33.46000 -51.08000 -33.52000 -51.20000 East island, South port
## 3 3 East coast (284312 472254) -28.72000 -47.38167 NA NA East coast
## 4 4 28.39.97-47.05.62/29.09.13-47.44.03 -28.67694 -47.10056 -29.15361 -47.73417 <NA>
## 5 5 2843-4722/3359-5122(1H-2H-3H-4F) -28.43000 -47.22000 -33.59000 -51.22000 <NA>

We can use stringi. We create a . between the 4 digit numbers with gsub, use stri_extract_all (from stringi) to extract two digit numbers followed by a dot followed by two digit numbers (\\d{2}\\.\\d{2}) to get a list output. As the list elements have unequal length, we can pad NA at the end for those elements that have shorter length than the maximum length and convert to matrix (using stri_list2matrix). After converting to data.frame, changing the character columns to numeric, and cbind with the 'ID' column of the original dataset.
library(stringi)
d1 <- as.data.frame(stri_list2matrix(stri_extract_all_regex(gsub("(\\d{2})(\\d{2})",
"\\1.\\2", data1$Coordinates), "\\d{2}\\.\\d{2}"), byrow=TRUE), stringsAsFactors=FALSE)
d1[] <- lapply(d1, as.numeric)
colnames(d1) <- paste0(c("x.", "y."), rep(1:2,each = 2))
cbind(data1[1], d1)
# ID x.1 y.1 x.2 y.2
#1 1 34.38 51.50 NA NA
#2 2 33.46 51.08 33.52 51.20
#3 3 28.43 47.22 33.59 51.22
#4 4 28.39 47.05 29.09 47.44
But, this can also be done with base R.
#Create the dots for the 4-digit numbers
str1 <- gsub("(\\d{2})(\\d{2})", "\\1.\\2", data1$Coordinates)
#extract the numbers in a list with gregexpr/regmatches
lst <- regmatches(str1, gregexpr("\\d{2}\\.\\d{2}", str1))
#convert to numeric
lst <- lapply(lst, as.numeric)
#pad with NA's at the end and convert to data.frame
d1 <- do.call(rbind.data.frame, lapply(lst, `length<-`, max(lengths(lst))))
#change the column names
colnames(d1) <- paste0(c("x.", "y."), rep(1:2,each = 2))
#cbind with the first column of 'data1'
cbind(data1[1], d1)

R: replacing values in string all at once

I have a data frame that looks like this:
USequence
# 1 GATCAGATC
# 2 ATCAGAC
I'm trying to create a function that would replace all the G's with C's, A's with T's, C's with G's, and T's with A's:
USequence
# 1 CTAGTCTAG
# 2 TAGTCTG
This is what I have right now, the function accepts k, a data frame with a column named USequence.
conjugator <- function(k) {
k$USequence <- str_replace_all(k$USequence,"A","T")
k$USequence <- str_replace_all(k$USequence,"T","A")
k$USequence <- str_replace_all(k$USequence,"G","C")
k$USequence <- str_replace_all(k$USequence,"C","G")
}
However the obvious problem would be that this is doesn't replace the characters at once, but rather in steps which would not return the desired result. Any suggestions? Thanks

You could use chartr
df1$USequence <- chartr('GATC', 'CTAG', df1$USequence)
df1$USequence
#[1] "CTAGTCTAG" "TAGTCTG"
Or
library(gsubfn)
gsubfn('[GATC]', list(G='C', A='T', T='A', C='G'), df1$USequence)
#[1] "CTAGTCTAG" "TAGTCTG"

Use lapply to plot data in a list and use names of list elements as plot titles [duplicate]

This question already has an answer here:
Adding lists names as plot titles in lapply call in R
(1 answer)
Closed 7 years ago.
If I have the following list:
comp.surv <- list(a = 1:4, b = c(1, 2, 4, 8), c = c(1, 3, 8, 27))
comp.surv
# $a
# [1] 1 2 3 4
#
# $b
# [1] 1 2 4 8
#
# $c
# [1] 1 3 8 27
I can use lapply to plot each list element:
lapply(comp.surv, function(x) plot(x))
However, I want to include the name of each list element as plot title (main). For my example data, the title of each graph would be a,b and c respectively. First thing, is that I have a gsub rule that given comp.surv$a, I return a :
gsub(comp.surv\\$([a-z]+), "\\1", deparse(sustitute((comp.surv$a)))
# "a"
Which is good. However I cannot embed this result into my lapply statement above. Any ideas?
In the mean time I have tried getting round this by creating a function this to include the main parameter:
splot <- function(x){
plot(x, main = gsub(comp.surv\\$([a-z]+), "\\1" deparse(sustitute((x))))
}
lapply(comp.surv, function(x) splot(x))
This will plot each sub-variable of comp.surv, but all the titles are blank.
Can anyone recommend if I am going down the right track?

One possibility would be to loop over the names of the list:
lapply(names(comp.surv), function(x) plot(comp.surv[[x]], main = x))
Or slightly more verbose, loop over the list indices:
lapply(seq_along(comp.surv), function(x) plot(comp.surv[[x]], main = names(comp.surv)[x]))

Is that what you want?
ns=names(comp.surv)
lapply(ns, function(x) plot(comp.surv[[x]], main=x,ylab="y"))

How to very efficiently extract specific pattern from characters?

I have big data like this :
> Data[1:7,1]
[1] mature=hsa-miR-5087|mir_Family=-|Gene=OR4F5
[2] mature=hsa-miR-26a-1-3p|mir_Family=mir-26|Gene=OR4F9
[3] mature=hsa-miR-448|mir_Family=mir-448|Gene=OR4F5
[4] mature=hsa-miR-659-3p|mir_Family=-|Gene=OR4F5
[5] mature=hsa-miR-5197-3p|mir_Family=-|Gene=OR4F5
[6] mature=hsa-miR-5093|mir_Family=-|Gene=OR4F5
[7] mature=hsa-miR-650|mir_Family=mir-650|Gene=OR4F5
what I want to do is that, in every row, I want to select the name after word mature= and also the word after Gene= and then pater them together with
paste(a,b, sep="-")
for example, the expected output from first two rows would be like :
hsa-miR-5087-OR4F5
hsa-miR-26a-1-3p-OR4F9
so, the final implementation is like this:
for(i in 1:nrow(Data)){
Data[i,3] <- sub("mature=([^|]*).*Gene=(.*)", "\\1-\\2", Data[i,1])
Name <- strsplit(as.vector(Data[i,2]),"\\|")[[1]][2]
Data[i,4] <- as.numeric(sub("pvalue=","",Name))
print(i)
}
which work well, but it's very slow. the size of Data is very big and it has 200,000,000 rows. this implementation is very slow for that. how can I speed it up ?

If you can guarantee that the format is exactly as you specified, then a regular expression can capture (denoted by the brackets below) everything from the equals sign upto the pipe symbol, and from the Gene= to the end, and paste them together with a minus sign:
sub("mature=([^|]*).*Gene=(.*)", "\\1-\\2", Data[,1])

Another option is to use read.table with = as a separator then pasting the 2 columns:
res = read.table(text=txt,sep='=')
paste(sub('[|].*','',res$V2), ## get rid from last part here
sub('^ +| +$','',res$V4),sep='-') ## remove extra spaces
[1] "hsa-miR-5087-OR4F5" "hsa-miR-26a-1-3p-OR4F9" "hsa-miR-448-OR4F5" "hsa-miR-659-3p-OR4F5"
[5] "hsa-miR-5197-3p-OR4F5" "hsa-miR-5093-OR4F5" "hsa-miR-650-OR4F5"

The simple sub solution already given looks quite nice but just in case here are some other approaches:
1) read.pattern Using read.pattern in the gsubfn package we can parse the data into a data.frame. This intermediate form, DF, can then be manipulated in many ways. In this case we use paste in essentially the same way as in the question:
library(gsubfn)
DF <- read.pattern(text = Data[, 1], pattern = "(\\w+)=([^|]*)")
paste(DF$V2, DF$V6, sep = "-")
giving:
[1] "hsa-miR-5087-OR4F5" "hsa-miR-26a-1-3p-OR4F9" "hsa-miR-448-OR4F5"
[4] "hsa-miR-659-3p-OR4F5" "hsa-miR-5197-3p-OR4F5" "hsa-miR-5093-OR4F5"
[7] "hsa-miR-650-OR4F5"
The intermediate data frame, DF, that was produced looks like this:
> DF
V1 V2 V3 V4 V5 V6
1 mature hsa-miR-5087 mir_Family - Gene OR4F5
2 mature hsa-miR-26a-1-3p mir_Family mir-26 Gene OR4F9
3 mature hsa-miR-448 mir_Family mir-448 Gene OR4F5
4 mature hsa-miR-659-3p mir_Family - Gene OR4F5
5 mature hsa-miR-5197-3p mir_Family - Gene OR4F5
6 mature hsa-miR-5093 mir_Family - Gene OR4F5
7 mature hsa-miR-650 mir_Family mir-650 Gene OR4F5
Here is a visualization of the regular expression we used:
(\w+)=([^|]*)
Debuggex Demo
1a) names We could make DF look nicer by reading the three columns of data and the three names separately. This also improves the paste statement:
DF <- read.pattern(text = Data[, 1], pattern = "=([^|]*)")
names(DF) <- unlist(read.pattern(text = Data[1,1], pattern = "(\\w+)=", as.is = TRUE))
paste(DF$mature, DF$Gene, sep = "-") # same answer as above
The DF in this section that was produced looks like this. It has 3 instead of 6 columns and remaining columns were used to determine appropriate column names:
> DF
mature mir_Family Gene
1 hsa-miR-5087 - OR4F5
2 hsa-miR-26a-1-3p mir-26 OR4F9
3 hsa-miR-448 mir-448 OR4F5
4 hsa-miR-659-3p - OR4F5
5 hsa-miR-5197-3p - OR4F5
6 hsa-miR-5093 - OR4F5
7 hsa-miR-650 mir-650 OR4F5
2) strapplyc
Another approach using the same package. This extracts the fields coming after a = and not containing a | producing a list. We then sapply over that list pasting the first and third fields together:
sapply(strapplyc(Data[, 1], "=([^|]*)"), function(x) paste(x[1], x[3], sep = "-"))
giving the same result.
Here is a visualization of the regular expression used:
=([^|]*)
Debuggex Demo

Here is one approach:
Data <- readLines(n = 7)
mature=hsa-miR-5087|mir_Family=-|Gene=OR4F5
mature=hsa-miR-26a-1-3p|mir_Family=mir-26|Gene=OR4F9
mature=hsa-miR-448|mir_Family=mir-448|Gene=OR4F5
mature=hsa-miR-659-3p|mir_Family=-|Gene=OR4F5
mature=hsa-miR-5197-3p|mir_Family=-|Gene=OR4F5
mature=hsa-miR-5093|mir_Family=-|Gene=OR4F5
mature=hsa-miR-650|mir_Family=mir-650|Gene=OR4F5
df <- read.table(sep = "|", text = Data, stringsAsFactors = FALSE)
l <- lapply(df, strsplit, "=")
trim <- function(x) gsub("^\\s*|\\s*$", "", x)
paste(trim(sapply(l[[1]], "[", 2)), trim(sapply(l[[3]], "[", 2)), sep = "-")
# [1] "hsa-miR-5087-OR4F5" "hsa-miR-26a-1-3p-OR4F9" "hsa-miR-448-OR4F5" "hsa-miR-659-3p-OR4F5" "hsa-miR-5197-3p-OR4F5" "hsa-miR-5093-OR4F5"
# [7] "hsa-miR-650-OR4F5"

Maybe not the more elegant but you can try :
sapply(Data[,1],function(x){
parts<-strsplit(x,"\\|")[[1]]
y<-paste(gsub("(mature=)|(Gene=)","",parts[grepl("mature|Gene",parts)]),collapse="-")
return(y)
})
Example
Data<-data.frame(col1=c("mature=hsa-miR-5087|mir_Family=-|Gene=OR4F5","mature=hsa-miR-26a-1-3p|mir_Family=mir-26|Gene=OR4F9"),col2=1:2,stringsAsFactors=F)
> Data[,1]
[1] "mature=hsa-miR-5087|mir_Family=-|Gene=OR4F5" "mature=hsa-miR-26a-1-3p|mir_Family=mir-26|Gene=OR4F9"
> sapply(Data[,1],function(x){
+ parts<-strsplit(x,"\\|")[[1]]
+ y<-paste(gsub("(mature=)|(Gene=)","",parts[grepl("mature|Gene",parts)]),collapse="-")
+ return(y)
+ })
mature=hsa-miR-5087|mir_Family=-|Gene=OR4F5 mature=hsa-miR-26a-1-3p|mir_Family=mir-26|Gene=OR4F9
"hsa-miR-5087-OR4F5" "hsa-miR-26a-1-3p-OR4F9"

How to separate the variables of a particular column in a CSV file and write to a CSV file in R?

I have a CSV file like
Market,CampaignName,Identity
Wells Fargo,Gary IN MetroChicago IL Metro,56
EMC,Los Angeles CA MetroBoston MA Metro,78
Apple,Cupertino CA Metro,68
Desired Output to a CSV file with the first row as the headers
Market,City,State,Identity
Wells Fargo,Gary,IN,56
Wells Fargo,Chicago,IL,56
EMC,Los Angeles,CA,78
EMC,Boston,MA,78
Apple,Cupertino,CA,68
res <-
gsub('(.*) ([A-Z]{2})*Metro (.*) ([A-Z]{2}) .*','\\1,\\2:\\3,\\4',
xx$Market)
How to modify the above regular expressions to get the result in R?
New to R, any help is appreciated.

library(stringr)
xx.to.split <- with(xx, setNames(gsub("Metro", "", as.character(CampaignName)), Market))
do.call(rbind, str_match_all(xx.to.split, "(.+?) ([A-Z]{2}) ?"))[, -1]
Produces:
[,1] [,2]
Wells Fargo "Gary" "IN"
Wells Fargo "Chicago" "IL"
EMC "Los Angeles" "CA"
EMC "Boston" "MA"
Apple "Cupertino" "CA"
This should work even if you have different number of Compaign Names in each market. Unfortunately I think base options are annoying to implement because frustratingly there isn't a gregexec, although I'd be curious if someone comes up with something comparably compact in base.

Here is a solution using base R. Split the CampaignName column on the string Metro adding sequential numbers as names. stack turns it into a data frame with columns ind and values which we massage into DF1. Merge that with xx by the sequence numbers of DF1 and the row numbers of xx. Move Market to the front of DF2 and remove ind and CampaignName. Finally write it out.
xx <- read.csv("Campaign.csv", as.is = TRUE)
s <- strsplit(xx$CampaignName, " Metro")
names(s) <- seq_along(s)
ss <- stack(s)
DF1 <- with(ss, data.frame(ind,
City = sub(" ..$", "", values),
State = sub(".* ", "", values)))
DF2 <- merge(DF1, xx, by.x = "ind", by.y = 0)
DF <- DF2[ c("Market", setdiff(names(DF2), c("ind", "Market", "CampaignName"))) ]
write.csv(DF, file = "myfile.csv", row.names = FALSE, quote = FALSE)
REVISED to handle extra columns after poster modified the question to include such. Minor improvements.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How to add column to data.table with values from list based on regex - regex

Here's a data.table solution: DT[ , fComplete := lapply(id, function(x) { m <- grep(x, filenames, value = TRUE) if (!length(m)) NA else m})] id fShort fComplete 1: 432-12 1245 432-124343.png 2: 3242-12 453543 3242-124342345.png 3: 324-32 45543 NA 4: 322-34 45343 NA 5: 2324-34 13543 NA

Related

R sets of coordinates extract from string

R: replacing values in string all at once

Use lapply to plot data in a list and use names of list elements as plot titles [duplicate]

How to very efficiently extract specific pattern from characters?

How to separate the variables of a particular column in a CSV file and write to a CSV file in R?

Categories

Resources