Inserting elements in an R list

I want to store a few entries in a "dictionary" so that I can retrieve them by name. I can form something like that indirectly like this:
> a = list(c(1,2),c(9,9,0,0))
> names(a) = c("first","second")
> a
$first
[1] 1 2
$second
[1] 9 9 0 0
However, I can't do the same thing by simply inserting them by name like this:
> a=list()
> a["first"] = c(1,2)
Warning message:
In a["first"] = c(1, 2) :
number of items to replace is not a multiple of replacement length
> a
$first
[1] 1
Why is this so, and what syntax should I use to insert objects like vectors or matrices by name into a list?

Your problem is that you are using [ rather than [[. This should work:
a[['first']] <- c(1,2)
as should this:
a$first <- c(1,2)
Remember, [ gives you a sublist, while [[ accesses specific elements.

You got one good answer. Here's an equivalent answer:
a=list()
a["first"] = list(c(1,2))
a
# $first
# [1] 1 2
So to expand on joran's perfectly fine answer, you are really using [<- as a function, and it both gives (via [) and receives (via [<-) lists.
Just because a function returns something is not a promise that it will set something.
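To put the two answers side by side, here is a minimal sketch of the three assignment forms. The names second, third and b are made up for illustration, and suppressWarnings() is only there so the failing form from the question runs quietly.

```r
a <- list()

# `[<-` replaces a sub-list, so the right-hand side must itself be a list
a["first"] <- list(c(1, 2))

# `[[<-` and `$<-` replace a single element, so any object can be stored directly
a[["second"]] <- c(9, 9, 0, 0)
a$third <- matrix(1:4, nrow = 2)

# The form from the question: a length-2 vector is offered for a single slot,
# so only its first value is kept (and R warns about it)
b <- list()
suppressWarnings(b["first"] <- c(1, 2))

str(a)
str(b)
```

Reading works the same way: a["first"] returns a list of length one, while a[["first"]] returns the vector itself.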

Split Pandas Column by values that are in a list

I have three lists that look like this:
age = ['51+', '21-30', '41-50', '31-40', '<21']
cluster = ['notarget', 'cluster3', 'allclusters', 'cluster1', 'cluster2']
device = ['htc_one_2gb','iphone_6/6+_at&t','iphone_6/6+_vzn','iphone_6/6+_all_other_devices','htc_one_2gb_limited_time_offer','nokia_lumia_v3','iphone5s','htc_one_1gb','nokia_lumia_v3_more_everything']
I also have a column in a df that looks like this:
campaign_name
0 notarget_<21_nokia_lumia_v3
1 htc_one_1gb_21-30_notarget
2 41-50_htc_one_2gb_cluster3
3 <21_htc_one_2gb_limited_time_offer_notarget
4 51+_cluster3_iphone_6/6+_all_other_devices
I want to split the column into three separate columns based on the values in the above lists. Like so:
age cluster device
0 <21 notarget nokia_lumia_v3
1 21-30 notarget htc_one_1gb
2 41-50 cluster3 htc_one_2gb
3 <21 notarget htc_one_2gb_limited_time_offer
4 51+ cluster3 iphone_6/6+_all_other_devices
First thought was to do a simple test like this:
ages_list = []
for i in age:
    if i in df['campaign_name'][0]:
        ages_list.append(i)
print(ages_list)
>>> ['<21']
I was then going to convert ages_list to a series and combine it with the remaining two to get the end result above, but I assume there is a more pythonic way of doing it?
The idea is to build a regular expression from the values you already have. For example, to capture any value from your age list you can use '|'.join(age), and likewise for cluster and device.
Two details need care. Some values contain a + sign ('51+' and the 'iphone_6/6+' devices), which means "one or more" in a regex, so each value has to be escaped to match a literal +; re.escape does that. Also, some device names are prefixes of others (for example 'htc_one_2gb' and 'htc_one_2gb_limited_time_offer'), and alternation takes the first alternative that matches, so the longer names have to come first.
import re
import pandas as pd

# age, cluster and device are the lists from the question
df = pd.DataFrame({'campaign_name': ['notarget_<21_nokia_lumia_v3', 'htc_one_1gb_21-30_notarget', '41-50_htc_one_2gb_cluster3', '<21_htc_one_2gb_limited_time_offer_notarget', '51+_cluster3_iphone_6/6+_all_other_devices']})

def make_pattern(values):
    # escape regex metacharacters and try the longest values first
    return '|'.join(re.escape(v) for v in sorted(values, key=len, reverse=True))

def split_df(row):
    campaign_name = row['campaign_name']
    row['age'] = re.findall(make_pattern(age), campaign_name)[0]
    row['cluster'] = re.findall(make_pattern(cluster), campaign_name)[0]
    row['device'] = re.findall(make_pattern(device), campaign_name)[0]
    return row

df.apply(split_df, axis=1)
If you want to drop the original column you can do this:
df.apply(split_df, axis=1).drop('campaign_name', axis=1)
This assumes every campaign name contains a value from each list. If that may not hold, check that re.findall returned a match before indexing [0].

Use lapply to plot data in a list and use names of list elements as plot titles [duplicate]

If I have the following list:
comp.surv <- list(a = 1:4, b = c(1, 2, 4, 8), c = c(1, 3, 8, 27))
comp.surv
# $a
# [1] 1 2 3 4
#
# $b
# [1] 1 2 4 8
#
# $c
# [1] 1 3 8 27
I can use lapply to plot each list element:
lapply(comp.surv, function(x) plot(x))
However, I want to include the name of each list element as the plot title (main). For my example data, the titles of the graphs would be a, b and c respectively. To start with, I have a gsub rule that, given comp.surv$a, returns a:
gsub("comp.surv\\$([a-z]+)", "\\1", deparse(substitute(comp.surv$a)))
# "a"
Which is good. However I cannot embed this result into my lapply statement above. Any ideas?
In the meantime I have tried getting round this by creating a function that includes the main parameter:
splot <- function(x){
  plot(x, main = gsub("comp.surv\\$([a-z]+)", "\\1", deparse(substitute(x))))
}
lapply(comp.surv, function(x) splot(x))
This will plot each sub-variable of comp.surv, but all the titles are blank.
Can anyone recommend if I am going down the right track?
One possibility would be to loop over the names of the list:
lapply(names(comp.surv), function(x) plot(comp.surv[[x]], main = x))
Or slightly more verbose, loop over the list indices:
lapply(seq_along(comp.surv), function(x) plot(comp.surv[[x]], main = names(comp.surv)[x]))
Is that what you want?
ns=names(comp.surv)
lapply(ns, function(x) plot(comp.surv[[x]], main=x,ylab="y"))
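A small sketch of why the substitute() attempt cannot recover the names, plus a Map() variant of the answers above. The names seen and titles are made up for illustration, and pdf(NULL) is only there so the sketch runs without opening a plot window; drop it for interactive use.

```r
comp.surv <- list(a = 1:4, b = c(1, 2, 4, 8), c = c(1, 3, 8, 27))

# Inside lapply() the function receives only the value of each element;
# substitute(x) sees lapply's internal expression, not comp.surv$a
seen <- lapply(comp.surv, function(x) deparse(substitute(x)))

# Map() walks the values and their names in parallel,
# so the name can be passed straight to main
pdf(NULL)  # null graphics device: nothing is drawn to screen or file
titles <- Map(function(x, nm) {
  plot(x, main = nm)
  nm  # return the title that was used
}, comp.surv, names(comp.surv))
invisible(dev.off())

unlist(seen)
unlist(titles)
```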

Removing duplicates from the data

I already loaded 20 csv files with function:
tbl = list.files(pattern="*.csv")
list_of_data = lapply(tbl, read.csv)
I combined all of those files into one:
all_data = do.call(rbind.fill, list_of_data)
The new table has a column called "Accession". After combining, many of the names (Accession) are repeated, and I would like to remove all of the duplicates.
Another problem is that some of those names are ALMOST the same. The only difference is the dot and number that follow the name.
Let me show you how it looks:
AT3G26450.1 <--
AT5G44520.2
AT4G24770.1
AT2G37220.2
AT3G02520.1
AT5G05270.1
AT1G32060.1
AT3G52380.1
AT2G43910.2
AT2G19760.1
AT3G26450.2 <--
<-- = Same sample, different names. Should be treated as one. So just ignore dot and a number after.
Tried this one:
all_data$CleanedAccession = str_extract(all_data$Accession, "^[[:alnum:]]+")
all_data = subset(all_data, !duplicated(CleanedAccession))
Error in `$<-.data.frame`(`*tmp*`, "CleanedAccession", value = character(0)) :
You can use this command to both subset and rename the values:
subset(transform(all_data, Accession = sub("\\..*", "", Accession)),
       !duplicated(Accession))
Accession
1 AT3G26450
2 AT5G44520
3 AT4G24770
4 AT2G37220
5 AT3G02520
6 AT5G05270
7 AT1G32060
8 AT3G52380
9 AT2G43910
10 AT2G19760
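If the original Accession values should be kept, a variant of the same idea is to put the cleaned name in a separate column, as the question attempted, and de-duplicate on that. This is a minimal sketch on made-up data: the score column is hypothetical and only shows that the other columns of the first occurrence are kept.

```r
all_data <- data.frame(
  Accession = c("AT3G26450.1", "AT5G44520.2", "AT4G24770.1", "AT3G26450.2"),
  score = c(10, 20, 30, 40),  # hypothetical extra column
  stringsAsFactors = FALSE
)

# Strip a trailing dot followed by digits, leaving the bare name
all_data$CleanedAccession <- sub("\\.[0-9]+$", "", all_data$Accession)

# duplicated() flags every repeat after the first, so negating it keeps
# the first row seen for each cleaned name
deduped <- all_data[!duplicated(all_data$CleanedAccession), ]
deduped
```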
What about
df <- data.frame( Accession = c("AT3G26450.1",
"AT5G44520.2",
"AT4G24770.1",
"AT2G37220.2",
"AT3G02520.1",
"AT5G05270.1",
"AT1G32060.1",
"AT3G52380.1",
"AT2G43910.2",
"AT2G19760.1",
"AT3G26450.2"))
df[!duplicated(unlist(lapply(strsplit(as.character(df$Accession),
                                      ".", fixed = TRUE), "[", 1))), ]

Why this behavior when coercing a list to character via as.character()?

In the process of (mostly) answering this question, I stumbled across something that I feel like I really should already have seen before. Let's say you've got a list:
l <- list(a = 1:3, b = letters[1:3], c = runif(3))
Attempting to coerce l to various types returns an error:
> as.numeric(l)
Error: (list) object cannot be coerced to type 'double'
> as.logical(l)
Error: (list) object cannot be coerced to type 'logical'
However, I'm apparently allowed to coerce a list to character, I just wasn't expecting this result:
> as.character(l)
[1] "1:3"
[2] "c(\"a\", \"b\", \"c\")"
[3] "c(0.874045701464638, 0.0843329173512757, 0.809434881201014)"
Rather, if I'm allowed to coerce lists to character, I would have thought I'd see behavior more like this:
> as.character(unlist(l))
[1] "1" "2" "3" "a" "b"
[6] "c" "0.874045701464638" "0.0843329173512757" "0.809434881201014"
Note that how I specify the list elements originally affects the output of as.character:
l <- list(a = c(1,2,3), b = letters[1:3], c = runif(3))
> as.character(l)
[1] "c(1, 2, 3)"
[2] "c(\"a\", \"b\", \"c\")"
[3] "c(0.344991483259946, 0.0492411875165999, 0.625746068544686)"
I have two questions:
1. How is as.character dredging up the information from my original creation of the list l in order to spit out 1:3 versus c(1,2,3)?
2. In what circumstances would I want to do this, exactly? When would I want to call as.character() on a list and get output of this form?
1. For non-trivial lists, as.character uses deparse to generate the strings. A vector deparses as 1:n only if it is an integer vector holding 1, 2, 3, ..., n. c(1,2,3) is double whereas 1:3 is integer, which is why the two lists print differently.
2. No idea :-)
...but look at deparse if you want to understand as.character here:
deparse(c(1L, 2L, 3L)) # 1:3
deparse(c(3L, 2L, 1L)) # c(3L, 2L, 1L)
deparse(c(1, 2, 3)) # c(1, 2, 3)
The help file does say
For lists it deparses the elements individually, except that it extracts the first element of length-one character vectors.
I'd seen this before in trying to answer a question [not online] about grep. Consider:
> x <- list(letters[1:10],letters[10:19])
> grep("c",x)
[1] 1 2
grep uses as.character on x, with the result that, since both have c( in them, both components match. That took a while to figure out.
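The grep surprise can be reproduced and avoided with a short sketch like the following. The names deparsed, naive and has_c are made up for illustration; the fix shown is one option that tests each component's own values.

```r
x <- list(letters[1:10], letters[10:19])

# What grep() actually searches: one deparsed string per list component
deparsed <- as.character(x)

# Both strings start with "c(", so both components "match"
naive <- grep("c", x)

# Testing the values inside each component avoids the deparsed text
has_c <- vapply(x, function(v) any(v == "c"), logical(1))

deparsed
naive
which(has_c)
```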
On "Why does it do this?", I'd guess that one of the members of R core wanted it to do this.

How to add variable key/value pair to list object?

I have two variables, key and value, and I want to add them as a key/value pair to a list:
key = "width"
value = 32
mylist = list()
mylist$key = value
The result is this:
mylist
# $key
# [1] 32
But I would like this instead:
mylist
# $width
# [1] 32
How can I do this?
R lists can be thought of as hashes: vectors of objects that can be accessed by name. Using this approach, you can add a new entry to the list like so:
key <- "width"
value <- 32
mylist <- list()
mylist[[ key ]] <- value
Here we use the string stored in the variable key to access a position in the list much like using the value stored in a loop variable i to access a vector through:
vector[ i ]
The result is:
mylist
$width
[1] 32
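The difference between the two forms can be seen side by side in this minimal sketch. The extra list and its height and depth entries are made up for illustration.

```r
key <- "width"
value <- 32
mylist <- list()

mylist$key <- value     # `$` takes the name literally: element is called "key"
mylist[[key]] <- value  # `[[` evaluates key: element is called "width"

# The same pattern adds several pairs whose names are only known at run time
extra <- list(height = 10, depth = 4)  # hypothetical additional pairs
for (k in names(extra)) {
  mylist[[k]] <- extra[[k]]
}

names(mylist)
```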
The setNames() built-in function makes it easy to create a hash from given key and value lists. (Thanks to Nick K for the better suggestion.)
Usage: hh <- setNames(as.list(values), keys)
Example:
players <- c("bob", "tom", "tim", "tony", "tiny", "hubert", "herbert")
rankings <- c(0.2027, 0.2187, 0.0378, 0.3334, 0.0161, 0.0555, 0.1357)
league <- setNames(as.list(rankings), players)
Then accessing the values through the keys is easy:
league$bob
[1] 0.2027
league$hubert
[1] 0.0555
List elements in R can be named. So in your case just do
> mylist = list()
> mylist$width = value
When R encounters code like
> l$somename=something
where l is a list, it appends the element something to the list and names it somename. The element can then be accessed using
> l[["somename"]]
or
> l$somename
The name can be changed with command names:
> names(l)[names(l)=="somename"] <- "othername"
Or, if you know the position of the element in the list, by:
> names(l)[1] <- "someothername"
We can use R's list data structure to store data in the form of key-value pair.
Syntax:
ObjectName<-list("key"= value)
Example:
mylist<-list("width"=32)
Also, see this example: "https://github.com/WinVector/zmPDSwR/blob/master/Statlog/GCDSteps.R"