SNPRelate: how to give a specific color to a population in a PCA plot

I am using SNPRelate for PCA analysis. It uses default colors for the different populations, but I want to choose the color for each population myself. The plotting commands are:
plot(tab$EV2, tab$EV1, col=as.integer(tab$pop), cex=1.2, pch=20,
     xlab="eigenvector 2", ylab="eigenvector 1")
legend("topleft", legend=levels(tab$pop), cex=1, pch=20, col=1:nlevels(tab$pop))
The head of the input file looks like this:
pop EV1 EV2
1 A1 POP_I -0.10172849 0.03619405
2 A2 POP_I -0.15951814 0.08234857
3 A3 POP_I -0.15632495 0.08180843
4 A4 POP_I -0.09679447 0.07981108
5 A5 POP_I 0.11362360 -0.03186038
6 A6 POP_I 0.05594095 -0.05498351

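Define a vector of colors, one per population level, and index it with the factor codes. Use the same vector in the legend so that the legend colors match the points:
col.list <- c("gray", "blue", "green", "red", "blue", "yellow", ...)
plot(tab$EV2, tab$EV1, col=col.list[as.integer(tab$pop)], cex=1.2, pch=20, xlab="eigenvector 2", ylab="eigenvector 1")
legend("topleft", legend=levels(tab$pop), cex=1, pch=20, col=col.list[1:nlevels(tab$pop)])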
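If the order of the factor levels is easy to get wrong, a named vector keyed by population label may be safer than a positional one. The following is only a sketch: the small `tab` data frame, the population names POP_II and POP_III, and the chosen colors are made up for illustration and are not from the question.

```r
# Hypothetical stand-in for the `tab` data frame from the question
tab <- data.frame(
  pop = factor(c("POP_I", "POP_I", "POP_II", "POP_III")),
  EV1 = c(-0.10, -0.16, 0.11, 0.06),
  EV2 = c(0.04, 0.08, -0.03, -0.05)
)

# One color per population, keyed by level name (arbitrary choices)
pop.cols <- c(POP_I = "gray", POP_II = "blue", POP_III = "red")

# Look up each sample's color by its population label, not by level position
point.cols <- pop.cols[as.character(tab$pop)]

plot(tab$EV2, tab$EV1, col = point.cols, cex = 1.2, pch = 20,
     xlab = "eigenvector 2", ylab = "eigenvector 1")
# Index the same named vector by the levels so the legend matches the points
legend("topleft", legend = levels(tab$pop), cex = 1, pch = 20,
       col = pop.cols[levels(tab$pop)])
```

Because the lookup is by name, reordering or dropping factor levels does not silently shift the colors.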


How To Interpret Least Square Means and Standard Error

I am trying to understand the results I got for a fake dataset. I have two independent variables, hours and type, and one response, pain.
First question: How was 82.46721 calculated as the lsmeans for the first type?
Second question: Why is the standard error exactly the same (8.24003) for both types?
Third question: Why is the degrees of freedom 3 for both types?
library(lsmeans)
data = data.frame(
  type = c("A", "A", "A", "B", "B", "B"),
  hours = c(60,72,61, 54,68,66),
  # pain = c(85,95,69, 73, 29, 30)
  pain = c(85,95,69, 85,95,69)
)
model = lm(pain ~ hours + type, data = data)
lsmeans(model, c("type", "hours"))
> data
type hours pain
1 A 60 85
2 A 72 95
3 A 61 69
4 B 54 85
5 B 68 95
6 B 66 69
> lsmeans(model, c("type", "hours"))
type hours lsmean SE df lower.CL upper.CL
A 63.5 82.46721 8.24003 3 56.24376 108.6907
B 63.5 83.53279 8.24003 3 57.30933 109.7562
The lsmeans are the model's predictions for each type at the mean of hours (63.5). Try this:
newdat <- data.frame(type = c("A", "B"), hours = c(63.5, 63.5))
predict(model, newdata = newdat)
An important thing to note here is that your model has hours as a continuous predictor, not a factor.
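The three questions can be checked directly in base R. This is a minimal sketch, assuming (as the lsmeans output suggests) that the reference grid holds hours at its overall mean of 63.5. Under that assumption the lsmean is just the fitted value for each type at that point. The standard errors coincide because the two groups have the same size and their mean hours sit equally far on either side of 63.5. The degrees of freedom are the residual degrees of freedom of the model: 6 observations minus 3 estimated coefficients.

```r
data <- data.frame(
  type  = c("A", "A", "A", "B", "B", "B"),
  hours = c(60, 72, 61, 54, 68, 66),
  pain  = c(85, 95, 69, 85, 95, 69)
)
model <- lm(pain ~ hours + type, data = data)

# Reference grid: each type evaluated at the overall mean of hours
newdat <- data.frame(type = c("A", "B"), hours = mean(data$hours))
pred <- predict(model, newdata = newdat, se.fit = TRUE)

pred$fit            # compare with the lsmean column
pred$se.fit         # compare with the SE column
df.residual(model)  # number of observations minus number of coefficients
```

If hours were converted to a factor instead, there would be no single mean value to predict at, which is why the remark about hours being continuous matters.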

Combining IF formulas

I have two IF formulas that I would like to combine - please see attached excel doc.
If C2 = "Blue":
=IF(E2="","",IF(((((((B2*(C2-2))*1.02)/(E2-1))/1.02)+(-B2))+(B2))/(B2)<0.65,"NO BET",((C2-1)/(E2-F2)*B2)))
If C2 = "Green":
=IF(E3="","",IF(((((((B3*(C3-2))*1.02)/(E3-1))/1.02)+(-B3))+(B3))/(B3)<0.65,"NO BET",(C3/(E3-F3)*B3)))
The formulas are the same up to "NO BET". I would like this to be a single formula so that I can change the value in C2 and it still calculates correctly.
Many thanks
I don't have the rep required to comment for clarifications, so I've given a couple of possible options depending on what you need. I've also added some spacing in my answer so it's a bit more readable.
If C2 = "Blue"
If C2 = "Green"
In the question you say that the formulas are the same up to NO BET, so, assuming that:
you want both formulas to work on both rows 2 and 3, and
the C column is filled in with Green/Blue for all the different rows,
this is how it can work for row 2:
IF(C2 = "Blue",
IF(C2 = "Green",
If Blue/Green is only ever in C2 and the rest of the C column is irrelevant:
IF($C$2 = "Blue",
IF($C$2 = "Green",

Adding a new column based on values

I have the following sample data:
data weight_club;
  input IdNumber 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
  Loss = StartWeight - EndWeight;
  datalines;
1023 David Shaw          red 189 165
1049 Amelia Serrano      yellow 145 124
1219 Alan Nance          purple 210 192
1246 Ravi Sinha          yellow 194 177
1078 Ashley McKnight     green 127 118
;
run;
What I would like to do now is the following:
Create two lists of colours (e.g., list1 = "red" and "yellow", and list2 = "purple" and "green")
Classify the records according to whether or not they are in list1 and list2 and add a new column.
So the pseudo code is like this:
'Set new category called class
If item is in list1 then class = 1
Else if item is in list2 then class = 2
Else class = 3
Any thoughts on how I can do this most efficiently?
Your pseudocode is almost exactly it.
If Team in ('red', 'yellow') then class = 1;
Else if Team in ('purple', 'green') then class = 2;
Else class = 3;
This is really a lookup, so there are many other methods. One I usually recommend as well is Proc format, though in a case as simple as this I'm not sure there are any gains.
Proc format;
  Value $colour_cat
    'red', 'yellow' = '1'
    'purple', 'green' = '2'
    Other = '3';
Run;
And then, in a data step or SQL, either of the following can be used.
*actual conversion;
Category = put(colour, $colour_cat.);
* change display only;
Format colour $colour_cat.;

Identifying rows in data.frames based on complex rules

In two previous questions I have asked how to identify and extract substrings based on complex rules:
Identifying substrings based on complex rules
Extracting capturing groups from a regex
The current question concerns how you would achieve the same end in a data.frame structure. Let's say you have a data.frame as follows:
dat <- data.frame(time = 1:10,
                  event = c("FA", "EX", "I1", "FA", "FA", "I3", "EX", "EX", "EX", "I3"),
                  actor = c("John", "Alex", "John", "Alex", "Tim", "Sandra", "Sara", "John", "Eliza", "Alex"))
time event actor
1 FA John
2 EX Alex
3 I1 John
4 FA Alex
5 FA Tim
6 I3 Sandra
7 EX Sara
8 EX John
9 EX Eliza
10 I3 Alex
Now I want to move from time 1 to 10 and group all rows that precede an I3, together with that I3 row. This means that I want to return a list of two data.frames (rows 1-6 and rows 7-10 should each form a separate data.frame, placed in a common list). How can I accomplish this?
You can use split:
split(dat, c(0, cumsum(dat$event=="I3"))[-(nrow(dat)+1)])
$`0`
time event actor
1 1 FA John
2 2 EX Alex
3 3 I1 John
4 4 FA Alex
5 5 FA Tim
6 6 I3 Sandra
$`1`
time event actor
7 7 EX Sara
8 8 EX John
9 9 EX Eliza
10 10 I3 Alex
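To see why the grouping vector passed to split works, it may help to build it step by step. This sketch recreates the example data under the name `dat`; the intermediate names `is.i3`, `closed` and `grp` are introduced here only for illustration.

```r
dat <- data.frame(
  time  = 1:10,
  event = c("FA", "EX", "I1", "FA", "FA", "I3", "EX", "EX", "EX", "I3"),
  actor = c("John", "Alex", "John", "Alex", "Tim",
            "Sandra", "Sara", "John", "Eliza", "Alex")
)

# TRUE on the rows that close a group
is.i3 <- dat$event == "I3"

# Running count of I3 rows seen so far, including the current row
closed <- cumsum(is.i3)

# Shift by one position so that each I3 row stays with the rows before it:
# prepend a 0 and drop the extra last element
grp <- c(0, closed)[seq_len(nrow(dat))]

groups <- split(dat, grp)
```

Without the shift, each I3 row would start the next group instead of ending the current one. Any rows after the last I3 would end up in a final group of their own.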
That works too:
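# Row numbers of the I3 events; each one ends a group
i3.index = which(dat$event == "I3")
# Each group starts at row 1 or on the row after the previous I3
i3.start = c(1, i3.index[-length(i3.index)]+1)
indexMatrix = cbind(from = i3.start, end = i3.index)
apply(indexMatrix, 1, function(x){dat[x[1]:x[2],]})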
# [[1]]
# time event actor
# 1 1 FA John
# 2 2 EX Alex
# 3 3 I1 John
# 4 4 FA Alex
# 5 5 FA Tim
# 6 6 I3 Sandra
# [[2]]
# time event actor
# 7 7 EX Sara
# 8 8 EX John
# 9 9 EX Eliza
# 10 10 I3 Alex
This will also work:
library(dplyr)
# Count I3 rows from the bottom up so each I3 row shares a group with the rows above it
dat %>%
  arrange(desc(time)) %>%
  mutate(group = cumsum(event == "I3")) %>%
  arrange(time)

Replace entire strings based on partial match

New to R. Looking to replace the entire string if there is a partial match.
d = c("SDS0G2 Blue", "Blue SSC2CWA3", "Blue SA2M1GC", "SA5 Blue CSQ5")
gsub("Blue", "Red", d, ignore.case = FALSE, fixed = FALSE)
Output: "SDS0G2 Red" "Red SSC2CWA3" "Red SA2M1GC" "SA5 Red CSQ5"
Desired Output: "Red" "Red" "Red" "Red"
Any help in solving this is truly appreciated.
I'd suggest using grepl to find the indices and replace those indices with "Red":
d = c("SDS0G2 Blue", "Blue SSC2CWA3", "Blue SA2M1GC", "SA5 Blue CSQ5", "ABCDE")
d[grepl("Blue", d, ignore.case = FALSE)] <- "Red"
# [1] "Red" "Red" "Red" "Red" "ABCDE"
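Two variations on the same idea, given as a sketch rather than as part of the original answer: a regular expression that spans the whole string lets sub() replace everything in one call, and ifelse() returns a new vector instead of overwriting `d`. The names `by_regex` and `by_index` are made up for this example.

```r
d <- c("SDS0G2 Blue", "Blue SSC2CWA3", "Blue SA2M1GC", "SA5 Blue CSQ5", "ABCDE")

# ".*" on both sides makes the match cover the whole element,
# so sub() replaces the entire string rather than only "Blue"
by_regex <- sub(".*Blue.*", "Red", d)

# Same result with a logical test, leaving d itself untouched
by_index <- ifelse(grepl("Blue", d, fixed = TRUE), "Red", d)
```

Elements without a match, such as "ABCDE", pass through unchanged in both versions.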
If you did want to keep the variable as a factor and replace multiple partial matches at once, the following function will work (example from another question).
clrs <- c("blue", "light blue", "red", "rose", "ruby", "yellow", "green", "black", "brown", "royal blue")
# stringsAsFactors = TRUE keeps the colour columns as factors in R >= 4.0
dfx <- data.frame(colors1 = clrs, colors2 = clrs, Amount = sample(100, 10), stringsAsFactors = TRUE)
# Function to replace levels with regex matching
make_levels <- function(.f, patterns, replacement = NULL, ignore.case = FALSE) {
  lvls <- levels(.f)
  # Replacements can be listed in the replacement argument, taken as names in patterns, or the patterns themselves.
  if(is.null(replacement)) {
    if(is.null(names(patterns)))
      replacement <- patterns
    else
      replacement <- names(patterns)
  }
  # Find matching levels
  lvl_match <- setNames(vector("list", length = length(patterns)), replacement)
  for(i in seq_along(patterns))
    lvl_match[[replacement[i]]] <- grep(patterns[i], lvls, ignore.case = ignore.case, value = TRUE)
  # Append other non-matching levels
  lvl_other <- setdiff(lvls, unlist(lvl_match))
  lvl_all <- append(
    lvl_match,
    setNames(as.list(lvl_other), lvl_other)
  )
  return(lvl_all)
}
# Replace levels
levels(dfx$colors2) <- make_levels(.f = dfx$colors2, patterns = c(Blue = "blue", Red = "red|rose|ruby"))
dfx
#> colors1 colors2 Amount
#> 1 blue Blue 75
#> 2 light blue Blue 55
#> 3 red Red 47
#> 4 rose Red 83
#> 5 ruby Red 56
#> 6 yellow yellow 10
#> 7 green green 25
#> 8 black black 29
#> 9 brown brown 23
#> 10 royal blue Blue 24
Created on 2020-04-18 by the reprex package (v0.3.0)