Excel nesting - IF / AND Query part two? - if-statement

Hi, I had a query earlier and thought I had cracked it with the help of Richard, but it doesn't appear so.
I have attached an image and what I am trying to achieve to make my query clearer.
* If E is correct then cell F will be set to match D manually
* If E is yes and F is set to 111 then G will populate with the contents of C
* If E is no and F is set to anything but 111 then it will return 0
* If E is correct then cell F will be set to match D manually
* If E is yes and F is set to 112 then H will populate with the contents of C
* If E is no and F is set to anything but 112 then it will return 0
* If E is correct then cell F will be set to match D manually
* If E is yes and F is set to 118 then I will populate with the contents of C
* If E is no and F is set to anything but 118 then it will return 0
* If E is correct then cell F will be set to match D manually
* If E is yes and F is set to 119 then J will populate with the contents of C
* If E is no and F is set to anything but 119 then it will return 0

It's not 100% clear, but sounds like this is what you're after:
F2 = =IF(E2="Yes",IF(OR(D2=111,D2=112,D2=118,D2=119)=TRUE,D2,""),"")
G2 = =IF(AND(E2="Yes",F2=111)=TRUE,C2,"")
H2 = =IF(AND(E2="Yes",F2=112)=TRUE,C2,"")
I2 = =IF(AND(E2="Yes",F2=118)=TRUE,C2,"")
J2 = =IF(AND(E2="Yes",F2=119)=TRUE,C2,"")
Then just fill down. I've put "" instead of 0, because it's a lot easier to see what's going on without zeros everywhere. You can change them back once you're happy with the outcome.
Incidentally, sometimes it's easier to parse the code out. Excel works fine if you have code on different lines, like the following for F2:
=
IF(
    E2="Yes",
    IF(
        OR(
            D2=111,D2=112,D2=118,D2=119
        )=TRUE,
        D2,
        ""
    ),
    ""
)
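For anyone who wants to check the logic outside the workbook, here is a minimal Python sketch of what the five formulas compute for a single row. The function name fill_row and the CODES tuple are made up for illustration, and the comparison with "Yes" is case-sensitive here, whereas Excel's = comparison on text is not.

```python
# Hypothetical helper mirroring the F2..J2 formulas for one row.
CODES = (111, 112, 118, 119)

def fill_row(c, d, e):
    """Return (F, G, H, I, J) for one row given the values of C, D and E."""
    # F copies D only when E is "Yes" and D is one of the four codes.
    f = d if e == "Yes" and d in CODES else ""
    # G..J each receive C only when F matches that column's code.
    return (f,) + tuple(c if e == "Yes" and f == code else "" for code in CODES)

print(fill_row(50, 112, "Yes"))
print(fill_row(50, 112, "No"))
```

Only one of G to J can ever be filled for a given row, because F holds a single code.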

Related

Dropdown query has no error, but returns no results

I'm hoping to allow others to sort through the data using some dropdowns, but they shouldn't have to use all of them if they don't need to.
My query function:
=QUERY(CATALOG!A2:I259,"SELECT * WHERE 1=1 "&IF(A2="Any",""," AND B = '"&A2&"' ")&IF(B2="ANY",""," AND C = '"&B2&"' ")&IF(C2="Any",""," AND D = '"&C2&"' ")&IF(D2="Any",""," AND E = '"&D2&"' ")&IF(E2="Any",""," AND F = '"&E2&"' ")&IF(F2="Any",""," AND G = '"&F2&"' ")&IF(G2="Any",""," AND H = '"&G2&"' "),1)
Whenever I ran this, there wasn't an error but the query didn't give any items.
I initially only tested one of the dropdowns but received nothing. I plugged in a known product into the inputs and still received nothing.
Link to copy of the spreadsheet with dataset and function
https://docs.google.com/spreadsheets/d/1s3tOm_6g8n66HT9md3EAXY7XwbkpmPggYhxdF-zv5ok/edit?usp=sharing
Some of the search parameters seem to be numbers. In that case, do not use the single quotes (as that will turn them into strings). See if this works:
=QUERY(CATALOG!A2:I259,"SELECT * WHERE 1=1 "&IF(A2="Any",""," AND B = "&A2&" ")&IF(B2="ANY",""," AND C = '"&B2&"' ")&IF(C2="Any",""," AND D = '"&C2&"' ")&IF(D2="Any",""," AND E = '"&D2&"' ")&IF(E2="Any",""," AND F = "&E2&" ")&IF(F2="Any",""," AND G = "&F2&" ")&IF(G2="Any",""," AND H = '"&G2&"' "),1)
try shorter:
=QUERY(CATALOG!A2:I259,
"where 1=1 "&
IF(A2="Any",," and B = "&A2)&
IF(B2="Any",," and C = '"&B2&"'")&
IF(C2="Any",," and D = '"&C2&"'")&
IF(D2="Any",," and E = '"&D2&"'")&
IF(E2="Any",," and F = "&E2)&
IF(F2="Any",," and G = "&F2)&
IF(G2="Any",," and H = '"&G2&"'"), 1)
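The quoting rule in these answers (quote text values, leave numbers bare, skip any dropdown set to "Any") can be sketched in Python as follows. This is only an illustration of how the clause string is assembled; build_where, the column letters and the sample values are assumptions, not part of the spreadsheet.

```python
def build_where(filters, numeric_columns):
    """Assemble a query 'where' clause from (column, value) pairs."""
    clause = "where 1=1"
    for column, value in filters:
        if value == "Any":
            continue  # dropdown left on "Any": add no condition
        if column in numeric_columns:
            # Numeric column: compare against a bare number.
            clause += " and {} = {}".format(column, value)
        else:
            # Text column: wrap the value in single quotes.
            clause += " and {} = '{}'".format(column, value)
    return clause

# Hypothetical dropdown values: B is numeric, D is text, C is unused.
print(build_where([("B", 12), ("C", "Any"), ("D", "Red")], {"B"}))
```

With every dropdown on "Any" the clause collapses to "where 1=1", which selects all rows.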

Conditional calculation based on another column

I have a cross reference table and another table with the list of "Items"
I connect "PKG" to "Item" as "PKG" has distinct values.
Example:
**Cross table**    **Item table**
Bulk  PKG          Item  Value
A     D            A     2
A     E            B     1
B     F            C     4
C     G            D     5
                   E     8
                   F     3
                   G     1
After connecting the 2 tables above by PKG and Item, I get the following result:
Item  Value  Bulk  PKG
A     2
B     1
C     4
D     5      A     D
E     8      A     E
F     3      B     F
G     1      C     G
As you can see nothing shows up for the first 3 values since it is connected by pkg and those are "Bulk" values.
I am trying to create a new column that uses the cross reference table
I want to create the following with a new column
Item  Value  Bulk  PKG  NEW COLUMN
A     2                 5
B     1                 3
C     4                 1
D     5      A     D    5.75
E     8      A     E    9.2
F     3      B     F    3.45
G     1      C     G    1.15
The new column is what I am trying to create.
I want the original values to show up for bulk as they appear for pkg. I then want the Pkg items to be 15% higher than the original value.
How can I calculate this based on the setup?
Just write a conditional custom column in the query editor:
New Column = if [Bulk] = null then [Value] else 1.15 * [Value]
You can also do this as a DAX calculated column:
New Column = IF( ISBLANK( Table1[Bulk] ), Table1[Value], 1.15 * Table1[Value] )
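As a quick way to see what this rule produces, here is a small Python sketch that applies the same condition to the merged rows from the question. The function name new_column and the row tuples are assumptions for illustration. It reproduces the rule exactly as written in the answer, so rows with no Bulk keep their own Value.

```python
def new_column(value, bulk):
    """Same rule as the answer: plain Value when Bulk is blank, else 15% more."""
    return value if bulk is None else 1.15 * value

# (Item, Value, Bulk) rows taken from the merged table in the question.
rows = [("A", 2, None), ("B", 1, None), ("C", 4, None),
        ("D", 5, "A"), ("E", 8, "A"), ("F", 3, "B"), ("G", 1, "C")]

for item, value, bulk in rows:
    # Round only for display, to hide floating-point noise.
    print(item, round(new_column(value, bulk), 2))
```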

Use a regular expression to extract a substring from data frame columns in R

I am fairly new to R so please go easy on me if this is a stupid question.
I have a dataframe called foo:
> head(foo)
  Old.Clone.Name  New.Clone.Name  File
1 A               Aa              A_mask_MF_final_IS2_SAEE7-1_02.nrrd
2 B               Bb              B_mask_MF_final_IS2ViaIS2h_SADQ15-1_02.nrrd
3 C               Cc              C_mask_MF_final_IS2ViaIS2h_SAEC16-1_02.nrrd
4 D               Dd              D_mask_MF_final_IS2ViaIS2h_SAEJ6-1_02.nrrd
5 E               Ee              F_mask_MF_final_IS2_SAED9-1_02.nrrd
6 F               Ff              F_mask_MF_final_IS2ViaIS2h_SAGP3-1_02.nrrd
I want to extract codes from the File column that match the regular expression (S[A-Z]{3}[0-9]{1,2}-[0-9]_02), to give me:
SAEE7-1_02
SADQ15-1_02
SAEC16-1_02
SAEJ6-1_02
SAED9-1_02
SAGP3-1_02
I then want to use these codes to search another directory for other files that contain the same code.
I fail, however, at the first hurdle and cannot extract the codes from that column of the data frame.
I have tried:
library('stringr')
str_extract(foo[3],regex("(S[A-Z]{3}[0-9]{1,2}-[0-9]_02)", ignore_case = TRUE))
but this just returns [1] NA.
Am I simply missing something obvious? I look forward to cracking this with a bit of help from the community.
Hello. If you are reading the data in as a table, then foo[3] is a list, and str_extract does not accept lists, only strings. You should therefore use lapply to extract the match from every element:
lapply(foo[3], function(x) str_extract(x, "[sS][a-zA-Z]{3}[0-9]{1,2}-[0-9]_02"))
Result:
[1] "SAEE7-1_02" "SADQ15-1_02" "SAEC16-1_02" "SAEJ6-1_02" "SAED9-1_02"
[6] "SAGP3-1_02"
str_extract(foo[3],"(?i)S[A-Z]{3}[0-9]{1,2}-[0-9]_02")
seems to work. Somehow, my R gave me
"Error in check_pattern(pattern, string) : could not find function "regex""
when using your original expression.
The following code will repeat what you asked (just copy and paste to your R console):
library(stringr)
foo = scan(what='')
Old.Clone.Name New.Clone.Name File
A Aa A_mask_MF_final_IS2_SAEE7-1_02.nrrd
B Bb B_mask_MF_final_IS2ViaIS2h_SADQ15-1_02.nrrd
C Cc C_mask_MF_final_IS2ViaIS2h_SAEC16-1_02.nrrd
D Dd D_mask_MF_final_IS2ViaIS2h_SAEJ6-1_02.nrrd
E Ee F_mask_MF_final_IS2_SAED9-1_02.nrrd
F Ff F_mask_MF_final_IS2ViaIS2h_SAGP3-1_02.nrrd
foo = matrix(foo,ncol=3,byrow=T)
colnames(foo)=foo[1,]
foo = foo[-1,]
foo
str_extract(foo[,3],regex("(S[A-Z]{3}[0-9]{1,2}-[0-9]_02)", ignore_case = T))
The reason you get NA is subtle: R stores matrix entries by column, so when foo is a matrix, foo[3] is the single element in the 3rd row and 1st column. To select the third column, use foo[,3], or convert with foo <- data.frame(foo) and use foo[[3]].
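To check the pattern itself independently of how the column is selected, here is a minimal Python sketch that applies the same regular expression to the six file names from the question. It uses Python's re module rather than stringr, and the names CODE_PATTERN and extract_code are assumptions for illustration.

```python
import re

# Same pattern as in the question, matched case-insensitively.
CODE_PATTERN = re.compile(r"S[A-Z]{3}[0-9]{1,2}-[0-9]_02", re.IGNORECASE)

files = [
    "A_mask_MF_final_IS2_SAEE7-1_02.nrrd",
    "B_mask_MF_final_IS2ViaIS2h_SADQ15-1_02.nrrd",
    "C_mask_MF_final_IS2ViaIS2h_SAEC16-1_02.nrrd",
    "D_mask_MF_final_IS2ViaIS2h_SAEJ6-1_02.nrrd",
    "F_mask_MF_final_IS2_SAED9-1_02.nrrd",
    "F_mask_MF_final_IS2ViaIS2h_SAGP3-1_02.nrrd",
]

def extract_code(name):
    """Return the first code found in a file name, or None if there is none."""
    match = CODE_PATTERN.search(name)
    return match.group(0) if match else None

codes = [extract_code(name) for name in files]
print(codes)
```

The pattern is applied to each file name separately, which is the per-element behaviour the answers above obtain by passing the column as a character vector.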

Stata rename a lot of variables from another list

I'm importing a very complex .xls file that often combines multiple cells together in the variable names. After importing it into Stata, only the first cell has a variable name, and the other 3 are blank. Is it possible to write a loop to rename all the variables (which come in sets of 4)?
For instance, the variables go: Russia, B, C, D but I would like them to be named Russia_A, Russia_B, Russia_C, Russia_D. Is there a way to do this with a loop or command within Stata?
It's impossible to have blank variable names in Stata, as your own example attests. On the information given your variable names come in fours, so that you could loop. One basic technique is just to cycle over 1, 2, 3, 4 and act accordingly. This example works. If it's not what you want, a minimal reproducible example is essential showing why this is different from what you want.
clear
input Russia B C D Germany E F G France H I J
42 42 42 42 42 42 42 42 42 42 42 42
end
tokenize "A B C D"
local i = 0
foreach v of var * {
    local ++i
    if `i' == 1 local stub "`v'"
    rename `v' `stub'_``i''
    if `i' == 4 local i = 0
}
ds
Russia_A Russia_C Germany_A Germany_C France_A France_C
Russia_B Russia_D Germany_B Germany_D France_B France_D
tokenize is possibly the least familiar command here, so see its help if needed.
All that said, it's unlikely that this is a useful data structure. See help reshape.
Here's another way to do it. We set up a counter running over all the variables. This perhaps is more of a finger exercise in macro manipulation.
clear
input Russia B C D Germany E F G France H I J
42 42 42 42 42 42 42 42 42 42 42 42
end
tokenize "A B C D"
forval j = 1/4 {
    local sub`j' "``j''"
}
unab all : *
tokenize "`all'"
local J : word count `all'
forval j = 1/`J' {
    local k = mod(`j', 4)
    if `k' == 0 local k = 4
    if `k' == 1 local stub "``j''"
    rename ``j'' `stub'`sub`k''
}
ds
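The counting logic shared by both loops (take the first name of each group of four as the stub, then append A to D in turn) can be sketched in Python as below. This only illustrates the arithmetic; it is not Stata code, and the function name rename_in_groups is made up. It follows the first answer in joining stub and suffix with an underscore.

```python
def rename_in_groups(names, suffixes=("A", "B", "C", "D")):
    """Rename variables that come in fixed-size groups led by a stub name."""
    renamed = []
    for position, name in enumerate(names):
        offset = position % len(suffixes)
        if offset == 0:
            stub = name  # first variable of each group supplies the stub
        renamed.append("{}_{}".format(stub, suffixes[offset]))
    return renamed

names = "Russia B C D Germany E F G France H I J".split()
print(rename_in_groups(names))
```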

Extracting columns with a difference in aligned data

I have some aligned data (something bioinformatic related) as so:
reference_string = 'yearning'
string2 = 'learning'
string3 = 'aligning'
I need to extract only columns showing differences in relation to the reference data.
The output should show only positional information of the columns containing differences in relation to the reference string and the corresponding reference item.
1 2 3 4
y e a r
l
a l i g
My current code does most things okay except that it also reports columns with no difference.
string1 = 'yearning'
string2 = 'learning'
string3 = 'aligning'
string_list = [string1, string2]
reference = reference_string
diffs_top, diffs = [], []
all_diffs = set()
for s in string_list:
    diffs = []
    for i, c in enumerate(s):
        if s[i] != reference[i]:
            diffs.append(i)
            all_diffs.add(i)
    diffs_top.append(diffs)
for d in all_diffs:
    print str(int(d+1)),
print
for c in reference:
    print str(c),
print
for i, s in enumerate(string_list):
    for j, c in enumerate(s):
        if j in diffs_top[i]:
            print str(c),
        else:
            print str(' '),
    print
This code would give:
1 2 3 4
y e a r n i n g
l
a l i g
Any help appreciated.
EDIT: I have picked a section of real data to make the problem as clear as possible, along with my attempt at solving it thus far:
reference_string = 'MAHEWGPQRLAGGQPQAS'
string1 = 'MAQQWSLQRLAGRHPQDS'
string2 = 'MAQRWGAHRLTGGQLQDT'
string3 = 'MAQRWGPHALSGVQAQDA'
string_list = [string1, string2, string3]
reference = reference_string
diffs_top, diffs = [], []
all_diffs = set()
for s in string_list:
    diffs = []
    for i, c in enumerate(s):
        if s[i] != reference[i]:
            diffs.append(i)
            all_diffs.add(i)
    diffs_top.append(diffs)
#print diffs_top
#print all_diffs
for d in all_diffs:
    print str(int(d+1)), # retains natural positions of the reference residues
print
for d in all_diffs:
    for i, c in enumerate(reference):
        if i == d:
            print c,
print
The print out will be an output showing the position at which there is any difference to other non-reference strings and the corresponding reference letter.
3 4 6 7 8 9 11 13 14 15 17 18
H E G P Q R A G Q P A S
The next step is to write code that processes the non-reference strings, printing each character that differs from the reference at that position. If there is no difference, it leaves a blank (' ').
Doing it manually the output will be:
3 4 6 7 8 9 11 13 14 15 17 18
H E G P Q R A G Q P A S
Q Q S L R H D
Q R A H T L D T
Q R H A S V A D A
My entire code, as an attempt to get to the solution above, has been messy to say the least:
reference_string = 'MAHEWGPQRLAGGQPQAS'
string1 = 'MAQQWSLQRLAGRHPQDS'
string2 = 'MAQRWGAHRLTGGQLQDT'
string3 = 'MAQRWGPHALSGVQAQDA'
string_list = [string1, string2, string3]
reference = reference_string
diffs_top, diffs = [], []
all_diffs = set()
for s in string_list:
    diffs = []
    for i, c in enumerate(s):
        if s[i] != reference[i]:
            diffs.append(i)
            all_diffs.add(i)
    diffs_top.append(diffs)
#print diffs_top
#print all_diffs
for d in all_diffs:
    print str(int(d+1)),
print
for d in all_diffs:
    for i, c in enumerate(reference):
        if i == d:
            print c,
print
# this is my attempt to look into non-reference strings
# to check for the difference with the reference, and print an output.
for d in all_diffs:
    for i, s in enumerate(string_list):
        for j, c in enumerate(s):
            if j == d:
                print c,
            else:
                print str(' '),
        print
Your code is working perfectly fine (as per your logic).
What is happening is this: while printing the output, when the loop reaches a string that is identical to the reference string, Python looks up its entry in the diffs_top list. Because no differences were stored in diffs_top for that string, Python just prints blank spaces for it.
1 2 3 4
y e a r n i n g #prints the reference string, because you've coded in that way
#prints blank as string_list[0] and reference string are the same
l
a l i g
The question here is how exactly do you define your difference for reference string.
Besides, I also found some fundamental flaws in your code implementation. If you try to run your code by setting string_list[1] as your reference string, you would get your output as :
1 2 3 4
l e a r n i n g
y
a l i g
Is this what you need? Please spend some time properly defining the difference for all cases and then try to implement your code.
EDIT:
As per your updated requirements, replace the last block in your code with this:
for i, s in enumerate(string_list):
    for d in all_diffs:
        if d in diffs_top[i]:
            print s[d],
        else:
            print ' ',
    print
Cheers!
I think there is a general problem in your logic. If you need to extract only columns showing difference in relation to the reference data and string1 is the reference the output should be:
1 2 3 4
l
a l i g
So, 'yearning' shouldn't show any character because it has no difference to string1.
If you delete or comment out the following lines, you will get exactly what I expect is the right answer:
#for c in reference:
# print str(c),
#print
Consider reviewing your logic if this solution is not what you actually want.
Update
Here is a shorter solution which solves your task:
from itertools import compress, izip_longest
def delta(reference, string):
    return [ '' if a == b else b for a, b in izip_longest(reference, string)]
ref_string = 'MAHEWGPQRLAGGQPQAS'
strings = ['MAQQWSLQRLAGRHPQDS',
           'MAQRWGAHRLTGGQLQDT',
           'MAQRWGPHALSGVQAQDA']
delta_strings = [delta(ref_string, string) for string in strings]
selectors = [1 if any(tup) else 0 for tup in izip_longest(*delta_strings)]
indices = [str(i+1) for i in range(len(selectors))]
output_data = [indices, ref_string] + delta_strings
for line in output_data:
    print ''.join(x.rjust(3) for x in compress(line, selectors))
Explanation:
I defined a function delta(reference, string) which returns the delta between the string and the reference string. For example: delta("ABFF", "AECF") returns the list ['', 'E', 'C', ''].
The variable delta_strings holds all the deltas between each string in the list strings and the reference string ref_string.
The variable selectors is a list containing only 1 and 0 values, where 0 marks the columns which shouldn't be printed and 1 marks the columns which should.
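The snippets in this thread are Python 2 (print statements, izip_longest). For readers on Python 3, the following is a minimal sketch of the same idea, assuming all strings are aligned and of equal length. The function name diff_columns is made up for illustration. It sorts the differing positions explicitly instead of relying on the iteration order of a set.

```python
def diff_columns(reference, strings):
    """Return (header, reference_row, rows) restricted to differing columns."""
    # Collect every 0-based position where at least one string differs.
    positions = sorted({
        i
        for s in strings
        for i, (a, b) in enumerate(zip(reference, s))
        if a != b
    })
    header = [str(i + 1) for i in positions]           # 1-based positions
    reference_row = [reference[i] for i in positions]  # reference letters
    # For each string, keep the letter where it differs, else a blank.
    rows = [[s[i] if s[i] != reference[i] else ' ' for i in positions]
            for s in strings]
    return header, reference_row, rows

reference = 'MAHEWGPQRLAGGQPQAS'
strings = ['MAQQWSLQRLAGRHPQDS',
           'MAQRWGAHRLTGGQLQDT',
           'MAQRWGPHALSGVQAQDA']

header, reference_row, rows = diff_columns(reference, strings)
for line in [header, reference_row] + rows:
    print(''.join(cell.rjust(3) for cell in line))
```

Applied to 'yearning' with 'learning' and 'aligning', the same function keeps only positions 1 to 4, as in the first example of the question.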