Index match formula with IF statement - if-statement

I am having an issue with a formula for google sheets:
=ifna(IF($A14="TRP Drills",INDEX('Drills DD Skill by Skill'!$B$3:$B,match($D14,'Drills DD Skill by Skill'!$A$3:$A,0))*$F14," "),if($A14="DMT Drills",INDEX('DMT Drills DD Skill by Skill'!$B$3:$B,match($D14,'DMT Drills DD Skill by Skill'!$A3:$A,0))*$F14,""))
This works for the first IF rule, “TRP Drills”, but not when I change A14 to match the second IF rule, “DMT Drills”.
Can anyone see why this might not be working?

The first argument of IFNA() will never evaluate to #N/A, because the IF returns " " when its condition is false, so the second IF is never reached. Return NA() from the first IF instead. Also anchor the second MATCH range as $A$3:$A so that it stays aligned with $B$3:$B.
Try the following:
=iferror(ifna(IF($A14="TRP Drills",INDEX('Drills DD Skill by Skill'!$B$3:$B,match($D14,'Drills DD Skill by Skill'!$A$3:$A,0))*$F14,na()),if($A14="DMT Drills",INDEX('DMT Drills DD Skill by Skill'!$B$3:$B,match($D14,'DMT Drills DD Skill by Skill'!$A$3:$A,0))*$F14,"")),"")
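To see why the fallback never fires, here is a minimal Python sketch that imitates the IFNA logic. It is only an emulation: the helper names (`ifna`, `scaled`, `original`, `fixed`) and the drill names and values are hypothetical, not taken from the sheet.

```python
NA = object()  # stand-in for the spreadsheet's #N/A error value


def ifna(value, fallback):
    """Imitate IFNA: return the fallback only when value is #N/A."""
    return fallback if value is NA else value


def scaled(table, skill, reps):
    """Imitate INDEX/MATCH multiplied by $F14; #N/A when the skill is missing."""
    value = table.get(skill, NA)
    return NA if value is NA else value * reps


def original(category, skill, reps, trp, dmt):
    # The first IF yields " " (not #N/A) when the category differs,
    # so IFNA never falls through to the second IF.
    first = scaled(trp, skill, reps) if category == "TRP Drills" else " "
    second = scaled(dmt, skill, reps) if category == "DMT Drills" else ""
    return ifna(first, second)


def fixed(category, skill, reps, trp, dmt):
    # The first IF yields #N/A when the category differs, so IFNA falls through.
    first = scaled(trp, skill, reps) if category == "TRP Drills" else NA
    second = scaled(dmt, skill, reps) if category == "DMT Drills" else ""
    result = ifna(first, second)
    return "" if result is NA else result  # plays the role of the outer IFERROR


trp = {"Tuck jump": 0.5}  # hypothetical lookup tables
dmt = {"Barani": 1.5}
print(repr(original("DMT Drills", "Barani", 2, trp, dmt)))  # ' '
print(repr(fixed("DMT Drills", "Barani", 2, trp, dmt)))     # 3.0
```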

Related

Getting value between '-' in google sheets

I'm trying to get the number between '-' and '-' in Google Sheets, but after trying many things I still haven't been able to find a solution.
Data record 1
England Premier League
West Ham vs Crystal Palace
2.090 - 3.47 - 3.770
Expected value = 3.47
Data record 2
England League Two
Carlisle vs Scunthorpe
2.830 - 3.15 - 2.820
Expected value = 3.15
Hopefully someone can help me out.
Try either of the following:
Option 1:
=INDEX(IFERROR(REGEXEXTRACT(AE1:AE4," \d+\.\d+ ")*1))
Option 2:
=INDEX(IFERROR(REGEXEXTRACT(AE1:AE4,".* - (\d+\.\d+) ")))
(Do adjust the formula according to your ranges and locale)
Alternatively, use:
=INDEX(IFNA(REGEXEXTRACT(A1:A, "- (\d+(?:\.\d+)?) -")*1))
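For readers who want to check the pattern outside Sheets, the same idea can be sketched in Python. This is an illustration only; the function name `middle_value` is made up, and the pattern assumes the value sits between two hyphens as in the sample records.

```python
import re

# A number (optionally with decimals) sitting between two hyphens.
MIDDLE = re.compile(r"-\s*(\d+(?:\.\d+)?)\s*-")


def middle_value(text):
    """Return the number between the two hyphens, or None if there is none."""
    match = MIDDLE.search(text)
    return float(match.group(1)) if match else None


print(middle_value("2.090 - 3.47 - 3.770"))  # 3.47
print(middle_value("2.830 - 3.15 - 2.820"))  # 3.15
```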

String manipulation in google spreadsheet for getting city name and states from address

I am working on a Google spreadsheet and I am stuck on how to extract the second state name from a string.
MWE
     A                B         C        D
1    Address          City      State1   State2
2    Dublin,OH        Dublin    OH
3    Chicago,IL,NY    Chicago   IL       NY
4    NY,Atlanta, DC   Atlanta   NY       DC
5    Seattle,WA       Seattle   WA
From the address, how can I get the city, state1, and state2?
Link to the google sheet: https://docs.google.com/spreadsheets/d/10NzbtJhQj4hQBnZXcmwise3bLBIAWrE0qwSus_bz7a0/edit?usp=sharing
Notes
The state name is all caps.
There can be no second state.
In the spreadsheet you shared, I entered this in B1:
={"City", "State1", "State2"; Arrayformula(if(len(A2:A), split(A2:A, ","),))}
See if that helps.
Suppose "Dublin, OH" from your post example list were in A2 with the others running from A3:A. Try this in B2 (making sure that B2:B is blank first):
=ArrayFormula(IF(A2:A="",,IF(IFERROR(REGEXEXTRACT(A2:A,"[A-Z]{2}")=REGEXEXTRACT(A2:A,".+([A-Z]{2})$"),TRUE),,IFERROR(REGEXEXTRACT(A2:A,".+([A-Z]{2})$")))))
The "plain-English version" of this reads as follows:
"If any cell in A2:A is blank, the corresponding cell in B2:B should also be blank. Otherwise, if the first instance of two juxtaposed capital letters is the same as the last instance of two juxtaposed capital letters, there is only one state present: leave that cell in Col B blank. Otherwise, pull the last such instance. If there are no instances of two juxtaposed capital letters for any test, instead of listing it as an error, list that as blank also."
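The rules stated in the question (state names are two capital letters, and the second state may be missing) can also be sketched in Python. This is a rough illustration, not a translation of either formula; `parse_address` is a hypothetical name, and it assumes a city is never written as exactly two capital letters.

```python
import re

STATE = re.compile(r"[A-Z]{2}")  # a state is exactly two capital letters


def parse_address(address):
    """Split a comma-separated address into (city, state1, state2)."""
    parts = [p.strip() for p in address.split(",") if p.strip()]
    states = [p for p in parts if STATE.fullmatch(p)]
    cities = [p for p in parts if not STATE.fullmatch(p)]
    city = cities[0] if cities else ""
    state1 = states[0] if states else ""
    state2 = states[1] if len(states) > 1 else ""  # blank when there is no second state
    return city, state1, state2


for row in ["Dublin,OH", "Chicago,IL,NY", "NY,Atlanta, DC", "Seattle,WA"]:
    print(parse_address(row))
```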

Rocket Universe & Unidata File

This is just for clarification: I know exactly what a qpointer is, but in a meeting today the concept of a dpointer was raised. Does anyone know what a "D" pointer refers to? I have never heard this term before.
This is a nice question because it helped me put together a couple of pieces I had rolling around in my head, so thanks for that!
D's are dictionary items that refer to a logical location in the data array, and you have probably seen them a million times in the DICT of any given file.
A D item in the VOC serves the same purpose and is valid with any query. Lots of shops have some generics (F1, F2, F3, F4, F5, F6, etc.) set up so you don't have to remember the dictionary name if you know which field you want. I think the precedence for dictionary items is DICT file -> VOC, but I could be wrong on that.
As an example to illustrate this, I went into HS.SALES, took one of the DICT items in the CUSTOMER table, and wrote it to the VOC after removing the conversion in field 3. I chose BUY_DATE because it had a conversion.
SORT CUSTOMER BUY_DATE 06:51:04am  10 Oct 2017  PAGE    1
CUSTOMER..    Date Purchased
1             01/07/91
10            01/28/91
              01/29/91
              01/30/91
Remove the conversion and save into the VOC.
>ED DICT CUSTOMER BUY_DATE
10 lines long.
0001: D Date of purchase
0002: 14
0003: D2/
0004: Date Purchased
0005: 8R
0006: M
0007: ORDERS
0008: INTEGER
0009:
0010:
----: 3
0003: D2/
----: R
0003:
----: SAVE VOC F14NOCON
"F14NOCON" filed in file "VOC".
----: Q
Now sort with the new D-type item. The values are from before the "Y-1995" era, when Pick dates still had 4 digits!
SORT CUSTOMER F14NOCON 06:45:25am  10 Oct 2017  PAGE    1
CUSTOMER..    Date Purchased
1             8408
10            8429
              8430
              8431
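The raw numbers are internal Pick dates, which count days from 31 December 1967 (day 0). A small Python sketch of what the D2/ conversion does to them, assuming the MM/DD/YY output seen in the first listing (the function names are made up for illustration):

```python
from datetime import date, timedelta

PICK_EPOCH = date(1967, 12, 31)  # internal day 0


def pick_to_date(internal):
    """Convert an internal Pick day count to a calendar date."""
    return PICK_EPOCH + timedelta(days=internal)


def format_d2_slash(internal):
    """Approximate the D2/ conversion, assuming MM/DD/YY output."""
    return pick_to_date(internal).strftime("%m/%d/%y")


for internal in (8408, 8429, 8430, 8431):
    print(internal, format_d2_slash(internal))

# Day 10000, the first five-digit internal date, falls in 1995.
print(pick_to_date(10000))  # 1995-05-18
```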
Good Luck!

How to know if a variation (e.g. an abbreviation) of a string in one list matches another list when the original does not?

I am currently searching for a method in R which lets me match/merge two data frames. Alas, both of these data frames contain non-optimal data: they can have certain abbreviations or even typos in them. Therefore I would like to define a list of variants for each abbreviation and check whether a string contains one of those elements. If the original entries don't match, R should check whether any of the other variants of the abbreviation gives a match. To illustrate: the name of a company could end with "Limited" but also with "Ltd." or "Ltd" etc.
EXAMPLE
Data
The Original "Address" file contains:
Company name        Address
Deloitte Ltd.       New York
Coca-Cola           New York
Tesla ltd           California
Microsoft Limited   Washington
Would have to be merged with "EnterpriseNrList"
Company name        EnterpriseNumber
Deloitte Ltd.       221
Coca-Cola           334
Tesla ltd           725
Microsoft Limited   127
So the abbreviations should work in "both directions". That's why I said, if R recognises any of the abbreviations, R should try to match all of them.
All of the matches should be reported as the return.
Therefore I would make up a list "Abbreviations" for each possible abbreviation
Limited.
limited
Ltd.
ltd.
Ltd
ltd
Questions
1) Would this be a good method, or would there be a more efficient way?
2) How can I check a list against a list of possible abbreviations (step 1, see below), sort of a containsx from excel?
3) How could I build a list that, for the entries that do not match, replaces the abbreviation with all the other abbreviations (step 2, see below)?
Thoughts for solution
Step 1
As I am still very new to this kind of work, I was thinking of the following: use a regular expression to check whether a string contains any of the abbreviation options, and create a list containing -1 if no match is found and a value >0 if a match is found. The entries without an abbreviation can already be matched against the "Address" list. With the other entries I continue to step 2.
In this step I don't really know how to check against a list of options ("Abbreviations" list).
Step 2
Next I would create a list with the matches from step 1 and rbind together all options. In this step I don't really know how I could create a list that combines e.g. Coca-Cola with all its possible abbreviations:
Coca-Cola Limited
Coca-Cola Ltd.
Coca-Cola Ltd
etc.
Step 3
Lastly, I would match/merge this more complete list of companies with the original "Data" list. With the introduction of step 2, I thought it might be a bit easier on the required computing power, as the original list is about 8000 rows.
I would take a different approach: fixing the tables first, before the merge.
To fix the abbreviations, I would use a case-insensitive regex with the final dot being optional. I start with a list mapping each 'normal word' to a vector of its abbreviations:
abbrevs <- list('Limited'=c('Limited','Ltd'),'Incorporated'=c('Incorporated','Inc'))
Then I build the corresponding regexes (alternations with an optional dot at the end; case will be ignored via a parameter in gsub and agrep later):
regexes <- lapply(abbrevs,function(x) { paste0("(",paste0(x,collapse='|'),")[.]?") })
Which gives:
$Limited
[1] "(Limited|Ltd)[.]?"
$Incorporated
[1] "(Incorporated|Inc)[.]?"
Now we have to apply each regex to the company.name column of each df:
for (i in seq_along(regexes)) {
  Address$Company.name <- gsub(regexes[[i]], names(regexes[i]), Address$Company.name, ignore.case=TRUE)
  Enterprise$Company.name <- gsub(regexes[[i]], names(regexes[i]), Enterprise$Company.name, ignore.case=TRUE)
}
This does not take typos into account. For those you'll need to work with agrep or adist.
Result for Address example data set:
> Address
       Company.name    Address
1  Deloitte Limited   New York
2         Coca-Cola   New York
3     Tesla Limited California
4 Microsoft Limited Washington
Input data used:
Address <- structure(list(Company.name = c("Deloitte Ltd.", "Coca-Cola",
"Tesla ltd", "Microsoft Limited"), Address = c("New York", "New York",
"California", "Washington")), .Names = c("Company.name", "Address"
), class = "data.frame", row.names = c(NA, -4L))
Enterprise <- structure(list(Company.name = c("Deloitte Ltd.", "Coca-Cola",
"Tesla ltd", "Microsoft Limited"), EnterpriseNumber = c(221L,
334L, 725L, 127L)), .Names = c("Company.name", "EnterpriseNumber"
), class = "data.frame", row.names = c(NA, -4L))
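The normalise-then-merge idea from this answer can also be sketched in Python, for readers who want to try it outside R. This is a minimal illustration under assumptions: the names `ABBREVS`, `normalise` and `merge` are made up, word boundaries are added to the pattern, and the sample rows deliberately use different variants in the two tables (unlike the question's data) so that the effect is visible.

```python
import re

# Canonical word -> variants that should be rewritten to it.
ABBREVS = {
    "Limited": ["Limited", "Ltd"],
    "Incorporated": ["Incorporated", "Inc"],
}

# One case-insensitive pattern per canonical word, with an optional trailing dot.
PATTERNS = {
    full: re.compile(
        r"\b(?:" + "|".join(map(re.escape, variants)) + r")\b\.?",
        re.IGNORECASE,
    )
    for full, variants in ABBREVS.items()
}


def normalise(name):
    """Rewrite every known abbreviation in a company name to its canonical word."""
    for full, pattern in PATTERNS.items():
        name = pattern.sub(full, name)
    return name


def merge(addresses, numbers):
    """Join two name-keyed dicts on the normalised company name."""
    by_name = {normalise(name): number for name, number in numbers.items()}
    return {
        normalise(name): (address, by_name.get(normalise(name)))
        for name, address in addresses.items()
    }


# Hypothetical rows: the same companies written with different variants.
addresses = {"Deloitte Ltd.": "New York", "Tesla ltd": "California"}
numbers = {"Deloitte Limited": 221, "Tesla Ltd": 725}
print(merge(addresses, numbers))
```

Normalising both tables first keeps the merge itself a plain exact-match join, which is what makes this cheaper than generating every variant of every name.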
I would say that the answer depends on whether you have a list of abbreviations or not.
If you have one, you could check which elements of your list contain an abbreviation with the grep or grepl functions (grep returns the indices of all elements that match the pattern, whereas grepl returns a logical vector).
Also, use the ignore.case = TRUE parameter of these functions, so you don't have to try all capitalized/lowercase possibilities.
If you don't have such a list, my first guess would be to extract the first "word" of each company name (I would guess that there is a single "Deloitte" company, and that it is "Deloitte Ltd"). You can do so with:
sapply(strsplit(CompanyNames, split = " "), `[`, 1)
If you also want to correct for typos, this is more a question of string distance.
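As a rough illustration of matching by string similarity, here is a Python sketch using the standard library's `difflib`. It is not the R `agrep`/`adist` approach; the function name `closest`, the 0.8 cutoff, and the misspelled input are assumptions chosen for the example.

```python
from difflib import get_close_matches

known_names = ["Deloitte Limited", "Coca-Cola", "Tesla Limited", "Microsoft Limited"]


def closest(name, candidates, cutoff=0.8):
    """Return the most similar candidate, or None if nothing is similar enough."""
    matches = get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(closest("Delloite Limited", known_names))  # Deloitte Limited
print(closest("Zzz", known_names))               # None
```

The cutoff needs tuning on real data: too low and different companies sharing the word "Limited" start to match each other.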
Hope that it helped!

How can we use clustering results in Weka?

I am using Weka for my internship, but I have little knowledge of data mining. Does anyone know how I can apply the following results to my data sets to get all the data grouped by cluster? The method I use now is to compute the distance between my attributes and the mean value of each cluster, then assign each instance to the nearest cluster. But this method is too rough for me.
=== Run information ===
Scheme:weka.clusterers.EM -I 100 -N -1 -M 1.0E-6 -S 100
Relation: wcet_cluster6 - Copie-weka.filters.unsupervised.attribute.Remove-R1-3,5-weka.filters.unsupervised.attribute.Remove-R5-12
Instances: 467
Attributes: 4
max
alt
stmt
bb
Test mode:evaluate on training data
=== Model and evaluation on training set ===
EM
Number of clusters selected by cross validation: 6
              Cluster
Attribute            0         1         2         3         4         5
                (0.28)    (0.11)    (0.25)    (0.16)    (0.04)    (0.17)
========================================================================
max
  mean          9.0148   10.9112   11.2826   10.4329   11.2039   10.0546
  std. dev.     1.8418    2.7775    3.0263    2.5743    2.2014    2.4614
alt
  mean          0.0003   19.6467    0.4867    2.4565    44.191    8.0635
  std. dev.     0.0175    5.7685    0.5034    1.3647   10.4761    3.3021
stmt
  mean          0.7295   77.0348    3.2439   12.3971  140.9367   33.9686
  std. dev.     1.0174   21.5897    2.3642    5.1584   34.8366   11.5868
bb
  mean          0.4362   53.9947    1.4895    7.2547  114.7113   22.2687
  std. dev.     0.5153   13.1614    0.9276    3.5122   28.0919    7.6968
Time taken to build model (full training data) : 4.24 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 163 ( 35%)
1 50 ( 11%)
2 85 ( 18%)
3 73 ( 16%)
4 18 ( 4%)
5 78 ( 17%)
Log likelihood: -9.09081
Thanks for your help!!
I don't think anyone can really answer this, but here are some tips off the top of my head.
You have used the EM clustering algorithm (see the animated GIF on the Wikipedia page). From the synopsis in Weka's documentation:
"EM assigns a probability distribution to each instance which
indicates the probability of it belonging to each of the clusters."
Is this complex output really what you want?
It also selects a number of clusters for you (unless you constrain that number).
In Weka 3.7 you can use the unsupervised attribute filter "ClusterMembership" in the Preprocess dialog to replace your dataset with the result of the cluster assignments. You need to select one reference attribute, though; by default it selects the last one. This creates hard-to-interpret output.
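If you prefer to assign instances yourself from the printed model, the nearest-mean method in the question can be refined by also using the standard deviations and cluster priors. The following Python sketch does that, assuming each attribute is modelled as an independent normal distribution per cluster; that is an assumption about how to read the output, not Weka's own code. The parameters are copied from the output in the question, and the function names are made up.

```python
import math

# Cluster priors and per-attribute (means, std devs), copied from the EM output.
PRIORS = [0.28, 0.11, 0.25, 0.16, 0.04, 0.17]
PARAMS = {
    "max":  ([9.0148, 10.9112, 11.2826, 10.4329, 11.2039, 10.0546],
             [1.8418, 2.7775, 3.0263, 2.5743, 2.2014, 2.4614]),
    "alt":  ([0.0003, 19.6467, 0.4867, 2.4565, 44.191, 8.0635],
             [0.0175, 5.7685, 0.5034, 1.3647, 10.4761, 3.3021]),
    "stmt": ([0.7295, 77.0348, 3.2439, 12.3971, 140.9367, 33.9686],
             [1.0174, 21.5897, 2.3642, 5.1584, 34.8366, 11.5868]),
    "bb":   ([0.4362, 53.9947, 1.4895, 7.2547, 114.7113, 22.2687],
             [0.5153, 13.1614, 0.9276, 3.5122, 28.0919, 7.6968]),
}


def posteriors(instance):
    """Probability of each cluster for one instance (dict of attribute -> value)."""
    log_scores = []
    for k, prior in enumerate(PRIORS):
        score = math.log(prior)
        for attr, (means, stds) in PARAMS.items():
            z = (instance[attr] - means[k]) / stds[k]
            # Log of the normal density for this attribute in cluster k.
            score += -0.5 * z * z - math.log(stds[k]) - 0.5 * math.log(2 * math.pi)
        log_scores.append(score)
    top = max(log_scores)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def hard_cluster(instance):
    """Index of the most probable cluster."""
    probs = posteriors(instance)
    return max(range(len(probs)), key=probs.__getitem__)


print(hard_cluster({"max": 11.2, "alt": 44.0, "stmt": 141.0, "bb": 115.0}))  # 4
print(hard_cluster({"max": 9.0, "alt": 0.0, "stmt": 0.7, "bb": 0.4}))        # 0
```

Unlike plain distance to the mean, this weighs each attribute by its spread within the cluster, so a tight cluster such as cluster 0 on alt only claims instances that are very close to its mean.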