How to get all cells that appear more than 5 times? - openoffice-calc

I have a table in OpenOffice that contains a column of region codes (column J). Using spreadsheet functions, how can I get all the codes that appear more than 5 times and write them in one cell?

Normally I would recommend breaking this problem down into smaller parts using helper columns. Or better yet, move the data into LibreOffice Base which can easily work with distinct values.
However, I managed to come up with a rather large formula that seems to do what you asked. Enter it as an array formula.
=TEXTJOIN(",";1;IF(COUNTIF(исходник.J$2:J$552;исходник.J2:J552)>5;IF(ROW(исходник.J2:J552)=MATCH(исходник.J2:J552;исходник.J$2:J$552;0)+ROW(J$2)-1;исходник.J2:J552;"")))
I can't test this on your actual data since your example is only an image, but let's say that there are six of both 77 and 37. Then this would show 77,37 as the result.
Here is a breakdown. Look up the functions in LibreOffice Online Help for more information.
=TEXTJOIN(",";1; — Join all results into a single cell, separated by commas.
IF(COUNTIF(исходник.J$2:J$552;исходник.J2:J552)>5; — Find codes that occur more than 5 times. This is the same as what you wrote.
IF(ROW(исходник.J2:J552)= — Compare the next result to the row number that we are currently looking at.
MATCH(исходник.J2:J552;исходник.J$2:J$552;0)+ROW(J$2)-1; — Determine the first row that has this code. We do this to get unique results instead of 6 or more of each code in the result.
исходник.J2:J552;""))) — Return the code. (Your formula simply returns 1 here, which doesn't seem to be what you want.) If it doesn't match, return an empty string rather than 0, because TEXTJOIN ignores empty strings.
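For readers who find the nested formula hard to follow, the same logic can be sketched outside the spreadsheet. This is only an illustration in Python, not part of the Calc solution; the function name `frequent_codes` and the sample values are made up for the example.

```python
from collections import Counter

def frequent_codes(codes, threshold=5):
    """Return the codes that occur more than `threshold` times, comma-joined."""
    counts = Counter(codes)              # plays the role of COUNTIF over the column
    seen = set()
    result = []
    for code in codes:
        # Keep a code only at its first occurrence, like the ROW()=MATCH() test.
        if counts[code] > threshold and code not in seen:
            seen.add(code)
            result.append(str(code))
    return ",".join(result)              # like TEXTJOIN with "," as delimiter

# Hypothetical column J: six 77s, six 37s and three 50s.
sample = [77] * 6 + [37] * 6 + [50] * 3
print(frequent_codes(sample))
```

As in the formula, the codes come out in the order in which they first appear in the column.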

Related

perform mathematical operations on a number without changing the attached text

I need a formula that can multiply or divide all the numbers in a string without changing the text attached to the numbers.
I need the numbers in the next column to automatically change according to the given mathematical operation, but the text from the original line must remain unchanged.
I've tried using a combination of REGEXMATCH and REGEXEXTRACT and by doing this I just get the result of multiplying/dividing all the numbers in the string (no text whatsoever).
I also had no success using REGEXREPLACE. I'm not even sure it can be used in this case; maybe a different formula is needed. Perhaps the numbers first have to be extracted, multiplied, and then put back together with something like TEXTJOIN or CONCATENATE, if that is even possible in this example. It's totally fine to perform the operation in several steps if needed (for example, by adding a SPLIT function), but the format of the raw data that we enter and recalculate unfortunately cannot be modified.
A sample table for better visualisation can be seen below. Any help would be greatly appreciated!
Raw data              Operation   Desired outcome
25STR/40DEX/70FRES    *0.25       6.25STR/10DEX/17.5FRES
80VIT/30INT/50CRES    *0.75       60VIT/22.5INT/37.5CRES
60VIT/20STR/45LRES    *1.25       75VIT/25STR/56.25LRES
You may try:
=byrow(index(bycol(split(A2:A,"/"),lambda(z,ifna(ifs(left(B2:B,1)="*",regexextract(z,"\d+")*mid(B2:B,2,99),left(B2:B,1)="/",round(regexextract(z,"\d+")/mid(B2:B,2,99),2))&regexextract(z,"\d+(.*)"))))),lambda(y,if(y="",,join("/",y))))
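To check the expected results independently of the sheet, the transformation can be sketched in Python. This is an illustration only, not a replacement for the formula; `scale_stats` is a hypothetical helper name, and it assumes the operation is always written as `*` or `/` followed by a number, as in the sample table.

```python
import re

def scale_stats(text, operation):
    """Apply an operation such as '*0.25' or '/4' to every number in `text`."""
    op, factor = operation[0], float(operation[1:])
    if op not in "*/":
        raise ValueError(f"unsupported operation: {operation!r}")

    def apply(match):
        value = float(match.group())
        result = value * factor if op == "*" else value / factor
        # Round to 2 decimals, then drop trailing zeros: 10.00 -> "10".
        return f"{result:.2f}".rstrip("0").rstrip(".")

    # Replace only the numeric parts; the attached text stays untouched.
    return re.sub(r"\d+(?:\.\d+)?", apply, text)

print(scale_stats("25STR/40DEX/70FRES", "*0.25"))
```

Because only the digits are matched and replaced, the labels and the "/" separators pass through unchanged.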

Formula to return value if all non-empty values are same, else return empty?

I have a Google Sheet with many (many, many) cases of the following situation:
     A    B    C    D    E
1    a    b1             e1
2    a
3         b2        d    e2
4    a                   e2
Basically each row lists content (information about research papers) scraped from a different location; theoretically, the values in each row should be the same, but since some locations lacked some information, and sometimes the information differs in some minor (but possibly important) way, there isn't 100% agreement throughout.
I'd like for each cell below such a group to display one value if all the non-empty values in that column are the same and to display nothing at all if there's some disparity between the non-empty values. See row 5 below:
     A    B    C    D    E
1    a    b1             e1
2    a
3         b2        d    e2
4    a                   e2
5    a              d
This is basically a first programmatic clean-up to assist further manual labor (which is unavoidable).
There's an example sheet available here - the real thing would have about 18 sets of values (title, authors, ISBN, publication, URL, keywords, etc), and 270 columns (each for another publication). The orange rows at the bottom are just pasted in manually but show the values I would like to get in the blue rows via formulas.
I realize this can be done with a massive string of IFs, but... surely there must be a way to write a formula that will extract all the non-empty values from an array or group of cells, compare them with each other, and return a single value if they're all equal?
Unfortunately, I'm drawing a blank...
=IFERROR(IF(COUNTA(UNIQUE(FILTER(B2:B5, B2:B5<>"", B2:B5<>"#N/A")))>1, ,
UNIQUE(FILTER(B2:B5, B2:B5<>"", B2:B5<>"#N/A"))))
or shorter:
=IF(COUNTUNIQUE(FILTER(D2:D5, D2:D5<>"", D2:D5<>"#N/A"))>1, ,
UNIQUE(FILTER(D2:D5, D2:D5<>"", D2:D5<>"#N/A")))
Countunique should work:
=if(countunique(A1:A4)=1,sortn(A1:A4,1),"")
I've used sortn because I want to remove any empty cells from the list of values before displaying what should be the single non-empty value and that is one way of doing it (empty cells are sorted to the end so won't appear).
Edit
If the data includes #N/A's probably the shortest way to deal with them would be to use the (to me slightly obscure) function countuniqueifs
=if(countuniqueifs(A1:A4,A1:A4,"<>#N/A")=1,sortn(A1:A4,1),"")
Blank cells and #N/A's are still sorted after everything else, so I think the sortn part should still be valid.
But there is a further issue with this - if the range contains empty strings returned from a formula, the sortn part won't work properly, so would have to fall back on filtering:
=if(countuniqueifs(C1:C4,C1:C4,"<>#N/A",C1:C4,"<>")=1,filter(C1:C4,C1:C4<>"#N/A",C1:C4<>""),"")
This is surely not an optimal solution but it works
=IF(COUNTIF(A1:A7,first_non_empty_cell)=COUNTA(A1:A7),first_non_empty_cell,"")
You might consider replacing first_non_empty_cell with
LOWER(INDEX(A1:A7,MATCH(1,INDEX((A1:A7<>0),0),0)))
or with the cell containing the value you want to use for comparisons.
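The rule all of these formulas implement can be written down compactly outside the sheet. Here is a minimal Python sketch of it; the name `consensus` is made up, and the sample columns assume the question's values sit under the matching letters (a in column A, b1/b2 in column B, and so on).

```python
def consensus(values):
    """Return the single shared non-empty value, or "" if the values disagree."""
    # Ignore blanks and "#N/A" placeholders, mirroring the FILTER conditions.
    present = {v for v in values if v not in ("", None, "#N/A")}
    return present.pop() if len(present) == 1 else ""

# Rows 1-4 of the question's example, stored per column (assumed layout).
columns = {
    "A": ["a", "a", "", "a"],
    "B": ["b1", "", "b2", ""],
    "C": ["", "", "", ""],
    "D": ["", "", "d", ""],
    "E": ["e1", "", "e2", "e2"],
}
print({name: consensus(cells) for name, cells in columns.items()})
```

Under that layout, columns A and D yield a value and the others stay empty, which is what row 5 of the question asks for.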

How do I apply a formula to a range without applying said formula to every cell?

I'm trying to apply a formula without having it add the formula data to each and every cell - in other words, I need the cells that are receiving the formula to be untouched until they get their data.
I was searching around and it looked like an ARRAYFORMULA would work but it doesn't seem to be doing anything when I apply it.
For example, I want to apply this formula to a cell range: =SPLIT(E2, ","). Each cell in the E column needs to be split into the two adjacent cells next to it based on its comma. When I try to apply =ARRAYFORMULA(SPLIT(E2:E99, ",")), only the cell I add this to gets the formula.
In addition to the contribution of pnuts, also try:
=ArrayFormula(iferror(REGEXEXTRACT(","&E2:E,"^"&REPT(",+[^,]+",COLUMN(OFFSET(A1,,,1,6))-1)&",+([^,]+)")))
Note: the last parameter of OFFSET can be changed to match the maximum number of comma-separated values you have in the cells of the range E2:E. E.g. if you have no more than 3 values per cell, set it to 3. The output will then be three columns wide (one column for each value).
Hope that makes sense?
Credit is also due to AdamL, who (I believe) originally crafted this workaround.
I think what you want may be array_constrain but for your example I can only at present offer you two formulae (one for each side of the comma):
=Array_constrain(arrayformula(left(E2:E,find(",",E2:E)-1)),match("xxx",E:E)-1,1)
=Array_constrain(arrayformula(mid(E2:E,find(",",E2:E)+1,len(E2:E))),match("xxx",E:E)-1,1)
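The intended behaviour (split each filled cell into a fixed number of columns and leave rows without data blank) can be sketched in Python for comparison. This is an illustration under assumptions, not something the sheet itself runs; `split_column` and its `width` parameter are hypothetical names.

```python
def split_column(cells, width=2, sep=","):
    """Split each cell on `sep` into exactly `width` columns, padding with ""."""
    rows = []
    for cell in cells:
        # Empty cells produce an all-blank row; runs of separators count as one.
        parts = [p for p in cell.split(sep) if p] if cell else []
        parts = parts[:width]
        rows.append(parts + [""] * (width - len(parts)))
    return rows

# Hypothetical contents of E2:E4; the middle cell has no data yet.
print(split_column(["red,blue", "", "green"]))
```

The `width` argument plays the same role as the last parameter of OFFSET in the REGEXEXTRACT formula: it fixes how many output columns there are.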

Conditional Vlook up without using VBA

I want to convert an input to desired output. Kindly help.
In the output - the columns value should start from most recent (year)
Unfortunately VLOOKUP is not able to fulfill that ask. However the INDEX-function can.
Here is a good read on how to use it:
http://fiveminutelessons.com/learn-microsoft-excel/use-index-lookup-multiple-values-list
This will work for your spreadsheet if your input table starts at A1 without a header and your output table starts at H3 with the first ID.
You get this by copy&pasting the first column of your input table to column H and then remove duplicates.
{=IF(ISERROR(INDEX($A$1:$C$7,SMALL(IF($A$1:$A$7=$H$3,ROW($A$1:$A$7)),ROW(1:1)),3)),"",
INDEX($A$1:$C$7,SMALL(IF($A$1:$A$7=$H$3,ROW($A$1:$A$7)),ROW(1:1)),3))}
Let's look at the formula step by step:
The curly brackets tell Excel that this is an array formula. The important part for you: after typing the formula (without the curly brackets), press Ctrl+Shift+Enter so that Excel treats it as an array formula.
'error at formula?, then blank, else formula
=IF(ISERROR(....),"",...)
When you autofill this formula, you probably don't know how many instances of your lookup value there are. So if you put this formula in 4 cells but there are only 3 entries, this bit keeps the fourth cell blank instead of showing an error.
INDEX($A$1:$C$7,SMALL(IF($A$1:$A$7=$H$3,ROW($A$1:$A$7)),ROW(1:1)),3))
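$A$1:$C$7 is your data matrix. Your IDs (in your case 125 and 501) are to be found in $A$1:$A$7. ROW(1:1) evaluates to 1 and increases as you fill the formula down, so SMALL returns the first, second, third, ... matching row. 3 is the column index within the data matrix. Note that ROW($A$1:$A$7) returns absolute(!) sheet row numbers, while INDEX counts rows relative to the matrix, so the formula only works as written when the table starts in row 1. If you move your input table, those values have to be adjusted.
What exactly SMALL and INDEX do is well described in the link above. (Or at least better than I could.)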
Hope that clarified some parts,
Tom
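For anyone who wants to see the INDEX/SMALL/IF pattern without the spreadsheet syntax, here is a rough Python sketch of the same idea. It is only an illustration; `kth_match` is a made-up name, and the table contents are hypothetical (only the IDs 125 and 501 come from the question).

```python
def kth_match(table, lookup_id, k, column=2):
    """Return `column` of the k-th row (1-based) whose first field is lookup_id."""
    # IF($A$1:$A$7=$H$3, ROW(...)): collect the positions of all matching rows.
    matching_rows = [i for i, row in enumerate(table) if row[0] == lookup_id]
    if k > len(matching_rows):
        return ""                       # the ISERROR branch: no k-th match
    # SMALL(..., k) picks the k-th smallest position; INDEX reads the column.
    return table[matching_rows[k - 1]][column]

# Hypothetical input table: (ID, year, value).
table = [
    (125, 2016, "A"),
    (501, 2016, "B"),
    (125, 2015, "C"),
    (501, 2014, "D"),
    (125, 2014, "E"),
]
print([kth_match(table, 125, k) for k in range(1, 5)])
```

The fourth lookup returns an empty string because ID 125 occurs only three times, which corresponds to the blank cell produced by the IF(ISERROR(...)) wrapper.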

R: searching within split character strings with apply

Within a large data frame, I have a column containing character strings e.g. "1&27&32" representing a combination of codes. I'd like to split each element in the column, search for a particular code (e.g. "1"), and return the row number if that element does in fact contain the code of interest. I was thinking something along the lines of:
apply(df["MEDS"],2,function(x){x.split<-strsplit(x,"&")if(grep(1,x.split)){return(row(x))}})
But I can't figure out where to go from there since that gives me the error:
Error in apply(df["MEDS"], 2, function(x) { :
dim(X) must have a positive length
Any corrections or suggestions would be greatly appreciated, thanks!
I see a couple of problems here (in addition to the missing semicolon in the function).
df["MEDS"] is more correctly written df[,"MEDS"]. It is a single column. apply() is meant to operate on each column/row of a matrix as if they were vectors. If you want to operate on a single column, you don't need apply()
strsplit() returns a list of vectors. Since you are applying it to a row at a time, the list will have one element (which is a character vector). So you should extract that vector by indexing the list element strsplit(x,"&")[[1]].
You are returning row(x) as if the input to your function were a matrix or knew what row it came from. It does not. apply() will pull each row and pass it to your function as a vector, so row(x) will fail.
There might be other issues as well. I didn't get it fully running.
As I mentioned, you don't need apply() at all. You really only need to look at the 1 column. You don't even need to split it.
OneRows <- which(grepl('(^|&)1(&|$)', df$MEDS))
as Matthew suggested. Or if your intention is to subset the dataframe,
newdf <- df[grepl('(^|&)1(&|$)', df$MEDS),]
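To see why the pattern needs the (^|&) and (&|$) anchors, here is the same regular expression tried on a few made-up strings. The sketch is in Python rather than R, purely as an illustration; `rows_with_code` and the sample values are hypothetical.

```python
import re

# The code "1" must be bounded by the string start/end or by "&" on both sides.
CODE_PATTERN = re.compile(r"(^|&)1(&|$)")

def rows_with_code(meds):
    """Return the 1-based positions of the strings that contain the code "1"."""
    return [i for i, s in enumerate(meds, start=1) if CODE_PATTERN.search(s)]

# Hypothetical MEDS column; "11&27" and "31&2" must not count as code 1.
meds = ["1&27&32", "11&27", "27&1", "31&2", "1"]
print(rows_with_code(meds))
```

Without the anchors, a plain search for "1" would also hit "11&27" and "31&2".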