In LibreOffice Calc, which formula will check if a keyword or part of it is contained in a cell in a row and copy the entire content of that cell? - if-statement

I am learning how to use formulas in spreadsheets, and I use LibreOffice.
I need to sort out the data in a huge, messy spreadsheet.
Each column contains mixed data, and the sheet has dozens of columns and thousands of rows. If the spreadsheet contains no errors, each cell in a row either contains a different keyword or is empty; no two cells in the same row should contain the same keyword.
The goal is to produce a new spreadsheet in which each cell marked with a given keyword stays in the same row but is placed in one column dedicated to that keyword.
(Image: the kind of spreadsheet with mixed-up cells to be sorted out)
(Image: how the data in the spreadsheet should appear once fixed)
A formula that can be used to extract sorted out data from a cell is the following:
=IF(SEARCH("Text1";B2;1);B2;0)
The formula can be dragged down so that each copy tests the cell next to it. The results are correct.
The results are correct, but I do not know why the expected 0 is not printed; #VALUE! appears instead.
The logic is very simple: if the cell contains the keyword, or any other text that contains that keyword, the result is the full content of that cell; otherwise the result is 0.
Here comes the first question: why do I get #VALUE! for the cells that do not contain the keyword? I expected to get 0 instead, as specified in the formula.
I tried leaving this field empty and also putting the 0 in quotes, but the result is always the same: #VALUE!
However, this formula only extracts the information contained in one column, so the process must be repeated for every other column.
To avoid creating a formula column for each column in the spreadsheet, and more importantly to avoid then having to merge all the results into one column containing only the cells with a given keyword, I thought of using the same formula but extending the test to each subsequent cell in the row, as follows:
=IF(SEARCH("text";B2;1);B2;IF(SEARCH("text";C2;1);C2;IF(SEARCH("text";D2;1);D2;0)))
The logic is very simple and should output, in one go, a column with every cell in the row that contains the keyword. Check whether the first cell in the row contains the word using the SEARCH function; if it does, the result is the content of that cell; otherwise perform the next test. The next test is the same for the next cell, and so on until the last test. If no test gave a true result, print 0 (but we get #VALUE!; OK, I could live with that...).
In theory this should work for any number of cells, but in practice it does not: it only works for the first IF test and the first cell referenced in the formula.
WHY?
The result of the extended formula, which should test N cells in sequence, is the same as that of the simple formula that tests only one cell.
Finally, how do I resolve this problem using IF and SEARCH?
Is there a better approach to this kind of problem for sorting out data in huge spreadsheets like this one?
Thank you for any hint and help.
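The row scan described in the question can be sketched outside the spreadsheet. As far as I know, SEARCH in Calc returns #VALUE! rather than FALSE when the keyword is not found, and IF passes an error in its condition straight through instead of evaluating the "otherwise" branch; that would explain both the missing 0 and why the nested version never gets past the first test. Wrapping each test as ISNUMBER(SEARCH("text";B2)) is one common way to turn the error into a plain TRUE/FALSE. The Python below only mirrors the intended logic; the function name `first_cell_with_keyword` and the sample rows are made up for illustration.

```python
def first_cell_with_keyword(row, keyword, default=0):
    """Return the full content of the first cell in `row` containing `keyword`.

    Matching is a case-insensitive substring test, similar to SEARCH.
    Empty cells (None or "") are skipped. `default` plays the role of the
    0 in the original formula.
    """
    needle = keyword.lower()
    for cell in row:
        if cell and needle in str(cell).lower():
            return cell
    return default


# Hypothetical sample data: keywords sit in arbitrary columns of each row.
rows = [
    ["Text2 beta", "Text1 alpha", ""],
    ["", "Text3 gamma", "Text2 delta"],
    ["Text1 epsilon", "", "Text3 zeta"],
]

# One output column dedicated to the keyword "Text1", one entry per row.
text1_column = [first_cell_with_keyword(r, "Text1") for r in rows]
print(text1_column)  # ['Text1 alpha', 0, 'Text1 epsilon']
```

Unlike the nested IF, the loop handles a "not found" result as an ordinary false value, so it moves on to the next cell instead of stopping at the first one.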

Related

Complex rearranging and repeating of column headers via arrayformula possible?

I have a complex survey with numerous skip logic rules that ends up returning over 3 dozen columns of mostly empty data with only certain questions applicable to each respondent's submission. I tried creating a column at the end of the columns to grab any cell in that row that was not blank and concatenating them all into one cell:
=ifna(textjoin("|",true,filter($A$2:$AO$2&"_"&A3:AO3,A3:AO3>0)))
This yielded me one cell per row with everything I needed - including the column headers so I could parse the data (without all the blanks) by looking only at that one column.
However, each time a new response comes in, it shifts all the data down so I am constantly needing to go in and add the formula to new responses. I tried moving the formula to another tab completely:
=ifna(textjoin("|",true,filter(Eureka!$A$2:$AO$2&"_"&Eureka!A3:AO3,Eureka!A3:AO3>0)))
This formula also will not correct itself once new data appears on the Eureka tab. So I filled that formula down in one long column, and it works perfectly on any response up to that point. Then, when a new response comes in (at row 274, for example), all of the formulas below row 274 automatically add a row to their references. So if my formula in row 274 has ranges like A274:AO274, then once a response comes in on row 275, my formula on row 275 has jumped by one to A276:AO276 (or to 298 or 343, depending on the number of new responses).
So I want to make my formula act as an arrayformula:
=ifna(arrayformula(textjoin("|",true,filter(Eureka!$A$2:$AO$2&"_"&Eureka!A3:AO,Eureka!A3:AO>0))))
but textjoin only works on either rows or columns, so this keeps giving me an error.
I think I need to use MAP/LAMBDA possibly or some kind of REPT, but I just can't seem to crack it.
And in full disclosure, my ultimate goal would be to actually have each question returned on its own row so that the first two columns get repeated for every question vertically. But I think once I get the original question addressed, I can figure out how to do that.
TEXTJOIN in arrayformula?
The following formula should produce the result you desire:
=BYROW(BYCOL(FILTER(Eureka!A2:AO,Eureka!A2:A <> ""),LAMBDA(col, ARRAYFORMULA(CONCAT(ARRAYFORMULA(IF(ISBLANK(FILTER(col,{FALSE;TRANSPOSE(SPLIT(REPT(TRUE&CHAR(127),ROWS(col)-1),CHAR(127),TRUE,TRUE))})),,ARRAY_CONSTRAIN(col,1,1)&"_")),FILTER(col,{FALSE;TRANSPOSE(SPLIT(REPT(TRUE&CHAR(127),ROWS(col)-1),CHAR(127),TRUE,TRUE))}))))),LAMBDA(row,TEXTJOIN("|",true,row)))
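What these formulas aim for is a per-row operation: for each response, pair every non-blank cell with its column header and join the pairs with "|". A minimal Python sketch of that target behaviour, assuming made-up headers and responses (the name `join_answers` is hypothetical and is not part of any spreadsheet API):

```python
def join_answers(headers, row, sep="|"):
    """Join "header_value" for every non-blank cell in one response row."""
    return sep.join(
        f"{header}_{value}"
        for header, value in zip(headers, row)
        if value not in ("", None)
    )


# Hypothetical survey data: most cells are blank because of skip logic.
headers = ["Q1", "Q2", "Q3"]
responses = [
    ["yes", "", "blue"],
    ["", "7", ""],
    ["", "", ""],
]

# One summary string per response row, which is the role BYROW plays above.
summaries = [join_answers(headers, r) for r in responses]
print(summaries)  # ['Q1_yes|Q3_blue', 'Q2_7', '']
```

The list comprehension over `responses` corresponds to applying the join once per row, which a single TEXTJOIN cannot do on its own.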

Excel Alternative to nested IF

I have a couple of rather large nested if functions in my spreadsheet. It sure would be nice to have an alternative method. Problem is I'm using a wildcard (*) in my lookup because the source text has slight variations (date for example).
For example, if my list of data contains:
VENMO PAYMENT 220828 1022093447487 BRENDA HOSPY
VENMO PAYMENT 220813 1031323447487 BRENDA HOSPY
I want these to show in an adjacent column of cells as just Venmo
Currently my if function in that second column of cells is:
=IF(COUNTIF($F10,"*APPLE.COM/BILL*"),"AP",
IF(COUNTIF($F10,"IIA VOYA*"),"VOYA",
IF(COUNTIF($F10,"VENMO PAYMENT*"),"Venmo",
IF(COUNTIF($F10,etc...
This works fine but quickly gets unruly as more things get added.
I've spent a great deal of time searching for functions and processes that would make this easier, or at least more compact, but I can't find a way with typical functions like vlookup or index/match.
If I've explained this in a comprehensible fashion perhaps you've seen or experienced a similar situation and could offer a suggestion. It would be appreciated!
I'm not opposed to using a programming function.
I've looked at, and for, various Excel functions or combinations with no luck on my own or online.
I have created the structure below.
The formula in B2 is:
=IFERROR(INDEX($F$2:$F$9,MIN(IF(COUNTIF(A2,"*"&$E$2:$E$9&"*")>0,ROW($E$2:$E$9),9999999)-1)),"---")
Enter it as an array formula using Ctrl+Shift+Enter.
It searches A2 for every string in column E and collects the row numbers of column E where there is a match; entries that do not match yield 9999999 instead. Because the data starts in row 2, 1 is subtracted to turn each row number into an index into $F$2:$F$9, and MIN picks the first match. INDEX then returns the value at that index in column F. Finally, IFERROR shows --- when no match was found, because the placeholder value lies far outside the range and INDEX returns an error.
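The same lookup-table idea, written as a plain loop, may make the logic easier to follow. This is only a sketch: `categorize` and the `lookup` list are hypothetical stand-ins for the formula and for columns E and F, and the keywords are taken from the nested IF in the question.

```python
def categorize(description, lookup, default="---"):
    """Return the label of the first lookup keyword found in `description`.

    `lookup` is an ordered list of (keyword, label) pairs standing in for
    columns E and F. The test is a case-insensitive substring match, which
    approximates COUNTIF with "*keyword*".
    """
    text = description.lower()
    for keyword, label in lookup:
        if keyword.lower() in text:
            return label
    return default


# Stand-in for the E/F table; order decides which match wins.
lookup = [
    ("APPLE.COM/BILL", "AP"),
    ("IIA VOYA", "VOYA"),
    ("VENMO PAYMENT", "Venmo"),
]

print(categorize("VENMO PAYMENT 220828 1022093447487 BRENDA HOSPY", lookup))  # Venmo
print(categorize("GROCERY STORE 1234", lookup))  # ---
```

Adding a new category then means adding one pair to the table rather than nesting another IF.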

Searching within the result of a vlookup using a range of values and parsing text

MY GOAL:
parse a MM/DD date from the result of a vlookup so that it can be used in a project plan
BACKGROUND:
The vlookup result contains multiple values separated by a "•" (I don't need all of them)
The value I'm looking to parse is not always in the same location in the vlookup result (otherwise I could use the RIGHT formula)
There is a finite number of the values I'm looking to retrieve (and I know them already)
The value that I'm looking to retrieve contains some text with a date range; I only want the first four values in the date range (MM/DD)
I'd like to achieve all this with a single formula with the result in a single cell
CURRENT FORMULA
The formula that I've been working on that is not working is:
=ARRAYFORMULA(if(iserror(search(Iterations!D2:D7,(VLOOKUP(A2,'Results {2596503}'!$C$2:$L$183,3)))),,))
I've set up a sheet called "Erik Help" with the following formulas in B2 and C2:
=ArrayFormula(IF(A2:A="","",MID(VLOOKUP(A2:A,data!A2:B,2,FALSE),FIND(REGEXEXTRACT(VLOOKUP(A2:A,data!A2:B,2,FALSE),"[0-9]-[0-9]"),VLOOKUP(A2:A,data!A2:B,2,FALSE))-4,5)))
and
=ArrayFormula(IF(A2:A="","",MID(VLOOKUP(A2:A,data!A2:B,2,FALSE),FIND(REGEXEXTRACT(VLOOKUP(A2:A,data!A2:B,2,FALSE),"[0-9]-[0-9]"),VLOOKUP(A2:A,data!A2:B,2,FALSE))+2,5)))
respectively.
They may be longer than actually needed, but you did not share realistic results in Column B or list which symbols may appear in Column B other than in the date; so I tried to account for either a hyphen or a forward slash possibly appearing in Column B in places other than within the date span.
Your analytics sheet also shows a formula that is sorting the results from data!A:A. So even though in your example the original data order happens to be the same as in analytics!A:A, that is not a given (again, based on your formula). Therefore, the VLOOKUP is also necessary.
You did not indicate whether you need to further use these returned date-snippets in calculations, or whether you just need to view them. So the results generated in "Erik Help" are text.
If you want usable numbers/dates, you add further issues that would need to be controlled for in the formula, because you'll only be extracting month and day, not year. That's fine right now. But what about when the date range to be extracted is "12/28-01/13"? If you simply make these values/dates, they will both be assigned to the current year. So the end date here will wind up being earlier than the start date.
Because of this, I've added a second sheet, "Erik Help 2," which contains extended formulas to account for these cases while still returning the date format you want as actual dates which can be used in calculations.
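The year-rollover problem can be illustrated with a short Python sketch. It is not the formula from the "Erik Help 2" sheet, which is not shown here; it assumes the start date falls in a year you supply and that an end date earlier than the start belongs to the following year. The name `range_to_dates` is hypothetical.

```python
from datetime import date


def range_to_dates(span, year):
    """Turn "MM/DD-MM/DD" into (start, end) dates.

    Assumption: the start date falls in `year`. If the end month/day is
    earlier than the start month/day, the range is taken to cross into
    the next year.
    """
    start_text, end_text = span.split("-")
    start_month, start_day = (int(part) for part in start_text.split("/"))
    end_month, end_day = (int(part) for part in end_text.split("/"))

    start = date(year, start_month, start_day)
    crosses_year = (end_month, end_day) < (start_month, start_day)
    end = date(year + 1 if crosses_year else year, end_month, end_day)
    return start, end


print(range_to_dates("12/28-01/13", 2023))
# (datetime.date(2023, 12, 28), datetime.date(2024, 1, 13))
```

Without the rollover check, both dates would land in the same year and the end would precede the start.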
EDIT
(following your note on the sheet: "I would like to remove col b altogether and nest in the formulas in col c and d")
You can adjust the range B2:B by replacing it with your already existing formula in B2.
The new adjusted formula will become
=ArrayFormula(IFNA(SPLIT(REGEXEXTRACT(VLOOKUP(ARRAYFORMULA(sort(unique(data!A2:A))),data!$A$1:$C,2),"\d+\/\d+-\d+\/\d+"),"-")))
Original answer
You can use the following formula:
=ArrayFormula(IFNA(SPLIT(REGEXEXTRACT(B2:B,"\d{2}\/\d{2}-\d{2}\/\d{2}"),"-")))
Make sure you format the results as Date.
(Please adjust ranges to your needs)
Functions used:
ArrayFormula
IFNA
SPLIT
REGEXEXTRACT
try:
=ARRAYFORMULA(IF(A2:A="",,IFNA(TEXT(SPLIT(REGEXEXTRACT(
VLOOKUP(data!A2:A, data!A:C, 2), "\d+/\d+-\d+/\d+"), "-"), "mm/dd"))))
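For readers more comfortable with code than with nested sheet functions, the REGEXEXTRACT + SPLIT step can be sketched in Python. The sample string is invented, since the real lookup results were not shared, and `extract_range` is a hypothetical helper name.

```python
import re

# Same pattern as in the formulas above: digits/digits-digits/digits.
DATE_RANGE = re.compile(r"\d+/\d+-\d+/\d+")


def extract_range(text):
    """Return (start, end) as "MM/DD" strings, or None if no range is found."""
    match = DATE_RANGE.search(text)
    if match is None:
        return None  # plays the role of IFNA returning a blank
    start, end = match.group(0).split("-")
    return start, end


# Invented lookup result with values separated by a bullet.
sample = "Sprint 4 • Iteration 07/11-07/22 • Team A"
print(extract_range(sample))           # ('07/11', '07/22')
print(extract_range("no dates here"))  # None
```

Because the pattern is searched for anywhere in the text, the position of the date range within the lookup result does not matter.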

How to apply conditional formatting (if cell is in another range) to a range of cells

So I have searched through several different questions related to this. None of them seem to be asking exactly what I'm looking for and none of the solutions I've found have worked for me thus far.
I have several columns of data (Player names) where each column's values are generated from a formula in the 2nd row of that column. The 1st row is a header (Game name). This whole range is the collection of which players are willing to play which games. These are columns D-J(ish, the list is dynamically generated with another formula, based on form responses)
I have another range of data where the 1st column is the Player and the 2nd is the player's PREFERRED game. This data is also generated with a formula based on form responses. These are columns A-B.
Here's what I'm trying to do
Using conditional formatting in columns D-J, I want to highlight the player's name if this game (in row 1 of this column) is their preferred game (range A2:B).
I've tried several different variations of VLOOKUPS, MATCHES, and FILTERS in the conditional formatting, but so far nothing has worked. The problem I run into every time is that I can't figure out how to reference the cell that the formatting is applying to, but still have it reference each individual cell over the whole range.
I know I could do this if I applied an individual conditional formatting to each individual cell. However, that is a very time-consuming and inelegant solution to this issue, considering I'm expecting my data range to be much larger in the future. I need a conditional formatting formula that will work across the whole range or, at the very least, for an entire column.
This is a mock of what I'm trying to accomplish:
This is a link to a mock of my sheet so that you can clearly see the data layout and specific formulas I'm using:
https://docs.google.com/spreadsheets/d/1wy1T6dWJwNC_EfdCAbkuxtkJH7y4Cg3x4IyEk6R567M/edit?usp=sharing
use:
=REGEXMATCH(D3, TEXTJOIN("|", 1, FILTER($A$3:$A, $B$3:$B=D$2)))
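The rule can be read as: collect the players whose preferred game equals this column's header, then test whether the cell's name matches any of them. A Python sketch of that reading, with invented names and games; `should_highlight` is a hypothetical helper. One deliberate difference: the names are escaped with `re.escape` before joining, which the sheet formula does not do.

```python
import re


def should_highlight(cell_name, game, preferences):
    """Return True if `cell_name` matches a player whose preferred game is `game`.

    `preferences` is a list of (player, preferred_game) pairs, standing in
    for columns A and B.
    """
    fans = [player for player, preferred in preferences
            if preferred == game and player]
    if not fans:
        return False  # no fans of this game: nothing to highlight
    # Names are escaped here (an addition); the formula joins them as-is.
    pattern = "|".join(re.escape(name) for name in fans)
    return re.search(pattern, cell_name) is not None


# Invented data for illustration only.
preferences = [("Alice", "Chess"), ("Bob", "Catan"), ("Cara", "Chess")]

print(should_highlight("Alice", "Chess", preferences))  # True
print(should_highlight("Bob", "Chess", preferences))    # False
print(should_highlight("Bob", "Go", preferences))       # False
```

In the formula, the mixed references ($A$3:$A, D$2, D3) are what let one rule be evaluated separately for every cell in the range; here the same effect comes from passing the cell and its column header as arguments.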

Calculate ever expanding number of columns with data to the right

Currently have a spreadsheet that tracks attendance. First column is name, second column is attendance % and contains the formula I need to revise, subsequent columns simply have an X or O in them and denote whether someone attended or not (headers for these columns are dates).
Currently using a COUNTIF() I can check how many X's there are and then the formula is SUM(100/no_of_columns*COUNTIF(A3:A12))
Ideally I want to firstly replace no_of_columns with the actual number of columns with data to the right.
I've thought about replacing this with a SUM(COUNTIF('X')+COUNTIF('O')) but it seems pretty messy?
Secondly I want to replace the A12 with whatever the last column value is.
I could just make the last column a very high column value, but again feels messy and would like to know if there is a better way...
Example: https://docs.google.com/spreadsheets/d/1rjnUQP7V-U1EZTp3Z8yO7HybBCuQjf2y4LJ4Dv4ctF8/edit?usp=sharing
Assuming Row 1 contains only the attendance dates, with no other information such as headers for Columns A and B, put the following formula in cell B2 and drag it down:
=COUNTIF(INDEX(OFFSET($C2,,,,COUNTA($1:$1)),),"x")/COUNTA($1:$1)*100
The logic is to use INDEX + OFFSET to dynamically return the range of columns to the right, and COUNTA to find out how many dates there are. COUNTIF then counts the x marks in that range; dividing by the number of dates and multiplying by 100 gives the percentage.
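As a cross-check of the arithmetic, here is the same calculation as a small Python sketch. The dates and marks are invented, and `attendance_percent` is a hypothetical name; it assumes, like the formula, that the header row holds nothing but dates.

```python
def attendance_percent(marks, date_headers):
    """Percentage of dated columns in which the person is marked "x".

    `date_headers` stands in for Row 1 (non-empty entries are counted, like
    COUNTA), and `marks` for the cells to the right of the formula cell.
    """
    total = sum(1 for header in date_headers if header not in ("", None))
    if total == 0:
        return 0.0  # no dates yet, so avoid dividing by zero
    # Only the first `total` columns are considered, like the OFFSET width.
    attended = sum(1 for mark in marks[:total] if str(mark).lower() == "x")
    return attended / total * 100


# Invented example: four sessions, three attended.
date_headers = ["01/05", "01/12", "01/19", "01/26"]
print(attendance_percent(["X", "O", "X", "X"], date_headers))  # 75.0
```

Because the column count is derived from the header row, adding a new date column changes the result without any edit to the calculation itself.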
EDIT #2
After looking into your worksheet, I guess you are adding the new dates by inserting columns between B and C so you probably want to use the following formula in Cell B2 instead to avoid the system shifting the starting cell reference automatically:
=COUNTIF(INDEX(OFFSET($B2,,1,,COUNTA($1:$1)),),"x")/COUNTA($1:$1)*100
The logic is the same as the previous one but just a little change to the OFFSET references so it starts looking for the range from Column B instead of C.
I have tested the above in both Excel and Google Sheets, and it works fine. Let me know if you have any questions. Cheers :)
paste in B2:
=ARRAYFORMULA(IFERROR(IF(LEN(A2:A),
MMULT(IF(INDIRECT("C2:"&ADDRESS(ROWS(A2:A), MAX(IF(1:1<>"", COLUMN(1:1), ))))="x", 1, 0),
TRANSPOSE(COLUMN(INDIRECT("C2:"&ADDRESS(ROWS(A2:A), MAX(IF(1:1<>"", COLUMN(1:1), )))))^0))/
MMULT(IF(INDIRECT("C2:"&ADDRESS(ROWS(A2:A), MAX(IF(1:1<>"", COLUMN(1:1), ))))<>"", 1, 0),
TRANSPOSE(COLUMN(INDIRECT("C2:"&ADDRESS(ROWS(A2:A), MAX(IF(1:1<>"", COLUMN(1:1), )))))^0))*100, ), 0))