Remove duplicates and Keep related data Calc (Excel) - openoffice-calc

I have a list of products in Calc (Excel), each with an associated IP address. Many of the names have multiple IP addresses, but they are listed one per row. I am trying to remove the duplicate names and pull all of the IP addresses together under a single name. I have tried nslookup and INDEX/MATCH, but they do not deal well with multiple outputs. Right now it looks like this:
a| 1
a| 2
a| 3
b| 1
b| 2
b| 3
etc...
I would like it to look like this
a 1,2,3
b 1,2,3
Is there any way to do this without wasting a ton of time? I have a few ways that work, but they will take me forever to set up.

I recommend setting up your formulas in multiple "helper" cells before getting to the final "result cell". This breaks down the problem into smaller steps that are more easily formulated and, if needed in the future, updated. Once the setup is complete you can hide the helper columns by right-clicking on the column letter and choosing "Hide".
The first column to set up is the list of distinct product names. For the formula below to work, the product/IP list will need to be sorted in ascending order. If the list is not already sorted, to sort it first highlight the entire list, including headers. Then choose Data→Sort; select sort by "Product", make sure the radio button "Ascending" is selected, and press OK.
For purposes of this example, I'll assume product names are in column A, starting on row 2 and IPs are in column B starting on row 2 (with row 1 being the header labels). In the column where you want to list the distinct product names (I used column D), enter in the top cell =A2. In the cell below enter
=INDEX($A$2:$A$13;MATCH(D2;$A$2:$A$13;1)+1)
The MATCH formula has 1 as its third argument, meaning the range is sorted ascending and MATCH will return the position of the last matching cell. We add 1 to the position of the last matching cell, which gives the position of the first cell with a new product name. That position is fed into the INDEX function to show the next product name.
Copy and paste that cell down as far as you need to show all the product names.
Now we'll set up a series of cells to display each IP address. I used columns F to I to show up to 4 addresses:
=IF(MATCH(D2;$A$2:$A$13;0)<=MATCH($D2;$A$2:$A$13;1);INDEX($B$2:$B$13;MATCH($D2;$A$2:$A$13;0));"")
=IF(MATCH(D2;$A$2:$A$13;0)+1<=MATCH(D2;$A$2:$A$13;1);INDEX($B$2:$B$13;MATCH(D2;$A$2:$A$13;0)+1);"")
=IF(MATCH(D2;$A$2:$A$13;0)+2<=MATCH(D2;$A$2:$A$13;1);INDEX($B$2:$B$13;MATCH(D2;$A$2:$A$13;0)+2);"")
=IF(MATCH(D2;$A$2:$A$13;0)+3<=MATCH(D2;$A$2:$A$13;1);INDEX($B$2:$B$13;MATCH(D2;$A$2:$A$13;0)+3);"")
MATCH with a third argument of 1 returns the position of the last matching cell; MATCH with a third argument of 0 returns the position of the first matching cell.
The IF statement checks if the position of the first matching cell (in the first lookup column) or the cell below that (in the second lookup column) or the cell two below the first match (in the third lookup column), etc. is less than or equal to the position of the last matching cell. If yes, then it looks up the relevant IP address. If no, it displays a blank.
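The first/last position logic above can be sketched outside the spreadsheet. This is only an illustration in Python, not part of the Calc solution: the function names are hypothetical, `bisect_left` and `bisect_right` stand in for MATCH with 0 and 1 on a sorted list, and positions are 1-based to mirror MATCH.

```python
from bisect import bisect_left, bisect_right


def first_match(sorted_names, name):
    """1-based position of the first exact match, like MATCH(name; range; 0)."""
    i = bisect_left(sorted_names, name)
    if i == len(sorted_names) or sorted_names[i] != name:
        raise ValueError(f"{name!r} not found")
    return i + 1


def last_match(sorted_names, name):
    """1-based position of the last entry <= name, like MATCH(name; range; 1)
    on a range sorted in ascending order."""
    return bisect_right(sorted_names, name)


def ips_for(sorted_names, ips, name, max_columns=4):
    """Mimic the helper columns: one slot per column, "" once the group ends."""
    first = first_match(sorted_names, name)
    last = last_match(sorted_names, name)
    row = []
    for offset in range(max_columns):
        pos = first + offset
        # Same test as the IF: is first match + offset still within the group?
        row.append(ips[pos - 1] if pos <= last else "")
    return row


names = ["a", "a", "a", "b", "b", "c"]
ips = ["1", "2", "3", "1", "2", "9"]
print(ips_for(names, ips, "a"))
print(ips_for(names, ips, "b"))
```

As with the formulas, this only works when the name list is sorted in ascending order.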
In the formulas above you would need to manually enter the formula in the top row of each column. If you have some products with a large number of IP addresses, you may want to set up the formula so you can copy and paste between columns as well as down the rows. This would work if you were starting in column F:
=IF(MATCH($D2;$A$2:$A$13;0)+COLUMN()-6<=MATCH($D2;$A$2:$A$13;1);INDEX($B$2:$B$13;MATCH($D2;$A$2:$A$13;0)+COLUMN()-6);"")
Once you have your top row set up as you want, copy and paste down however many rows you need.
If you want to combine all the IPs into a single cell separated by commas, you can use a formula like this:
=CONCATENATE(F2;IF(G2<>"";","&G2;"");IF(H2<>"";","&H2;"");IF(I2<>"";","&I2;""))
Each IF statement will add a comma separator followed by the cell contents if the checked cell is not empty, otherwise it returns a blank string. You will need to manually adjust to add additional IF statements for however many maximum columns you want to concatenate. Again, once you have the top row set up, copy and paste down however far you need.
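If the data can be exported (for example as CSV), the whole helper-column setup collapses to a group-and-join. A minimal sketch in Python, assuming the rows are already available as (name, IP) pairs; `combine_by_name` is a hypothetical name, not a Calc function.

```python
from itertools import groupby


def combine_by_name(rows, separator=","):
    """Group (name, ip) pairs by name and join each group's IPs into one string."""
    # sorted() is stable, so IPs keep their original order within each name.
    ordered = sorted(rows, key=lambda row: row[0])
    return [
        (name, separator.join(ip for _, ip in group))
        for name, group in groupby(ordered, key=lambda row: row[0])
    ]


rows = [("a", "1"), ("a", "2"), ("a", "3"), ("b", "1"), ("b", "2"), ("b", "3")]
for name, joined in combine_by_name(rows):
    print(name, joined)
```

Unlike the CONCATENATE formula, this has no fixed maximum number of addresses per name.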

Assuming you have two columns (A and B), that these are labelled and sorted as shown, then enter in C2:
=IF(A1<>A2;B2;C1&","&B2)
and in D1:
=A1<>A2
Copy both down to suit, select ColumnC and Copy, Paste Special... with each Selection ticked other than Paste all and Formulas, click OK.
Select ColumnsA:D, Data > Filter > AutoFilter, click Yes and select 1 for ColumnD and all visible range.
Copy and paste into a new sheet, move B1 to C1 and delete Columns B and D.
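To see why this works, here is a rough Python simulation of the two formula columns, assuming the rows are sorted by name. The function name is hypothetical; the accumulated string plays the role of column C and the "last row of the group" test plays the role of column D.

```python
def running_concat_last_rows(rows, separator=","):
    """Simulate column C (running concatenation) and column D (last row of a group)."""
    # Column C: restart at each new name, otherwise append to the value above.
    column_c = []
    previous_name = None
    accumulated = ""
    for name, value in rows:
        if name != previous_name:
            accumulated = value
        else:
            accumulated = accumulated + separator + value
        column_c.append(accumulated)
        previous_name = name

    # Column D: a row is kept when the next row has a different name (or none).
    result = []
    for i, (name, _) in enumerate(rows):
        is_last_of_group = i + 1 == len(rows) or rows[i + 1][0] != name
        if is_last_of_group:
            result.append((name, column_c[i]))
    return result


rows = [("a", "1"), ("a", "2"), ("a", "3"), ("b", "1"), ("b", "2"), ("b", "3")]
print(running_concat_last_rows(rows))
```

Only the last row of each group holds the complete list, which is why the filter keeps the rows where column D is 1.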

Related

Highlight duplicates when part of the cell matches in Google Sheets

I have searched as much as I can, and I have found solutions for similar problems, but I haven't been able to find a solution to my exact problem.
Issue: I would like to highlight the row when one cell in column A of that row is an exact match for another cell in that column, AND part of another cell in column B of that row is a match for part of another cell in that column, in Google Sheets. I would like to use conditional formatting, and only highlight the second occurrence and on.
For example, in this "sheet":
A B C
1|John Smith|john#test.com|Test Co.
2|Jane Doe |jane#x.com |X Company
3|John Smith|j.s#test.com |Test Inc.
4|John Smith|jsm#test.com |Test Incorporated
I would like row 3 and row 4 to highlight, because A3 is a duplicate of A1, and everything in B3 after # matches everything in B1 after #, and the same is true of row 4. Also, only rows 3 and 4 should highlight; not row 1, since it is the first instance. I understand regexes, and I've found how to highlight a row if one cell in column A and one cell in column B are exact matches with other cells in their respective columns. What I haven't figured out is how to combine the two: one cell that is an exact match with another cell in its column AND one cell that is a partial match with another cell in its column. Here is a link to a test sheet that contains the sample info from above. https://docs.google.com/spreadsheets/d/1neZd213C1ssY7bPeBfu2xI3WPCmt-oKkfbdrXrid9I8/edit?usp=sharing
use:
=INDEX(COUNTIFS($A:$A&REGEXEXTRACT($B:$B, "#.+"), $A1&REGEXEXTRACT($B1, "#.+"),
ROW($A:$A), "<="&ROW($A1))>1)*(A:A<>"")
Try the following custom formula applied to A1:C:
=index((countif($A$1:$A1,$A1)>1)*
(countif(regexextract($B$1:$B1,"#(.*)"),
regexextract($B1,"#(.*)"))>1))
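The intended rule (flag a row only when the same name and the same text after # have both appeared in an earlier row) can be checked with a short Python sketch. This is an illustration of the logic, not a Sheets formula; `rows_to_highlight` is a hypothetical name, and it keys on the name/domain pair together, as the COUNTIFS version does.

```python
def rows_to_highlight(rows):
    """Return one flag per row: True when (name, part after '#') was seen before."""
    seen = set()
    flags = []
    for name, email in rows:
        # partition() returns ("before", "#", "after"); index 2 is the part after '#'.
        domain = email.strip().partition("#")[2]
        key = (name.strip(), domain)
        flags.append(key in seen)
        seen.add(key)
    return flags


rows = [
    ("John Smith", "john#test.com"),
    ("Jane Doe ", "jane#x.com "),
    ("John Smith", "j.s#test.com "),
    ("John Smith", "jsm#test.com "),
]
print(rows_to_highlight(rows))
```

The first occurrence is never flagged because a key is added to `seen` only after its own row has been tested.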

Regexmatch in Google Sheet to identify cells that include any string in another sheet

I have a ColumnA where each cell includes multiple values separated by commas, e.g.:
Elvis Costello, Madonna
Bob, Elvis Presley, Morgan Stanley
Frank, Morgan Stanley, Madonna Ford,
Elvis Costello, Madonna Ford
And I want to identify which rows/cells include any of the exact terms in another sheet/column, e.g.:
Elvis Presley
Madonna
I found this simple solution using REGEXMATCH (the last solution on the page "Is there a way to REGEXMATCH from a range of cells from A1:A1000 for example?"):
Say you want to search for a match from a list of cities.
Put your list of cities in one tab.
Make them into lowercase for easier lookup since search terms are all in lowercase. You can do this by adding a new column and using the LOWER function.
Go back to your cell that has the list of search phrases.
In any blank cell out of the way (off to the side on the top row is a good place) put this formula: CITY LIST FORMULA: =TEXTJOIN("|",1,'vlookup city'!B$2:B$477) (if your tab is named 'vlookup city' and your cities are in column B of that tab)
Add a new column next to your search terms, or pick an existing one where you want to put your "match found" info.
In that new column, add this formula (if your data starts in row 4 and you put the City List formula in cell G4): =REGEXMATCH(A4,G$4)
Fill the formula all the way down your list. You can double-click the little blue square in the bottom right corner of the cell, or grab-and-drag all the way to the bottom of the list.
Ba-ding! It will search for any one of those city names, anywhere in your search phrase.
If the search phrase contains at least one matching term, it will return "True."
You can then add extra features on your formula to make it return something else. For example: =IF(REGEXMATCH(A4,G$4), "match found", "no match found")
This is a super lightweight solution that won't slow your sheet down too much and is easy to use.
https://docs.google.com/spreadsheets/d/1XAIDB98r2CGu7hL3ISirErDPNlgT6lVt-TCG0qI1uTE/edit?usp=sharing
The problem is that the REGEXMATCH solution also identifies "Elvis Costello" and "Madonna Ford". I only want to identify cells/rows that include the exact term to match, i.e. "Elvis Presley" and "Madonna". In other words, whatever is between the commas has to be an exact match with one of the search terms, not just partially right.
I hope it made sense:)
Thanks all!
I think I might have found the answer, still trying to double check if it's correct.
I added \b before and after. So in the example sheet re-posted in the quoted part of my question I changed the cell:
Cell B3:
=TEXTJOIN("|",1,'vlookup city'!B$2:B$476)
and added another cell like this:
Cell B2:
=concatenate("\b(",$B$3,")\b")
Still checking if all false flags are removed.
Thanks
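One way to check for remaining false flags is to try the pattern on the sample rows. The sketch below uses Python's `re` module, which is an assumption on my part: Google Sheets uses RE2, so behaviour should be similar for these simple patterns, but this is not tested in Sheets. It suggests that \b alone still matches "Madonna" inside "Madonna Ford", because a space counts as a word boundary, while anchoring on the commas does not.

```python
import re

terms = ["Elvis Presley", "Madonna"]
alternation = "|".join(re.escape(term) for term in terms)

# The \b(...)\b idea from the post: terms must start and end at word boundaries.
word_boundary = re.compile(rf"\b({alternation})\b")

# Alternative: the term must fill a whole comma-separated item.
whole_item = re.compile(rf"(?:^|,)\s*(?:{alternation})\s*(?:,|$)")

cells = [
    "Elvis Costello, Madonna",
    "Bob, Elvis Presley, Morgan Stanley",
    "Frank, Morgan Stanley, Madonna Ford,",
    "Elvis Costello, Madonna Ford",
]

boundary_hits = [bool(word_boundary.search(cell)) for cell in cells]
item_hits = [bool(whole_item.search(cell)) for cell in cells]

for cell, by_boundary, by_item in zip(cells, boundary_hits, item_hits):
    print(f"{cell!r}: word boundary={by_boundary}, whole item={by_item}")
```

In this sample only the first two cells contain an exact term between commas, which is what the comma-anchored pattern reports.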

Display each digit in a separate column in a row in Google Sheets

I have a column of data in binary values and I would like to split each digit of the number in the column into different cells across a row. How would I go about doing so? I saw the split function, but could not get it to work. https://support.google.com/docs/answer/3094136?
One of my example inputs:
1000111110100101111011110
1000110000100101000010000
try with this (you just change A2 to your cell):
=transpose(arrayformula(mid(A2,row(A1:offset(A1,len(A2),0)),1)))
For multiple rows (I limited the text length to 30 characters; you can change it):
=transpose(ARRAYFORMULA(mid(transpose(query(arrayformula(if(isnumber(A1:A)=true ,text(A1:A,"0"),A1:A)),"Select Col1 where Col1<>''")),row(A1:A30),1)))
try:
=ARRAYFORMULA(REGEXEXTRACT(A1:A, REPT("(.)", LEN(A1:A))))
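For comparison, the same digit-per-cell split is a one-liner outside Sheets. A small Python sketch, assuming the values are handled as text so that no digits are lost; the function names are hypothetical.

```python
def split_digits(value):
    """Split a value into a list of single characters, one per output column."""
    return list(str(value))


def split_rows(values, fill=""):
    """Split several values and pad shorter rows so every row has the same width."""
    rows = [split_digits(value) for value in values]
    width = max((len(row) for row in rows), default=0)
    return [row + [fill] * (width - len(row)) for row in rows]


grid = split_rows(["1000111110100101111011110", "1000110000100101000010000"])
for row in grid:
    print(" ".join(row))
```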

How do I return <empty> from OO-CALC IF-statement?

Using OpenOffice CALC v4.1.3.
The dataset contains 400,000 rows and I am looking for the rows that are not in sequence order in column B. Column B contains integers from 1,2,3, etc. to the last row of data.
I am trying to set the cells in column A with the following formula:
=IF(B3 = (B2+1);[empty];"BAD SEQUENCE")
I do not want to have the TRUE part to be "" (empty-string).
I want it to be [empty] or [blank] or [null] or [no-value] or [nothing] (using other language words here)
because I want to be able to use the [shift]+[down-arrow] key combination to find the next BAD-SEQUENCE row(s).
When the set of cells is actually [empty] then the [shift]+[down-arrow] navigates to the next "cell-with-value" (if not [empty]).
In this question, I have presented the code to show [empty] but I need the proper OO-CALC representation of [empty] to have empty cells when the if-statement is TRUE.
Your comments and solutions are welcome...thanks John
From https://superuser.com/questions/346873/openoffice-calc-how-to-insert-blank-in-a-formula:
No value will make isblank return true, because C1 will always contain a formula, and isblank literally tests for blanks. Not empty strings, but actual empty cells.
Like ISBLANK, Ctrl+Down considers any formula to be non-empty, regardless of its result.
Instead, do the following workaround:
Use "" as the [empty] value.
Copy column A.
Select an unused column such as column C.
Paste Special, with the Formulas box unchecked.
Alternatively, instead of using formulas, fill column A with values using a macro.
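If the column can be exported, the sequence check itself is easy to run outside Calc. A minimal Python sketch, assuming the integers from column B are already loaded into a list and that the data starts on sheet row 2; `bad_sequence_rows` is a hypothetical name, not a Calc macro.

```python
def bad_sequence_rows(values, first_row=2):
    """Return the sheet row numbers whose value is not the previous value plus 1."""
    return [
        first_row + i
        for i in range(1, len(values))
        if values[i] != values[i - 1] + 1
    ]


column_b = [1, 2, 3, 5, 6, 4]
print(bad_sequence_rows(column_b))
```

This applies the same test as the IF formula (B3 = B2 + 1), but returns the offending row numbers directly instead of relying on keyboard navigation.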

Concatenate a range of cells in OO Calc

I have column A with these cells:
A1: Apple
A2: Banana
A3: Cherry
I want a formula that will string them together in one cell like this:
"Apple, Banana, Cherry"
I don’t know if it’s implemented in OpenOffice, but its cousin LibreOffice Calc has had the TEXTJOIN function since version 5.2:
TEXTJOIN( delimiter, skip_empty, string1[, string2][, …] )
delimiter is a text string and can be a range.
skip_empty is a logical (TRUE or FALSE, 1 or 0) argument. When TRUE, empty strings will be ignored.
string1[, string2][, …] are strings or references to cells or ranges that contain text to join.
Ranges are traversed row by row (from top to bottom).
Example : =TEXTJOIN(",",1,A1:A10)
More info here :
https://help.libreoffice.org/6.3/en-US/text/scalc/01/func_textjoin.html?DbPAR=CALC#bm_id581556228060864
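To make the skip_empty behaviour concrete, here is a rough Python equivalent. It is a sketch of the semantics described above, not LibreOffice's implementation: it only handles a single string delimiter (not a delimiter range), and a Python list stands in for a cell range.

```python
def textjoin(delimiter, skip_empty, *strings):
    """Join strings (or lists standing in for ranges) with a delimiter."""
    flat = []
    for item in strings:
        if isinstance(item, (list, tuple)):
            flat.extend(item)  # a "range": take its cells in order
        else:
            flat.append(item)
    texts = [str(cell) for cell in flat]
    if skip_empty:
        texts = [text for text in texts if text != ""]
    return delimiter.join(texts)


column_a = ["Apple", "Banana", "", "Cherry"]
print(textjoin(", ", True, column_a))
print(textjoin(", ", False, column_a))
```

With skip_empty set to FALSE the empty cell still produces a delimiter, which is why 1 (TRUE) is used in the example when the range extends past the end of the data.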
A different approach, suitable for a long list, would be to copy A1 to B1, prepend a " and in B2 enter:
=B1&", "&A2&IF(A3="";"""";"")
then double-click the fill handle to cell B2 (the small square at its bottom right). The result should appear in ColumnB in the row of the last entry of your list.
As of version 4.1.7 of Apache OpenOffice Calc, there still isn't a simple solution to this problem. CONCATENATE doesn't accept cell ranges, and there isn't a TEXTJOIN function like LibreOffice. However, there is a workaround.
This is essentially a duplicate of pnuts' answer, but with images to hopefully help. His answer explicitly addresses separating the items with delimiters, as well as the opening and closing quotations, as the question above uses. As the general question (how to concatenate a range of cells) is useful to many people, I think my answer should still be useful even though I haven't done that.
In my case, I had one column with letters corresponding to finished worksets, and one column with letters corresponding to unfinished worksets. The letters only appear on every 8th row, so I can't view them all at the same time. I wanted to just mash all the finished letters together in one cell to be easy to view, and the same with the unfinished letters.
The example removes the 7 empty rows per letter and manually inputs which letters are finished/unfinished for convenience.
Column A is the "unfinished" column to be concatenated. Column C is used to perform the concatenation. Row 2 is the first row, and row 24 is the final row. G1 shows the concatenated result in an easy-to-see spot near the top of the document.
Columns B and D, and cell G2, utilize the same method to show the "finished" data. The formulas aren't shown here.
In cell C2, point explicitly to A2:
=A2
If you may have blanks, as I do, C2 needs a conditional that treats a blank first cell as empty text instead of as zero (see Note 1):
=IF (A2 <> "" ; A2 ; "")
Then, in cell C3, concatenate C2 and A3:
=C2 & A3
Copy C3, then highlight C4:C24 and paste the formula to autofill those cells.
Wherever you need the result of the concatenation, reference C24.
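The helper column is a running concatenation: each cell is the cell above it plus the next value. A short Python sketch of the same idea, assuming the column is available as a list in which blank cells are empty strings or None; the function name is hypothetical.

```python
from itertools import accumulate


def running_concatenation(cells):
    """Return the helper column: each entry is everything up to and including that cell."""
    # Blank cells become "" so they contribute nothing, like the IF in C2.
    texts = ("" if cell is None else str(cell) for cell in cells)
    return list(accumulate(texts))  # accumulate uses +, which concatenates strings


column_a = ["", "A", None, "C", "D"]
helper = running_concatenation(column_a)
print(helper)
print(helper[-1])
```

The last entry of the helper list corresponds to the final helper cell (C24 in the example above).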
Notes
Note 1: If the first N cells at the top of column A are blank and you just let C2 = A2, the first N rows of column C will show 0, and a single 0 will be prepended to the concatenation result.
Either use the CONCATENATE function or ampersands (&):
=CONCATENATE(""""; A1; ", "; A2; ", "; A3; """")
For something more powerful, write a Basic macro that uses Join.
EDIT:
There is no function that can concatenate a range. Instead, write a Basic macro or drag and drop CONCATENATE formulas to multiple cells. See https://forum.openoffice.org/en/forum/viewtopic.php?f=9&t=5438.