Assign group numbers across all combinations in two columns - grouping

I have used pairwise cosine similarity to compare a bunch of company names and I have all the pairwise matches. I need to assign group numbers that take into account all the combinations, rather than just the first column. Some examples I found that used group_by() gave group numbers based solely on item1, not extending to the rows identified in item2.
Sample data: [image not included]
Desired output: [image not included]
Alternatively, the output could simply be a new column added to the sample data, holding a group number that is the same for all rows that are matched to each other.
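One way to get group numbers that follow all the combinations, not just item1, is to treat each matched pair as an edge in a graph and number its connected components. The sketch below does this in Python with a small union-find (disjoint-set) structure. It assumes the matches are available as (item1, item2) pairs, and the company names are made up for illustration. In R, the igraph package's components() function is a common way to do the same thing.

```python
def assign_groups(pairs):
    """Map every item in any (item1, item2) pair to a group number.

    Items end up in the same group if they are matched directly or through
    a chain of matches, i.e. groups are the connected components of the
    match graph.
    """
    parent = {}

    def find(x):
        # Walk up to the root of x's set, then compress the path.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        # Merge the sets containing a and b.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in pairs:
        union(a, b)

    # Number the groups 1, 2, ... in order of first appearance in the data.
    number_of_root = {}
    groups = {}
    for a, b in pairs:
        for item in (a, b):
            root = find(item)
            if root not in number_of_root:
                number_of_root[root] = len(number_of_root) + 1
            groups[item] = number_of_root[root]
    return groups


pairs = [
    ("Acme Inc", "ACME Incorporated"),
    ("ACME Incorporated", "Acme"),
    ("Globex", "Globex Corp"),
]
groups = assign_groups(pairs)
# One group number per row, suitable as an extra column on the pairs.
row_groups = [groups[a] for a, _ in pairs]
```

Because both items of a pair always share a group, looking up item1 is enough to build the extra column; here row_groups comes out as [1, 1, 2].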


How to get all cells that appear more than 5 times?

[Image of the table not included]
I have a table in OpenOffice that contains a column with region codes (column J). Using spreadsheet functions, how can I get all codes that appear more than 5 times and write them into one cell?
Normally I would recommend breaking this problem down into smaller parts using helper columns. Or better yet, move the data into LibreOffice Base, which can easily work with distinct values.
However, I managed to come up with a rather large formula that seems to do what you asked. Enter it as an array formula (исходник, Russian for "source", is the name of the sheet that holds the data).
=TEXTJOIN(",";1;IF(COUNTIF(исходник.J$2:J$552;исходник.J2:J552)>5;IF(ROW(исходник.J2:J552)=MATCH(исходник.J2:J552;исходник.J$2:J$552;0)+ROW(J$2)-1;исходник.J2:J552;"")))
I can't test this on your actual data since your example is only an image, but suppose that both 77 and 37 occur six times each. Then this formula would show 77,37 as the result.
Here is a breakdown. Look up the functions in LibreOffice Online Help for more information.
Join all results into a single cell, separated by commas: =TEXTJOIN(",";1;
Find codes that occur more than 5 times (this is the same as what you wrote): IF(COUNTIF(исходник.J$2:J$552;исходник.J2:J552)>5;
Compare the row number of each cell with the row found by the next part: IF(ROW(исходник.J2:J552)=
Determine the first row that contains this code, so that each code appears only once in the result instead of 6 or more times: MATCH(исходник.J2:J552;исходник.J$2:J$552;0)+ROW(J$2)-1;
Return the code (your formula simply returns 1 here, which doesn't seem to be what you want). If this is not the first occurrence, return an empty string rather than 0, because with its second argument set to 1, TEXTJOIN ignores empty strings: исходник.J2:J552;"")))
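To check the result outside the spreadsheet, the same logic can be sketched in Python. This is a rough equivalent for illustration, not a replacement for the formula, and the sample codes are made up: count every code, keep those seen more than 5 times, list each one only at its first occurrence (the job of the ROW/MATCH comparison), and join them with commas.

```python
from collections import Counter


def frequent_codes(codes, more_than=5):
    """Comma-join codes that occur more than `more_than` times.

    Each qualifying code is listed once, in order of first occurrence,
    mirroring the COUNTIF, ROW/MATCH and TEXTJOIN parts of the formula.
    """
    counts = Counter(codes)
    seen = set()
    result = []
    for code in codes:
        if counts[code] > more_than and code not in seen:
            seen.add(code)
            result.append(str(code))
    return ",".join(result)


# 77 and 37 occur six times each, 12 only five times.
codes = [77, 37, 12] * 5 + [77, 37]
print(frequent_codes(codes))
```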

Google Sheets - filter list excluding values from other list

I have a big sheet raw_data which is automatically populated by a script every 5 minutes. Because of this, I can't add helper columns with formulas and have to solve problems with single formulas.
The challenge:
I need to pull out a list of unique values from column O. At the same time, I need to filter out a certain set of values listed in range A55:A.
I have this formula to pull out the unique values:
=SORT(UNIQUE(raw_data!O2:O))
I tried playing with MATCH, but how do I "invert" the result of the match, given that I actually want to exclude rather than include? My attempt:
=SORT(UNIQUE(FILTER(raw_data!O2:O,IFERROR((Match(raw_data!O2:O,A75:A200,0))))))
I tried adding a NOT() around the Match() but that then gave me a no results error.
Anyone?
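Instead of using NOT, use ISNA:
=SORT(UNIQUE(FILTER(raw_data!O2:O,ISNA(MATCH(raw_data!O2:O,A75:A200,0)))))
This works because MATCH returns #N/A when there is no match, so ISNA(MATCH(...)) is TRUE exactly for the values that are not in the exclusion list.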
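The same keep-only-if-not-found logic, sketched in Python for illustration (the function name and list contents are hypothetical): a value survives exactly when looking it up in the exclusion list fails, which is the role ISNA(MATCH(...)) plays in the formula, and the survivors are de-duplicated and sorted like UNIQUE and SORT.

```python
def unique_excluding(values, excluded):
    """Sorted unique values that do not appear in `excluded`.

    Rough analogue of SORT(UNIQUE(FILTER(values, ISNA(MATCH(values, excluded, 0))))).
    """
    excluded_set = set(excluded)
    # Keep a value only when the lookup in the exclusion list finds nothing.
    return sorted({v for v in values if v not in excluded_set})


print(unique_excluding(["b", "a", "c", "a", "d"], ["c", "z"]))
```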

Find "real" formula for a formula group

I am trying to find the "real" formula of a group. For example, take these formulas:
=If(A$4>$A1,"Long","Short")
=If(B$4>$A1,"Long","Short")
=If(A$4>$A2,"Long","Short")
=If(A$4>$A$2,"Long","Short")
The fourth formula is different. The first three should count as the same formula: a cell fixed to row 4 compared with a cell fixed to column A. The result should show two "real" formulas, something like this:
=If($4>$A,"Long","Short")
=If($4>$A$2,"Long","Short")
How do I design a regex (or any other method) in VBA to extract that "real" formula from the "nominal" formulas?
Convert your formulas to display in R1C1 format by going to File > Options > Formulas and ticking R1C1 reference style. In that style, your four example formulas would display as follows (assuming they are entered in cells C1, C2, C3 and C4 respectively):
=IF(R4C[-2]>RC1,"Long","Short")
=IF(R4C[-1]>R[-1]C1,"Long","Short")
=IF(R4C[-2]>R[-1]C1,"Long","Short")
=IF(R4C[-2]>R2C1,"Long","Short")
Relative offsets in the addresses are written in square brackets [], while fixed positions are not, so if you remove the brackets and their contents you get:
R4C>RC1
R4C>RC1
R4C>RC1
R4C>R2C1
and only the fourth one differs from the others.
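If you want to do the grouping in code, the bracket-stripping step is a single regular expression. The sketch below is in Python rather than VBA and assumes you already have each formula's R1C1 text (in Excel VBA, the Range.FormulaR1C1 property returns it without changing the display setting). It groups the four example formulas by their "real" form. Note that it would also strip bracketed numbers inside string literals, which these examples do not contain.

```python
import re
from collections import defaultdict

# Relative offsets in R1C1 notation are bracketed, e.g. C[-2] or R[1].
RELATIVE_OFFSET = re.compile(r"\[-?\d+\]")


def real_formula(r1c1_formula):
    """Remove bracketed relative offsets, keeping only the fixed parts."""
    return RELATIVE_OFFSET.sub("", r1c1_formula)


def group_formulas(r1c1_formulas):
    """Group formulas whose "real" (offset-free) form is identical."""
    groups = defaultdict(list)
    for formula in r1c1_formulas:
        groups[real_formula(formula)].append(formula)
    return dict(groups)


formulas = [
    '=IF(R4C[-2]>RC1,"Long","Short")',
    '=IF(R4C[-1]>R[-1]C1,"Long","Short")',
    '=IF(R4C[-2]>R[-1]C1,"Long","Short")',
    '=IF(R4C[-2]>R2C1,"Long","Short")',
]
for real, members in group_formulas(formulas).items():
    print(real, len(members))
```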

How do I apply a formula to a range without applying said formula to every cell?

I'm trying to apply a formula without having it add the formula data to each and every cell - in other words, I need the cells that are receiving the formula to be untouched until they get their data.
I was searching around and it looked like an ARRAYFORMULA would work but it doesn't seem to be doing anything when I apply it.
For example, I want to apply this formula to a cell range: =SPLIT(E2, ","). Each cell in column E needs to be split into the two adjacent cells next to it, based on its comma. When I try to apply =ARRAYFORMULA(SPLIT(E2:E99, ",")), only the cell I add this to gets the formula.
In addition to the contribution of pnuts, also try:
=ArrayFormula(iferror(REGEXEXTRACT(","&E2:E,"^"&REPT(",+[^,]+",COLUMN(OFFSET(A1,,,1,6))-1)&",+([^,]+)")))
Note: the last parameter of OFFSET can be changed to match the maximum number of comma-separated values in any cell of the range E2:E. For example, if you have no more than 3 values per cell, set it to 3. The output will then be three columns wide (one column for each value).
Hope that makes sense?
Credit is also due to AdamL, who (I believe) originally crafted this workaround.
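To see what the REGEXEXTRACT formula produces for each cell, here is a rough Python equivalent (an illustration only; the function name and sample values are hypothetical). Runs of commas count as a single separator, missing values come back blank as IFERROR makes them, and anything beyond the chosen width (6 in the formula, set by OFFSET) is dropped.

```python
def split_to_columns(cells, width=6):
    """Split each comma-separated cell into exactly `width` columns.

    Empty pieces from repeated commas are skipped, short rows are padded
    with "" and extra values beyond `width` are discarded.
    """
    rows = []
    for cell in cells:
        parts = [p for p in cell.split(",") if p][:width]
        rows.append(parts + [""] * (width - len(parts)))
    return rows


for row in split_to_columns(["a,b", "x,,y,z,w", ""], width=3):
    print(row)
```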
I think what you want may be array_constrain but for your example I can only at present offer you two formulae (one for each side of the comma):
=Array_constrain(arrayformula(left(E2:E,find(",",E2:E)-1)),match("xxx",E:E)-1,1)
=Array_constrain(arrayformula(mid(E2:E,find(",",E2:E)+1,len(E2:E))),match("xxx",E:E)-1,1)

Convert alphanumeric string to 16 digit GCID

I'm building our inventory feed for Amazon Seller Central in OpenOffice Calc but can't work out how to convert our in-house product IDs to the GCID format that Amazon requires.
The standard-product-id must have a specific number of characters according to type: GCID (16 alphanumeric characters), UPC (12-digit number), EAN (13-digit number) or GTIN (14-digit number).
Our product IDs vary by manufacturer, e.g.:
123456
AB123456
1234AB
Where the ID is numerical only I can format the cells with leading zeros, however this doesn't work if the cell contains letters.
My file has over 10,000 products so I'm wondering if there is a formula I can apply to all cells to instantly convert them to GCID?
It seems the question was asked under a misapprehension. However, the example 123456 AB123456 1234AB represents three different IDs, and padding to a specified length is quite a common requirement (e.g. see the .NET String.PadLeft method), so a suggestion for OpenOffice might be of use to someone one day.
The convention is to pad with 0s, but some spreadsheets automatically strip these off the front of numbers (as in the first example), and databases tend to prefer fields of a consistent format. I therefore suggest separating the padding from the ID with a hyphen, which helps identify alphanumeric codes and forces text format. For IDs of up to 15 characters, the result is exactly 16 characters long:
=REPT(0;15-LEN(A1))&"-"&A1
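The same padding rule, sketched in Python for anyone building the feed in a script instead (pad_id is a hypothetical helper, not part of Calc): for an ID of up to 15 characters it prepends 15 minus its length zeros and a hyphen, giving exactly 16 characters. Longer IDs are rejected here rather than returned at the wrong length.

```python
def pad_id(product_id, total_length=16):
    """Pad an ID with zeros and a hyphen to exactly total_length characters.

    Mirrors =REPT(0;15-LEN(A1))&"-"&A1 when total_length is 16.
    """
    product_id = str(product_id)
    zeros = total_length - 1 - len(product_id)
    if zeros < 0:
        # No room for the hyphen: treat as an error instead of overflowing.
        raise ValueError(f"ID {product_id!r} is too long to pad")
    return "0" * zeros + "-" + product_id


for pid in (123456, "AB123456", "1234AB"):
    print(pad_id(pid))
```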