Join only some cells from a column - OpenRefine - cell

I've a dataset like this:
I need to join only the last three columns (the authors), and the join function is not helping me. I don't always have three authors: there can be two, or just one.
Is there a way to join only the cells that have empty cells in the adjacent column?

A slightly simpler solution is to use slice() to select all the values in each record except the first one:
row.record.cells['Column name'].value.slice(1).join("|")

Well, I used a workaround:
First, I added another column with row.record.cells.NameColumn.value.join("|")
Then, in the new column, I removed the book title with value.replace(/(^[^\|]+)\|(.+)$/, "$2")
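To make the two approaches above easier to compare, here is a minimal Python sketch (not GREL). It assumes a record's values arrive as a plain list with the book title first; the function names and sample data are made up for illustration.

```python
import re

def join_authors_slice(values):
    """Skip the first value (the title) and join the rest with '|'."""
    return "|".join(values[1:])

def join_authors_regex(values):
    """Join everything with '|', then strip the leading 'title|' part."""
    joined = "|".join(values)
    return re.sub(r"^[^|]+\|(.+)$", r"\1", joined)

# Hypothetical record: one title followed by two authors.
record = ["Book title", "Author A", "Author B"]
print(join_authors_slice(record))
print(join_authors_regex(record))
```

In this sketch the two functions agree whenever there is at least one author. For a record with a title only, the slice version returns an empty string, while the regex version leaves the title in place because the pattern does not match.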

Related

Power BI Query Editor - Inserting Rows to the Bottom of a Table

I have a table that looks like this in Power BI:
CourseID Course
1 Math
2 Cooking
3 English
4 Spanish
I want to insert a row at the bottom that looks like this:
-1 null
I looked at the Microsoft Documentation, but it's not very helpful. Any suggestions?
Figured out what I was doing wrong.
Let's say I wanted to insert a row at the top of a table named Courses.
The M query would look like this:
#"Insert Row" = Table.InsertRows(#"Previous Step Name", 0, {[CourseID = -1, Course = null]})
I was not able to get it working when a column name had a space in it, but as a best practice you shouldn't have column names with spaces anyway.
Follow below steps to insert a row to the bottom of a table:
Step 1: Go to the Edit Queries window. Select the correct table.
Step 2: Go to the APPLIED STEPS pane on the right-hand side. Double-click Source.
Step 3: Add the data to each column and press OK.
Note that this will only add a row to the end of the table.
The accepted answer above by the OP works great if you want to insert a row at the top of the table, or if you know the index to insert at. In my case I want to append to the end, and the number of rows is likely to change in the future.
I saw the functions to reverse the rows, so you could reverse the rows, insert at 0, then reverse again, but I found another way:
#"Insert Row" = Table.Combine({#"Previous Step Name",
Table.FromRows(
{{null, "None", true, false}},
{"ID", "Name", "Deleted", "IsIntegration"}
)
})
Basically, you create a new one-row table with Table.FromRows, then use Table.Combine to append it to the previous step.
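The difference between inserting at a known index and appending by combining tables can be sketched in Python (not M). This is only an illustration: tables are modelled as lists of dicts, and insert_rows and combine are hypothetical helpers, not Power Query functions.

```python
courses = [
    {"CourseID": 1, "Course": "Math"},
    {"CourseID": 2, "Course": "Cooking"},
    {"CourseID": 3, "Course": "English"},
    {"CourseID": 4, "Course": "Spanish"},
]
new_row = {"CourseID": -1, "Course": None}

def insert_rows(table, index, rows):
    """Return a new table with rows inserted at a fixed position."""
    return table[:index] + rows + table[index:]

def combine(tables):
    """Return a new table that stacks the given tables in order."""
    return [row for table in tables for row in table]

at_top = insert_rows(courses, 0, [new_row])    # needs a known index
at_bottom = combine([courses, [new_row]])      # no row count needed
print(at_top[0], at_bottom[-1])
```

Appending by combining never refers to the current row count, so it keeps working when the source table grows.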

Power query append multiple tables with single column regardless column names

I have the following query in M:
= Table.Combine({
Table.Distinct(Table.SelectColumns(Tab1,{"item"})),
Table.Distinct(Table.SelectColumns(Tab2,{"Column1"}))
})
Is it possible to get it working without prior changing column names?
I want to get something similar to SQL syntax:
select item from Tab1 union all
select Column1 from Tab2
If you need just one column from each table then you may use this code:
= Table.FromList(List.Distinct(Tab1[item])
& List.Distinct(Tab2[Column1]))
If you use M (as in your example, or the Append Queries option), the column names must be the same; otherwise it won't work.
But it works in DAX with the command
=UNION(Table1; Table2)
https://learn.microsoft.com/en-us/dax/union-function-dax
It's not possible in Power Query M. Table.Combine makes a union of the columns whose names match. If you want to keep everything in the same step, you can put a rename step in place of Tab2, as you did with Table.SelectColumns.
Matching on column names is how the union lines the columns up correctly.
Hope you can manage it in the same step, if that's what you want.
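The list-based idea from the first answer (take one column from each table as a list, de-duplicate each, then concatenate) can be sketched in Python. The table contents below are invented for illustration; only the column names item and Column1 come from the question.

```python
def distinct(values):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(values))

# Hypothetical data: two tables whose single columns have different names.
tab1 = [{"item": "a"}, {"item": "b"}, {"item": "a"}]
tab2 = [{"Column1": "b"}, {"Column1": "c"}]

# Working on plain lists sidesteps the column-name mismatch entirely.
combined = distinct([row["item"] for row in tab1]) + distinct(
    [row["Column1"] for row in tab2]
)
print(combined)
```

As with union all, a value that appears in both tables is kept once per table; only duplicates within a single table are removed.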

Count How Many In Column A are also in Column B

I'm trying to create a formula to count how many cells in a first column are also in a second column. They are two lists of unique IDs from the same source, each fitting a different criterion, and I want to count how many fit both criteria. Any help is appreciated.
try like this:
=ARRAYFORMULA(SUM(N(REGEXMATCH(B1:B, TEXTJOIN("|", 1, A1:A)))))
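For comparison, the same count can be sketched in Python with exact matching. This assumes each column is a list of ID strings, with blanks as empty strings; the function name and sample IDs are hypothetical. Note that a regular-expression match succeeds on substrings, so an ID such as "12" would also match "123", whereas the set lookup below only counts exact matches.

```python
def count_in_both(col_a, col_b):
    """Count the cells of col_b whose value also appears in col_a."""
    ids_a = {value for value in col_a if value != ""}  # ignore blank cells
    return sum(1 for value in col_b if value in ids_a)

# Hypothetical ID lists.
col_a = ["x1", "x2", "x3", ""]
col_b = ["x2", "x3", "x9", ""]
print(count_in_both(col_a, col_b))
```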

Sum values in one column of all rows which match one or more of multiple criteria

I have some table data in which I'd like to sum all the values in a specific column of all rows where column A contains string A and/or column B contains string B. How can I achieve this?
This works for one criterion:
=SUM(FILTER(G:G,REGEXMATCH(F:F,"stringA")))
I tried this, but it didn't work:
=SUM(FILTER(G:G,OR(ISTEXT(REGEXMATCH(F:F,"stringA")),ISTEXT(REGEXMATCH(C:C,"stringB")))))
Please try:
=SUM(FILTER(G:G,REGEXMATCH(F:F,"stringA")+REGEXMATCH(C:C,"stringB")))
+ works as OR logic. ISTEXT is not needed because REGEXMATCH already returns TRUE or FALSE.
OR does not work here because it collapses the whole array into a single TRUE or FALSE instead of giving one result per row; use + in array formulas.
=SUM(FILTER(G:G,REGEXMATCH(F:F&C:C,"stringA|stringB")))
In the regular expression, OR is denoted by |
EDIT: Added &C:C to cover the different columns
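The row-by-row OR logic can be sketched in Python. The rows and values below are invented; only the column letters (C, F, G) and the placeholder patterns stringA and stringB come from the question.

```python
import re

# Hypothetical rows with the three columns used in the formulas.
rows = [
    {"C": "has stringB", "F": "nothing", "G": 10},
    {"C": "nothing", "F": "has stringA", "G": 20},
    {"C": "stringB", "F": "stringA", "G": 30},
    {"C": "nothing", "F": "nothing", "G": 40},
]

def sum_matching(rows, pattern_f, pattern_c):
    """Sum G over rows where F matches pattern_f or C matches pattern_c."""
    return sum(
        row["G"]
        for row in rows
        if re.search(pattern_f, row["F"]) or re.search(pattern_c, row["C"])
    )

print(sum_matching(rows, "stringA", "stringB"))
```

Each row is tested on its own, so a row that satisfies both conditions is still counted only once.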

Searching for unmatched names when comparing spreadsheets

In one spreadsheet I have 3 columns with a first and last name of a person combined. In the 2nd spreadsheet, I have column a = first name and column b = last name.
I want to know which names in spreadsheet one cannot be found in spreadsheet two. I also need to verify the data to make sure that the formula was accurate on finding the correct lookup.
Do I have to combine my columns in spreadsheet 2 to make the first and last name in the same column to make this work?
Which formula would you use for either scenario?
Use this:
=ISNA(MATCH($A1&" "&$B1,Sheet2!$A:$A,FALSE))
Where (in order):
A1 is the first name cell in the sheet that stores the names separately
B1 is the last name cell in the same sheet
Sheet2 is the sheet that stores the two names together in one cell
$A:$A is the column in Sheet2 that holds the combined names
FALSE is because it's an exact match
This will return TRUE if the name cannot be found, and FALSE if it can
You can also use:
=VLOOKUP($A1&" "&$B1,Sheet2!$A:$D,3,FALSE)
If you want to retrieve data for a match.
Finally, if you need to do your lookups the other way, take a look at this thread for some ideas on how to split the string into two pieces.
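The lookup above can be sketched in Python to make the logic easier to verify by hand. It assumes one sheet gives (first, last) pairs and the other gives combined "First Last" strings separated by a single space; the function name and sample names are hypothetical.

```python
def unmatched_names(first_last_pairs, combined_names):
    """Return the 'First Last' names that are missing from combined_names."""
    known = set(combined_names)
    full_names = [f"{first} {last}" for first, last in first_last_pairs]
    return [name for name in full_names if name not in known]

# Hypothetical data for the two sheets.
separate = [("Ann", "Lee"), ("Bob", "Ray"), ("Cy", "Poe")]
combined = ["Ann Lee", "Cy Poe"]
print(unmatched_names(separate, combined))
```

Like the exact-match lookup, this treats differences in spacing or capitalization as a mismatch, so it is worth spot-checking a few reported names manually.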