Trying to compare two columns in Google Sheets with this formula in column C:
=if(A1=B1,"","Mismatch")
Works fine, but I'm getting a lot of false positives:
A | B | C
MARY JO | Mary Jo |
JAY, TIM | TIM JAY | Mismatch
Sam Ron | Sam Ron | Mismatch
Jack *Ma | Jack MA | Mismatch
Any ideas how to work this?
This uses a score-based approach to determine a match. You decide what is or isn't a match based on that score:
Score Formula = getMatchScore(A1,B1)
Match Formula = if(C1<.7,"mismatch",)
function getMatchScore(strA, strB, ignoreCase=true) {
  strA = String(strA);
  strB = String(strB);
  const toLowerCase = ignoreCase ? str => str.toLowerCase() : str => str;
  const splitWords = str => str.split(/\b/);
  let [maxLenStr, minLenStr] = strA.length > strB.length ? [strA, strB] : [strB, strA];
  maxLenStr = toLowerCase(maxLenStr);
  minLenStr = toLowerCase(minLenStr);
  const maxLength = maxLenStr.length;
  const minLength = minLenStr.length;
  const lenScore = minLength / maxLength;
  const orderScore = Array.from(maxLenStr).reduce(
    (oldItem, nItem, index) => nItem === minLenStr[index] ? oldItem + 1 : oldItem, 0
  ) / maxLength;
  const maxKeyWords = splitWords(maxLenStr);
  const minKeyWords = splitWords(minLenStr);
  const keywordScore = minKeyWords.reduce(({ score, searchWord }, nItem) => {
    const newSearchWord = searchWord?.replace(new RegExp(nItem, ignoreCase ? 'i' : ''), '');
    score += searchWord.length != newSearchWord.length ? 1: 0;
    return { score, searchWord: newSearchWord };
  }, { score: 0, searchWord: maxLenStr }).score / minKeyWords.length;
  const sortedMaxLenStr = Array.from(maxKeyWords.sort().join(''));
  const sortedMinLenStr = Array.from(minKeyWords.sort().join(''));
  const charScore = sortedMaxLenStr.reduce((oldItem, nItem, index) => {
    const surroundingChars = [sortedMinLenStr[index-1], sortedMinLenStr[index], sortedMinLenStr[index+1]]
      .filter(char => char != undefined);
    return surroundingChars.includes(nItem)? oldItem + 1 : oldItem
  }, 0) / maxLength;
  const score = (lenScore * .15) + (orderScore * .25) + (charScore * .25) + (keywordScore * .35);
  return score;
}
try:
=ARRAYFORMULA(IFERROR(IF(LEN(
REGEXREPLACE(REGEXREPLACE(LOWER(A1:A), "[^a-z ]", ),
LOWER("["&B1:B&"]"), ))>0, "mismatch", )))
Implementing fuzzy matching with a Google Sheets formula alone would be difficult. I would recommend a custom function for this, or a full-blown script (both via Google Apps Script) if you want to populate all rows at once.
Custom Formula:
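function fuzzyMatch(string1, string2) {
  // Coerce to strings so non-text cells don't break toLowerCase().
  string1 = String(string1).toLowerCase();
  string2 = String(string2).toLowerCase();
  var n = -1;
  // Look for each character of string2 in string1, always after the previous hit.
  for (var i = 0; i < string2.length; i++) {
    n = string1.indexOf(string2[i], n + 1);
    if (n === -1) return 'Mismatch';
  }
  // Nothing is returned on a match, which leaves the cell blank.
}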
This checks whether the characters of the 2nd string appear, in the same order, within the first string. See the sample data below for a case where it returns a mismatch.
Output:
Note:
The last row is a mismatch because the 2nd string contains an r that isn't found in the first string, so the characters cannot be matched in order.
If this doesn't meet your test cases, add a more definitive list showing the expected output of the formula/function so this can be adjusted, or see player0's answer, which uses only Google Sheets formulas and is less strict with its conditions.
Reference:
https://stackoverflow.com/a/15252131/17842569
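The order check described above (every character of the second string must appear in the first, in the same order) can be sketched in plain JavaScript roughly as follows. The name isSubsequence is my own, the sample pairs are taken from the question, and this is a minimal sketch rather than the answer's exact code.

```javascript
// Returns true when every character of `needle` appears in `haystack`
// in the same order (case-insensitive). `isSubsequence` is a hypothetical name.
function isSubsequence(haystack, needle) {
  const h = String(haystack).toLowerCase();
  const n = String(needle).toLowerCase();
  let pos = -1;
  for (const ch of n) {
    pos = h.indexOf(ch, pos + 1); // search only after the previous hit
    if (pos === -1) return false;
  }
  return true;
}

// Sample pairs from the question (column A, column B).
const pairs = [
  ['MARY JO', 'Mary Jo'],
  ['JAY, TIM', 'TIM JAY'],
  ['Jack *Ma', 'Jack MA'],
];
const results = pairs.map(([a, b]) => (isSubsequence(a, b) ? '' : 'Mismatch'));
console.log(results);
```

Note that a reordered pair such as JAY, TIM / TIM JAY still fails this check, since the order of characters matters here.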
The main limitation of traditional fuzzy matching is that it doesn’t take into consideration similarities outside of the strings. Topic clustering requires semantic understanding. Goodlookup is a smart function for spreadsheet users that gets very close to semantic understanding. It’s a pre-trained model that has the intuition of GPT-3 and the join capabilities of fuzzy matching. Use it like VLOOKUP or INDEX/MATCH to speed up your topic clustering work in Google Sheets.
https://www.goodlookup.com/
I have some data, from A10 to column M, down to row 59.
I have some dates in column F10:F that are text strings, converted to real dates in column N (the process is described in a separate question).
M3 is set to =NOW().
In cell N3 I have: =M3+14.
I want to delete all the rows, with a date in column N10:N that comes before [today + 2 weeks] (so cell N3).
When I run my script in Apps Script, the body of the if statement never executes, but if I comment the if out, the for loop runs and deletes the rows, so I'm pretty sure the problem is, again, date formatting.
In this question I ask: how do I compare the values of N10:N with N3, in order to delete all the rows that don't meet the condition if(datesNcol <= targetDate)? (in code is written as if (rowData[i] < flatArray))
I leave also a demo sheet with this problem explained in detail and two alternatives (getBackground condition and numeric days condition).
Attempts:
This is a simplified code example:
const gen = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Generatore');
const bVals = gen.getRange('B10:B').getValues();
const bFilt = bVals.filter(String);
const dataLastRow = bFilt.length;

function deleteExpired() {
  dateCorrette(); //ignore, formula that puts corrected dates from N10 to dataLastRow
  var dateCorrect = gen.getRange(10,14,dataLastRow,1).getValues();
  var targetDate = gen.getRange('N3').getValues();
  var flatArray = [].concat.apply([], targetDate);
  for (var i = dateCorrect.length - 1; i >= 0; i--) {
    var rowData = dateCorrect[i];
    if (rowData[i] < flatArray) {
      gen.deleteRow(i+10);
    }
  }
};
If I run the script, nothing is deleted.
If I //comment out the if statement and its closing bracket, it deletes all the rows of the list one by one.
I can't manage to meet that condition.
Right now, it logs this [Sun Jan 01 10:33:20 GMT-05:00 2023] as flatArray
and this [Wed Dec 21 03:00:00 GMT-05:00 2022] as dateCorrect[49], so the first row to delete, that is the 50th (is correct for all the dateCorrect[i] dates).
I tried calling getTime() on the targetDate variable, but that only works with getValue(), not getValues(), so I don't know how to use getTime() on rowData, which comes from dateCorrect[i] and has to use getValues(). It also stops accepting the flatArray variable, which has to be commented out (otherwise it logs [ ] for flatArray instead of the corrected date).
I leave the other attempts in the demo sheet, because I want to prioritize this problem around the date and make it clear in my head.
Thanks for all the help.
DEMO SHEET, ITA Locale time
I don't know how the demo sheet works with Apps Script; I suggest copying the code into a personal sheet.
UPDATE:
I've also tried adding an extra column, with a built-in IF function that writes "del" if the row has to be deleted.
=IF(O10>14;"del";"")
And then
var boba = gen.getRange(10,16,bLast,1).getDisplayValues();
.
.
if (boba[i] == 'del')
This does the job. But I can't understand why the other methods don't work.
Try this. It seems like you do a lot of things that aren't necessary, unless I'm missing something.
A few notes. I typically do not use global variables unless absolutely necessary. I don't create a variable for the last row unless I have to use that value multiple times in my script; I use the method Sheet.getLastRow(). dateCorrect is a 2D array of 1 column, so the second index can only be [0]. And getRange('N3') is a single cell, so getValue() is good enough.
function deleteExpired() {
  const gen = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Generatore');
  var dateCorrect = gen.getRange(10,14,gen.getLastRow()-9,1).getValues();
  var targetDate = gen.getRange('N3').getValue();
  // Loop from the bottom up so deleting a row never shifts the rows still to be checked.
  for (var i = dateCorrect.length - 1; i >= 0; i--) {
    if (dateCorrect[i][0] < targetDate) {
      gen.deleteRow(i+10);
    }
  }
}
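To see why the condition in the question never fired, here is a plain JavaScript sketch that mimics what getValues() returns for a single column (an array of one-element rows). The first date and the target are the ones logged in the question; the second row is made up for contrast, the names rows, target and flatArray are stand-ins, and no spreadsheet calls are made.

```javascript
// Stand-in for getRange(10, 14, n, 1).getValues(): one column, so each row is [value].
const rows = [
  [new Date(2022, 11, 21, 3, 0, 0)], // Dec 21 2022, as logged in the question
  [new Date(2023, 1, 15)],           // a later date, invented for contrast
];
// Stand-in for getRange('N3').getValue(): a single Date.
const target = new Date(2023, 0, 1, 10, 33, 20);
// Stand-in for the question's flatArray: an array wrapping that Date.
const flatArray = [target];

// The question's pattern: row[i] is undefined for every i > 0, and comparing
// a Date with an array coerces the array to a string, which becomes NaN.
const buggy = rows.map((row, i) => row[i] < flatArray);

// Index the single column with [0]; two Date objects compare by timestamp.
const fixed = rows.map(row => row[0] < target);

console.log(buggy, fixed);
```

With these values every entry of buggy is false, while fixed flags only the first row.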
Try this:
function delRows() {
  const ss = SpreadsheetApp.getActive();
  const gsh = ss.getSheetByName('Generatore');
  const colB = gsh.getRange('B10:B' + gsh.getLastRow()).getValues();
  var colN = gsh.getRange('N10:N' + gsh.getLastRow()).getValues();
  var tdv = new Date(new Date().getFullYear(), new Date().getMonth(), new Date().getDate() + 14).valueOf();//current date + 14
  let d = 0; // number of rows deleted so far
  colN.forEach((n, i) => {
    // n is a one-element row, so read the cell with n[0]
    if (new Date(n[0]).valueOf() < tdv) {
      gsh.deleteRow(i + 10 - d++);
    }
  });
}
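Both answers have to cope with row numbers shifting after each deletion: the first loops from the bottom up, the second subtracts a running count of rows already deleted. Here is a plain JavaScript sketch of the two strategies on an ordinary array; the function names, the condition and the sample data are made up for illustration.

```javascript
// Hypothetical delete condition, standing in for "date is before the target".
const shouldDelete = v => v < 10;

// Strategy 1: iterate from the last index down, so a deletion never
// shifts an item that has not been visited yet.
function deleteBottomUp(values) {
  const out = values.slice();
  for (let i = out.length - 1; i >= 0; i--) {
    if (shouldDelete(out[i])) out.splice(i, 1);
  }
  return out;
}

// Strategy 2: iterate a snapshot top-down and subtract the number of
// deletions made so far from the index.
function deleteWithOffset(values) {
  const out = values.slice();
  let d = 0;
  values.forEach((v, i) => {
    if (shouldDelete(v)) out.splice(i - d++, 1);
  });
  return out;
}

const sample = [3, 12, 7, 20, 1];
console.log(deleteBottomUp(sample), deleteWithOffset(sample));
```

Both calls leave the same survivors, which is the point: either strategy works as long as it is applied consistently.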
I found the following code to emulate the PROPER formula, but it has wrong (maybe outdated) syntax, and as far as I understood, it should apply to all columns of a given sheet.
function PROPER_CASE(str) {
  if (typeof str != "string")
    throw `Expected string but got a ${typeof str} value.`;
  str = str.toLowerCase();
  var arr = str.split(/.-:?—/ );
  return arr.reduce(function(val, current) {
    return val += (current.charAt(0).toUpperCase() + current.slice(1));
  }, "");
}
Here's an example of the input:
A | B | C | D
ColumnA | ColumnB | ColumnC | ColumnD
EXCEL ACTION LIMIMTED (毅添有限公司) | 207/2018 | n/a | without-proper
Hang Wo Holdings | 205/2015 | 35/2020 | without-proper
central southwood limited | 308/2019 | n/a | without-proper
This would be the desired output:
ColumnA | ColumnB | ColumnC | ColumnD
Excel Action Limited (毅添有限公司) | 207/2018 | n/a | without-proper
Hang Wo Holdings | 205/2015 | 35/2020 | without-proper
Central Southwood Limited | 308/2019 | n/a | without-proper
And this is the error output of that function:
Error
Expected string but got a undefined value.
PROPER_CASE @ macros.gs:115
This is the only way I can see of reproducing your results. I don't see how to avoid capitalizing the first letter of the last two columns without skipping them:
function lfunko() {
  const ss = SpreadsheetApp.getActive();
  const sh = ss.getSheetByName("Sheet0");
  if (sh.getLastRow() > 4) {
    sh.getRange(6, 1, sh.getLastRow() - 5, sh.getLastColumn()).clearContent();
    SpreadsheetApp.flush();
  }
  const vs = sh.getDataRange().getDisplayValues().map((r, i) => {
    return r.map((c, j) => {
      if (i > 0 && j < 1) {
        let arr = c.toString().toLowerCase().split(/.-:?-/g);
        return arr.reduce((val, current) => {
          //Logger.log(current)
          return val += current.charAt(0).toUpperCase() + current.slice(1);
        }, '');
      } else {
        return c;
      }
    });
  });
  Logger.log(JSON.stringify(vs));
  sh.getRange(sh.getLastRow() + 2, 1, vs.length, vs[0].length).setValues(vs);
}
A | B | C | D
Data | | |
ColumnA | ColumnB | ColumnC | ColumnD
EXCEL ACTION LIMIMTED (毅添有限公司) | 207/2018 | n/a | without-proper
Hang Wo Holdings | 205/2015 | 35/2020 | without-proper
central southwood limited | 308/2019 | n/a | without-proper
Output | | |
ColumnA | ColumnB | ColumnC | ColumnD
Excel action limimted (毅添有限公司) | 207/2018 | n/a | without-proper
Hang wo holdings | 205/2015 | 35/2020 | without-proper
Central southwood limited | 308/2019 | n/a | without-proper
I have tested your code and it works fine. It does convert the input string into a proper case.
However, note that when you get values in Google Sheets, the data comes back as a 2D (nested) array.
So to apply this to your spreadsheet, after getting the values you have to target the column you want to replace and loop through each string in the array. You then have to call setValues() on the same range to write the result back to the spreadsheet.
Solution 1:
Try:
With your function, try adding this script to apply to your spreadsheet.
function setToColumn(){
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getActiveSheet();
  var dataRange = sheet.getRange(1,1,sheet.getLastRow()); //2ND Parameter is the column, replace if you want to edit different column
  var allData = dataRange.getValues().flat();
  var properData = [];
  allData.forEach(function(data){
    properData.push([PROPER_CASE(data)]);
  });
  dataRange.setValues(properData);
}
From:
Result:
Solution 2:
If you don't mind using different script which only needs one function you may use the script below:
function properCase() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getActiveSheet();
  var dataRange = sheet.getRange(1,1,sheet.getLastRow()); //2ND Parameter is the column, replace if you want to edit different column (1 = Column A, 2 = Column B)
  var allData = dataRange.getValues().flat();
  var properData = [];
  allData.forEach(function(data){
    // Leave non-text cells (numbers, dates, booleans) untouched; toLowerCase() only exists on strings.
    properData.push([typeof data === 'string'
      ? data.toLowerCase().replace(/\b[a-z]/ig, function(match) {return match.toUpperCase()})
      : data]);
  });
  dataRange.setValues(properData);
}
Reference for Solution 2:
Apps script how to format a cell to Proper Text (Case)
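The 2D shape mentioned in this answer is easy to see on a plain array. In the sketch below, values stands in for what getValues() would return for a four-column range (sample rows taken from the question), and toProperCase is a hypothetical helper built on the same regex as Solution 2. Only the first column is changed.

```javascript
// Hypothetical helper: lower-case everything, then upper-case the first letter of each word.
const toProperCase = s =>
  s.toLowerCase().replace(/\b[a-z]/g, m => m.toUpperCase());

// Stand-in for range.getValues(): an array of rows, each row an array of cells.
const values = [
  ['Hang Wo Holdings', '205/2015', '35/2020', 'without-proper'],
  ['central southwood limited', '308/2019', 'n/a', 'without-proper'],
];

// Map over rows, then over cells, touching only column index 0.
const result = values.map(row =>
  row.map((cell, col) =>
    (col === 0 && typeof cell === 'string' ? toProperCase(cell) : cell))
);

console.log(result);
```

The result keeps the same dimensions as values, which is the shape setValues() expects for the same range.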
As basic as this may sound, I am having difficulty writing this. I have two columns with checkboxes in a sheet (main), and I want to check (set to true) the checkbox in column 'O' if column 'M' has a checkmark, once I am done with the sheet (via a macro button).
Thanks for any input.
If M is true set O to true
function lfunko() {
  const ss = SpreadsheetApp.getActive();
  const sh = ss.getSheetByName("Sheet0");
  const [hA,...vs] = sh.getDataRange().getValues();
  vs.forEach((r,i) => {
    // Standard checkbox cells come back from getValues() as booleans, not the string "TRUE".
    if(r[12] === true) {
      sh.getRange(i + 2, 15).setValue(true);
    }
  });
}
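One detail worth checking in code like this: for standard checkboxes, getValues() yields the booleans true and false rather than text, and in JavaScript a boolean never loosely equals the string "TRUE". The plain JavaScript sketch below illustrates this on made-up rows; makeRow and the sample data are hypothetical, and no spreadsheet calls are made.

```javascript
// Build a stand-in row with 15 cells: index 12 is column M, index 14 is column O.
const makeRow = m => {
  const r = new Array(15).fill('');
  r[12] = m;
  r[14] = false;
  return r;
};
const rows = [makeRow(true), makeRow(false), makeRow(true)];

// A boolean compared with the string "TRUE" is never equal.
const looseCompare = (true == 'TRUE');

// Copy each row and tick O wherever M is ticked.
const updated = rows.map(r => {
  const copy = r.slice();
  if (copy[12] === true) copy[14] = true;
  return copy;
});

console.log(looseCompare, updated.map(r => r[14]));
```

In Apps Script the resulting column O values could then be written back in a single setValues() call instead of one setValue() per row, though that is a design choice rather than a requirement.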
This is a simplified version of my data.
How can I calculate columns E and F? The address format in column E is not important.
Put this custom formula in E2: =last_item_index(A2:D)
and here is the custom function code:
/**
 * @customfunction
 */
function last_item_index(range) {
  const colCount = range[0].length
  const lastItemIndices = range.map(row=>{
    // reverse() flips the row in place, so row[indexFirstNonEmpty] below is the last non-empty cell
    const indexFirstNonEmpty = row.reverse().findIndex(cell=>cell)
    const indexLastNonEmpty = colCount - 1 - indexFirstNonEmpty
    return indexFirstNonEmpty>=0? [indexLastNonEmpty, row[indexFirstNonEmpty]] : ['','']
  })
  return lastItemIndices
}
The value returned in column E will be the zero-based index of the last non-empty item. You can easily convert it to an A1-formatted address with the ADDRESS function.
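If you would rather do the index-to-address conversion in script, a rough sketch follows. It finds the last non-empty cell per row on a plain array (without reversing the row) and turns the zero-based index into an A1-style address. The names lastItemIndex and columnLetter, the sample data, and the assumption that the range starts at row 2 in column A are all mine.

```javascript
// For each row, return [zero-based index of last non-empty cell, its value],
// or ['', ''] when the row is empty. Does not mutate the input.
function lastItemIndex(range) {
  return range.map(row => {
    let idx = -1;
    row.forEach((cell, i) => { if (cell) idx = i; });
    return idx >= 0 ? [idx, row[idx]] : ['', ''];
  });
}

// Hypothetical helper: zero-based column index -> column letters (0 -> A, 26 -> AA).
function columnLetter(index) {
  let n = index + 1;
  let s = '';
  while (n > 0) {
    const rem = (n - 1) % 26;
    s = String.fromCharCode(65 + rem) + s;
    n = Math.floor((n - 1) / 26);
  }
  return s;
}

const data = [
  ['a', 'b', '', ''],
  ['', '', '', ''],
  ['x', '', '', 'z'],
];
const firstRow = 2; // assumes the range starts at row 2, as in A2:D
const addresses = lastItemIndex(data).map(([idx], r) =>
  (idx === '' ? '' : columnLetter(idx) + (firstRow + r)));

console.log(addresses);
```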
try:
=INDEX(IFNA(REGEXEXTRACT(" "&TRIM(FLATTEN(QUERY(TRANSPOSE(IF(A2:D="",,
ADDRESS(ROW(A2:A), COLUMN(A:D), 4))),,9^9))), " (.{1,3}\d+)$")))
and:
=INDEX(IFNA(SUBSTITUTE(REGEXEXTRACT(" "&TRIM(FLATTEN(QUERY(TRANSPOSE(IF(A2:D="",,
SUBSTITUTE(A2:D, " ", "♦"))),,9^9))), "((?:[^ ]+ *){1})$"), "♦", " ")))
I am trying to split the text into date and event columns. Searching for ". " is not an option, because some lines contain multiple sentences ending with ". ". Also, some lines don't start with dates. The idea of the script was to use a regexp to find lines starting with the fragment "one or two digits, space, letters, period, space" and then replace that "period, space" with a rare character, for example "#". If the line does not start with this fragment, then add "#" to the beginning. The array can then easily be divided into two parts by this symbol ("#") and written to the sheet.
Unfortunately, something went wrong. I found that match(re) is always null. I am asking for help composing the correct regular expression and solving the problem.
Original text:
1 June. Astronomers report narrowing down the source of Fast Radio
Bursts (FRBs). It may now plausibly include "compact-object mergers
and magnetars arising from normal core collapse supernovae".[3][4]
The existence of quark cores in neutron stars is confirmed by Finnish
researchers.[5][6][7]
3 June. Researchers show that compared to rural populations urban red
foxes (pictured) in London are mirroring patterns of domestication
similar to domesticated dogs, as they adapt to their city
environment.[21]
The discovery of the oldest and largest structure in
the Maya region, a 3,000-year-old pyramid-topped platform Aguada
Fénix, with LiDAR technology is reported.
17 June. Physicists at the XENON dark matter research facility report
an excess of 53 events, which may hint at the existence of
hypothetical Solar axions.
Desired result:
Code:
function replace() {
  const sheetName = "Sheet1";
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(sheetName);
  const lr = sheet.getLastRow();
  // const range = sheet.getRange(2, 4, lr - 1);
  const range = sheet.getRange(100, 4, 5);
  const arr = range.getValues();
  const newArr = [];
  const re = new RegExp("^([0-9]{1,2}\s[a-z]+\.)\s");
  for (let i = 0; i < arr.length; i++) {
    const match = arr[i][0].match(re);
    if (match == null) {
      let newEntry = "#" + arr[i];
      newArr.push(newEntry);
    } else {
      // let newEntry = "#" + arr[i];
      // newArr.push(newEntry);
    }
  }
  // range.offset(0,1).setValues(newArr);
  // console.log(newArr);
}
function breakapart() {
  const ms = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
  const ss = SpreadsheetApp.getActive();
  const sh = ss.getSheetByName('Sheet1');//Data Sheet
  const osh = ss.getSheetByName('Sheet2');//Output Sheet
  osh.clearContents();
  const vs = sh.getRange(1, 1, sh.getLastRow(), sh.getLastColumn()).getDisplayValues().flat();
  let oA = [];
  vs.forEach(p => {
    let f = p.split(/[. ]/);
    if (!isNaN(f[0]) && ms.includes(f[1])) {
      let s = p.slice(0, p.indexOf('.'));
      let t = p.slice(p.indexOf('.')+2);
      oA.push([s, t]);
    } else {
      oA.push(['',p]);
    }
  });
  osh.getRange(1,1,oA.length,oA[0].length).setValues(oA);
}
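As for why match(re) was always null in the question's code: inside an ordinary string literal, "\s" and "\." lose their backslashes before the RegExp constructor sees them, and [a-z] does not match the capital letter that starts each month name. Below is a minimal sketch using a regex literal instead. The splitEntry name and the two-column return shape are my own choices, not taken from the question or the answer.

```javascript
// The pattern from the question, built from a string literal: the backslashes are lost.
const broken = new RegExp("^([0-9]{1,2}\s[a-z]+\.)\s");

// A regex literal keeps the escapes: one or two digits, whitespace,
// a word in either case, a period, then whitespace.
const re = /^(\d{1,2}\s[A-Za-z]+)\.\s+/;

// Split an entry into [date, event]; entries without a leading date get an empty date.
function splitEntry(text) {
  const m = text.match(re);
  return m ? [m[1], text.slice(m[0].length)] : ['', text];
}

console.log(broken.source);
console.log(splitEntry('1 June. Astronomers report narrowing down the source of Fast Radio Bursts (FRBs).'));
console.log(splitEntry('The existence of quark cores in neutron stars is confirmed by Finnish researchers.'));
```

Printing broken.source shows the pattern the engine actually received, which is a quick way to debug regexes built from strings.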