Split the data using regex with google script - regex

As a newbie, I have tried hard to solve the problem below.
My current table:
TestID | TestName | Name | Url
1592461 | Google-page (www.google.com) | |
1592467 | Yahoo - Page (www.yahoo.com) | |
I am trying to split the data in the "TestName" column and write the parts into the "Name" and "Url" columns, as shown in the table below.
Expected table:
TestID | TestName | Name | Url
1592461 | Google-page (www.google.com) | Google-page | www.google.com
1592467 | Yahoo - Page (www.yahoo.com) | Yahoo - Page | www.yahoo.com
I tried the following script, but it does not work.
function getUrl(){
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var s1 = ss.getSheetByName("Sheet1");
  var s2 = ss.getSheetByName("Sheet2");
  var data = s1.getSheetValues(1, 2, s1.getLastRow() , 1);
  var regExp = new RegExp("\(([^]]+)\)");
  var row = [];
  for(i = 0; i<data; i++) {
    var url = regExp.exec(data)[i];
    var output = s2.getRange("C2").setValue(url);
    logger.log(url);
    return url;
  }
}
Could someone please help me solve this?

In addition, this can also be done with a fairly simple formula. Enter it in C2 (the first data row, so the results line up with B2:B3):
=ArrayFormula(split(substitute(B2:B3, ")",""), "("))
Change range to suit.
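To see what the formula does step by step, here is a rough JavaScript equivalent. splitLikeFormula is a hypothetical name; it is a sketch of the SUBSTITUTE-then-SPLIT idea, not a reimplementation of the Sheets functions.

```javascript
// Rough JavaScript equivalent of =ArrayFormula(split(substitute(B2:B3, ")",""), "(")).
// splitLikeFormula is a hypothetical helper used only for illustration.
function splitLikeFormula(values) {
  return values.map((v) =>
    v
      .split(")").join("")            // like SUBSTITUTE(v, ")", ""): remove every ")"
      .split("(")                     // like SPLIT(..., "("): break at each "("
      .filter((part) => part !== "")  // SPLIT drops empty pieces by default
  );
}

const rows = splitLikeFormula(["Google-page (www.google.com)", "Yahoo - Page (www.yahoo.com)"]);
console.log(JSON.stringify(rows));
// [["Google-page ","www.google.com"],["Yahoo - Page ","www.yahoo.com"]]
```

In this sketch the name part keeps the space that preceded "("; calling trim() on it removes that space.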

I have the impression that you want to get the data from column 2 (B) of the current sheet into columns 3 and 4 (C and D) of the same sheet.
I suggest using the following regex:
var regExp = /(.*?)\(([^)]+)\)/;
The parts of the regex work as follows:
(.*?) - lazily captures zero or more characters other than line breaks into Group 1 (everything before the "(")
\( - matches a literal (
([^)]+) - captures one or more characters other than ) into Group 2 (the URL)
\) - matches a literal )
Then use it to process the column B data:
function getUrl(){
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var s1 = ss.getSheetByName("Sheet1");
  var src_range = s1.getRange("B:B"); // Grab the Column B range
  var regExp = /(.*?)\(([^)]+)\)/; // Define the regex
  var lastRow = s1.getLastRow(); // Last row with data (B:B itself spans every row of the sheet)
  for (var i = 1; i <= lastRow; i++) { // Loop through the used cells in the range
    if (!src_range.getCell(i, 1).isBlank()) { // If the cell is not blank, process it
      var m = regExp.exec(src_range.getCell(i, 1).getValue()); // Run the regex
      if (m) { // If there is a match
        var text = m[1].trim(); // Text to be placed into Column C (trim the space before "(")
        s1.getRange('C' + i).setValue(text);
        var url = m[2]; // URL to be placed into Column D
        s1.getRange('D' + i).setValue(url);
      }
    }
  }
}
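To try the regex without a spreadsheet, here is a minimal sketch that applies it to the two sample TestName values. splitTestName is a hypothetical helper name; the trim() call is an addition so the name matches the expected table.

```javascript
// Minimal sketch: run the answer's regex on the sample values outside Sheets.
// splitTestName is a hypothetical helper name.
function splitTestName(value) {
  const regExp = /(.*?)\(([^)]+)\)/;
  const m = regExp.exec(value);
  if (!m) return null; // no "(...)" part, so nothing to split
  // Group 1 keeps the space before "(", so trim it to match the expected table.
  return { name: m[1].trim(), url: m[2] };
}

for (const v of ["Google-page (www.google.com)", "Yahoo - Page (www.yahoo.com)"]) {
  console.log(JSON.stringify(splitTestName(v)));
}
// {"name":"Google-page","url":"www.google.com"}
// {"name":"Yahoo - Page","url":"www.yahoo.com"}
```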


get ranges inside formula

Is there a practical way to extract the cells that are used in a formula with Google Apps Script?
For example:
Let's say A1 contains the following formula:
=page1!C2*0,8+page2!B29*0,15+page3!C144*0,05
I would like a variable myCells to record:
page1!C2
page2!B29
page3!C144
How would you do this?
Thanks in advance
Description
Here is a sample script that parses formulas like the one shown into the cells they reference.
Note that this only works for the specific formula you gave.
Code.gs
function test() {
  try {
    let spread = SpreadsheetApp.getActiveSpreadsheet();
    let sheets = spread.getSheets().map( sheet => sheet.getName() );
    // for this test
    sheets = ["page1","page2","page3"];
    let sheet = spread.getSheetByName("Sheet1");
    let formula = sheet.getRange("A1").getFormula();
    console.log(formula);
    // break into parts
    let parts = formula.split("*"); // but notice this is for specific case of *
    parts.pop(); // the last part doesn't contain any cell reference
    console.log(parts);
    let results = [];
    parts.forEach( part => {
      // find which sheet name appears in this part
      let j = sheets.findIndex( sheet => part.indexOf(sheet) >= 0 );
      // remove sheet from range
      let k = part.split('!')[1]; // this gives the cell in A1 notation
      results.push(sheets[j]+k); // e.g. "page1C2"; use sheets[j] + "!" + k to keep the "!"
    });
    console.log(results);
  }
  catch(err) {
    console.log(err);
  }
}
Execution log
6:54:44 AM Notice Execution started
6:54:46 AM Info =page1!C2*0,8+page2!B29*0,15+page3!C144*0,05
6:54:46 AM Info [ '=page1!C2', '0,8+page2!B29', '0,15+page3!C144' ]
6:54:46 AM Info [ 'page1C2', 'page2B29', 'page3C144' ]
6:54:45 AM Notice Execution completed
Reference
Array.map
Range.getFormula()
String.split()
Array.pop()
Array.forEach()
Array.findIndex()
Use range.getFormula() to get the formula, and then use a regex with String.match() to extract the cells:
const f =
'=page1!C2:C*0,8+page2!B29*0,15+page3!C144*0,056+sheet1!c:c* Sheet56!D10:D-D5:G10';
const matched = f.match(/(\w+!)?[A-Za-z]+\d*(:[A-Za-z]+\d*)?/g);
console.log(JSON.stringify(matched));
(\w+!)? - [optional] Matches one or more word characters followed by ! for the sheet name (e.g. page1!)
[A-Za-z]+\d* - Matches one or more letters [A-Za-z] followed by zero or more digits \d* for the cell reference (e.g. C2)
(:[A-Za-z]+\d*)? - [optional] Matches a second cell reference preceded by a : (e.g. :C50)
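Here is the same String.match() approach wrapped in a plain function and run on the formula from the question. extractRefs is a hypothetical name; this is a sketch under the assumption that references look like the ones in the question.

```javascript
// Minimal sketch: the answer's regex applied to the question's formula.
// extractRefs is a hypothetical helper name.
function extractRefs(formula) {
  // Optional "Sheet!" prefix, a cell such as C2 (or a column such as C),
  // and an optional ":C50"-style end of range.
  const refRegex = /(\w+!)?[A-Za-z]+\d*(:[A-Za-z]+\d*)?/g;
  return formula.match(refRegex) || []; // match() returns null when nothing matches
}

const myCells = extractRefs("=page1!C2*0,8+page2!B29*0,15+page3!C144*0,05");
console.log(JSON.stringify(myCells)); // ["page1!C2","page2!B29","page3!C144"]
```

Because [A-Za-z]+\d* also matches bare words, function names such as SUM would be returned too, and quoted sheet names with spaces ('My Sheet'!A1) are not handled.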

Find Replace with RegEx failing for string ending in ? Google script

I have a script in Google Sheets.
I am trying to find and replace headers on a sheet using a table of values on a different sheet.
It mostly works as desired, but the replacement does not work for any string that ends in ?.
I do not know in advance when a ? will be present.
I am using this:
const regex = new RegExp("(?<![^|])(?:" + search_for.join("|") + ")(?![^|])", "g");
I have tried to figure out how to correct my regex, but I'm not getting it.
Thanks in advance for your help with this.
On one sheet I have:
search_for | replace_with
ABC Joe | MNQ
XYZ car | NNN XXX
DDD foo? | Bob bar
On a different sheet, the headers are:
Label | Id | ABC Joe | XYZ car | DDD foo?
After running the replacement, I want these headers:
Label | Id | MNQ | NNN XXX | Bob bar
What I get is:
Label | Id | MNQ | NNN XXX | DDD foo?
var data = range.getValues();
search_for.forEach(function(item, i) {
  pair[item] = replace_with[i];
});
const regex = new RegExp("(?<![^|])(?:" + search_for.join("|") + ")(?![^|])", "g");
//Update Header row
//replace(/^\s+|\s+$|\s+(?=\s)/g, "") - Remove all multiple white-spaces and replaces with a single WS & trim
for(var m = 0; m<= data[0].length - 1; m++){
  data[0][m] = data[0][m].replace(/^\s+|\s+$|\s+(?=\s)/g, "").replace(regex,(m) => pair[m])
}
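A word of warning: what you're doing scares me a bit. Building a regex out of arbitrary cell values is a brittle approach, and it can go wrong.
You're not quoting (escaping) the dynamic parts of the regex, and ? is a special character in regular expressions: unescaped, foo? means "fo" followed by an optional "o", so the pattern can only match "DDD foo" or "DDD fo", and the (?![^|]) lookahead then fails because the literal ? follows. I've written a solution to your problem below. Don't rely on my solution in production.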
//var data = range.getValues();
var data = [
  ['Label', 'Id', 'ABC Joe', 'XYZ car', 'DDD foo?']
];
var search_for = [
  'ABC Joe',
  'XYZ car',
  'DDD foo?'
];
var replace_with = [
  'MNQ',
  'NNN XXX',
  'Bob bar'
];
var pair = {};
search_for.forEach(function(item, i) {
  pair[item] = replace_with[i];
});
// Escape each search term before joining the terms into one alternation
const regex = new RegExp("(?<![^|])(?:" + search_for.map((it) => quote(it)).join("|") + ")(?![^|])", "g");
for (var m = 0; m <= data[0].length - 1; m++) {
  data[0][m] = data[0][m]
    .replace(/^\s+|\s+$|\s+(?=\s)/g, "")
    .replace(regex, (m) => pair[m]);
}
// see https://stackoverflow.com/a/3614500/11451
// Prefix every regex special character (including .) with a backslash
function quote(s) {
  var regexpSpecialChars = /([.\[\]\^\$\|\(\)\\\+\*\?\{\}\=\!])/g;
  return s.replace(regexpSpecialChars, '\\$1');
}
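To see the difference escaping makes, here is a minimal sketch that runs the header replacement twice outside Sheets, once with the raw search terms and once with escaped ones. escapeRegExp, buildRegex and replaceHeaders are hypothetical names; escapeRegExp uses the common "prefix every metacharacter with a backslash" pattern.

```javascript
// Sketch: header replacement with and without escaping the search terms.
// escapeRegExp, buildRegex and replaceHeaders are hypothetical helper names.
function escapeRegExp(s) {
  // Prefix every regex metacharacter with a backslash so it matches literally
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function buildRegex(words, escape) {
  const alternatives = words.map((w) => (escape ? escapeRegExp(w) : w));
  // (?<![^|]) and (?![^|]): the term must start/end at a string edge or next to "|"
  return new RegExp("(?<![^|])(?:" + alternatives.join("|") + ")(?![^|])", "g");
}

const searchFor = ["ABC Joe", "XYZ car", "DDD foo?"];
const replaceWith = ["MNQ", "NNN XXX", "Bob bar"];
const pair = {};
searchFor.forEach((item, i) => { pair[item] = replaceWith[i]; });

function replaceHeaders(headers, escape) {
  const regex = buildRegex(searchFor, escape);
  return headers.map((h) => h.replace(regex, (match) => pair[match]));
}

const headers = ["Label", "Id", "ABC Joe", "XYZ car", "DDD foo?"];
console.log(JSON.stringify(replaceHeaders(headers, false)));
// ["Label","Id","MNQ","NNN XXX","DDD foo?"]
console.log(JSON.stringify(replaceHeaders(headers, true)));
// ["Label","Id","MNQ","NNN XXX","Bob bar"]
```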
Can't you do something really simple, like escaping every non-alphanumeric (non-word) character? That works with the example data you gave above and seems trustworthy:
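function quote(s) {
  // Zero-width match before each non-word character; replacing it with '\\' inserts one backslash there
  var regexpSpecialChars = /(?=\W)/g;
  return s.replace(regexpSpecialChars, '\\');
}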

Extract ID from an URL with RegExp

I have this kind of URL:
/clients/18378/offers/2219/items/32779
I'm trying to get an array containing 18378, 2219 and 32779.
I've tried this code, without success:
let currentUrl = this.router.url; // = '/clients/18378/offers/2219/items/32779'
var regexRouteOffer = /\/clients\/(.*?)\/offers\/(.*?)\/items\/(.*?)/gm;
var match = currentUrl.match(regexRouteOffer);
console.log("Test 1 >>", match); // => ["/clients/18378/offers/2219/items/"]
I've tried the exec function, but it gives me only one of the numbers (the first one):
var matches = [];
for (var m = null; m = regexRouteOffer.exec(currentUrl); matches.push(m[1]));
console.log("Test 2 >> ", matches); //["18378"]
What am I doing wrong?
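You don't need the g flag (you only want to match once) or the m flag (there's no need for multiline mode). With g, String.match() returns only the whole matches, not the capture groups, which is why Test 1 shows a single string. Finally, the last (.*?) is lazy (non-greedy), so it matches as few characters as possible (zero in this case); remove the ? from all the groups, or at least from the last one.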
let currentUrl = '/clients/18378/offers/2219/items/32779'
var regexRouteOffer = /\/clients\/(.*)\/offers\/(.*)\/items\/(.*)/;
var match = currentUrl.match(regexRouteOffer);
console.log(match[1]); // 18378
console.log(match[2]); // 2219
console.log(match[3]); // 32779
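The sketch below contrasts the question's regex with a fixed one. extractIds is a hypothetical helper, and using [^/]+ instead of (.*) is a swapped-in variant (not the answer's exact regex) that stops each group at the next "/" and anchors the whole path.

```javascript
// Sketch: compare the question's regex with a fixed one.
// extractIds is a hypothetical helper; [^/]+ is a swapped-in alternative to (.*).
const url = "/clients/18378/offers/2219/items/32779";

// With the g flag, match() returns whole matches only, and the final lazy
// (.*?) is satisfied by an empty string.
console.log(JSON.stringify(url.match(/\/clients\/(.*?)\/offers\/(.*?)\/items\/(.*?)/gm)));
// ["/clients/18378/offers/2219/items/"]

function extractIds(path) {
  const m = path.match(/^\/clients\/([^/]+)\/offers\/([^/]+)\/items\/([^/]+)$/);
  return m ? m.slice(1) : null; // drop m[0] (the full match), keep the three groups
}

console.log(JSON.stringify(extractIds(url))); // ["18378","2219","32779"]
```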

Auto Find and Replace Script in Google Sheets - Delete some certain cell content - global replace

Many cell entries in my sheet contain extraneous words that I want to delete. I need a script that finds keywords within a single column (in this case "B") and deletes them in order. The goal is to make the cell entries shorter.
My keywords are "Epic Artifactory DIY", "Barn", "Planks", "Pack", "Coupon: WTXPXZP", "Coupon: FREESHIP50", "Coupon: SPRING10", and "Wall".
I found this script, but it does not work for me.
function fandr() {
  var ss=SpreadsheetApp.getActiveSpreadsheet();
  var s=ss.getActiveSheet();
  var r=s.getDataRange();
  var vlst=r.getValues();
  var i,j,a,find,repl;
  find="abc";
  repl="xyz";
  for (i in vlst) {
    for (j in vlst[i]) {
      a=vlst[i][j];
      if (a==find) vlst[i][j]=repl;
    }
  }
  r.setValues(vlst);
}
Thanks
Here is some code that gets the data from only one column (B) and replaces every occurrence of each keyword with an empty string, which deletes the words.
function replaceInColumn() {
  var arrayWordsToFind, dataInColumn, newData, i, lastrow, L, sh, ss, text, toFind;

  arrayWordsToFind = [
    "Epic Artifactory DIY", "Barn", "Planks", "Pack",
    "Coupon: WTXPXZP", "Coupon: FREESHIP50", "Coupon: SPRING10", "Wall"
  ];

  ss = SpreadsheetApp.getActiveSpreadsheet();
  sh = ss.getSheetByName("Your Sheet Name Here");

  lastrow = sh.getLastRow(); // Get row number of last row
  if (lastrow < 2) return; // No data below the header row

  // sh.getRange(start row, start column, number of rows): rows 2..lastrow of column B
  dataInColumn = sh.getRange(2, 2, lastrow - 1).getValues(); // 2D array: [[value], [value], ...]

  L = arrayWordsToFind.length; // The number of words to find
  // Process each cell separately, so commas inside a cell cannot shift the rows
  newData = dataInColumn.map(function (row) {
    text = String(row[0]);
    for (i = 0; i < L; i++) { // Loop once for every word to find
      // Escape regex special characters, then replace globally
      toFind = new RegExp(arrayWordsToFind[i].replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), "g");
      text = text.replace(toFind, ""); // Delete all found words
    }
    return [text]; // Keep the 2D shape that setValues() expects
  });

  sh.getRange(2, 2, newData.length).setValues(newData);
}
Key words: find replace column global
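Here is a plain JavaScript sketch of deleting keywords from one column's values, which can be run outside Sheets. It works cell by cell and escapes each keyword before building the RegExp; the last line shows why joining a column with toString() and splitting on "," is fragile when a cell itself contains a comma. removeKeywords and escapeRegExp are hypothetical names, the space clean-up is an addition, and the sample values are made up.

```javascript
// Sketch: delete keywords from a column stored as a 2D array (the shape getValues() returns).
// removeKeywords and escapeRegExp are hypothetical helper names.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function removeKeywords(column, keywords) {
  return column.map((row) => {
    let text = String(row[0]);
    for (const word of keywords) {
      text = text.replace(new RegExp(escapeRegExp(word), "g"), ""); // delete every occurrence
    }
    return [text.replace(/\s+/g, " ").trim()]; // added: collapse leftover spaces
  });
}

const keywords = ["Barn", "Wall", "Coupon: SPRING10"];
const column = [["Barn Door Kit, Oak Coupon: SPRING10"], ["Wall Shelf"]]; // made-up values
console.log(JSON.stringify(removeKeywords(column, keywords))); // [["Door Kit, Oak"],["Shelf"]]

// Joining and re-splitting on "," yields 3 pieces for these 2 cells.
console.log(column.toString().split(",").length); // 3
```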
This Google Apps Script function replaces the value of every cell in a particular column (B in this case) that is exactly "find" with "replace". It does not replace text inside longer cell values.
function findReplace() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var data = sheet.getDataRange().getValues();
  var column = 2;
  for (var d=0, l=data.length; d<l; d++) {
    if (data[d][column-1] === "find") {
      sheet.getRange(d+1, column).setValue("replace");
    }
  }
  SpreadsheetApp.flush();
}

If another cell's first word = a cell, then change the cell to the last word

Weirdest question ever, I know.
Basically, there is an event log that says, for example, "Johno has changed his name to Johna".
I want a script that will change the 'Johno' located in column A to 'Johna' (based on the last word).
Try this:
function onEdit(e)
{
  var range = e.range; // The edited range, from the onEdit event object
  if (range.getColumn() !== 2) return; // Only run when column B is changed
  var sheet = range.getSheet();
  var val = String(range.getValue()); // Get the sentence
  var name = val.trim().split(" ").pop(); // Get the last word as a string (slice(-1) would return an array)
  var row = range.getRow(); // Get the changed row number
  sheet.getRange(row, 1).setValue(name); // Set the name in column A
}
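The question itself asks to rename the matching entry in column A rather than write into the edited row. Here is a minimal sketch of that logic outside Sheets, assuming the log entry's first word is the old name and its last word is the new name; renameFromLog is a hypothetical helper, and the plain names array stands in for column A.

```javascript
// Sketch: rename entries in a plain array (standing in for column A) from a log entry.
// renameFromLog is hypothetical; it assumes first word = old name, last word = new name.
function renameFromLog(names, entry) {
  const words = entry.trim().split(/\s+/);
  const oldName = words[0];
  const newName = words[words.length - 1];
  return names.map((n) => (n === oldName ? newName : n));
}

console.log(JSON.stringify(renameFromLog(["Alice", "Johno", "Bob"], "Johno has changed his name to Johna")));
// ["Alice","Johna","Bob"]
```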
You need to use regular expressions.
To match the first word, the syntax is ^\w*, so =regexextract("hello world","^\w*") will give you "hello"
To match the last word, the syntax is either \w*\z or \w*$, so =regexextract("hello world","\w*$") will give you "world"
To replace the last word with the first word, I tested the following:
=regexreplace(
"hello brave new world",
"\w*$",
regexextract("hello brave new world", "^\w*")
)
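Unfortunately, it gives me "hello brave new hellohello". This is not actually a bug: REGEXREPLACE replaces every match, and \w*$ matches twice, once for "world" and once for the empty string at the very end. (In JavaScript, replace() without the g flag only replaces the first match, which would explain getting "hello brave new hello" there.) Using \w+$, which cannot match an empty string, avoids the double replacement.
Another workaround is something like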
=REGEXREPLACE(A1 &".","(\w*[.])",REGEXEXTRACT(A1,"^\w*"))
which first appends "." to the end of the string and then searches for a word followed by "." instead.
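The double replacement can be reproduced in JavaScript by adding the g flag, which, like REGEXREPLACE, replaces every match instead of only the first. This sketch compares \w*$ with and without g, and \w+$.

```javascript
// Sketch: why "\w*$" yields "hellohello" under a global replace.
const s = "hello brave new world";
const first = s.match(/^\w*/)[0]; // "hello"

console.log(s.replace(/\w*$/g, first)); // "hello brave new hellohello": "world", then the empty match at the end
console.log(s.replace(/\w*$/, first));  // "hello brave new hello": only the first match is replaced
console.log(s.replace(/\w+$/g, first)); // "hello brave new hello": \w+ cannot match the empty string
```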