Extracting multiple values with RegEx in a Google Sheet formula - regex

I have a Google spreadsheet with 2 columns.
Each cell of the first one contains JSON data, like this:
{
"name":"Love",
"age":56
},
{
"name":"You",
"age":42
}
Then I want a second column that would, using a formula, extract every value of name and string it like this:
Love,You
Right now I am using this formula:
=REGEXEXTRACT(A1, CONCATENER(CHAR(34),"name",CHAR(34),":",CHAR(34),"([^",CHAR(34),"]+)",CHAR(34),","))
The RegEx expression being "name":"([^"]+)",
The problem is that it currently only returns the first occurrence, like this:
Love
(Also, I don't know how many occurrences of "name" there are. Could be anywhere from 0 to around 20.)
Is it even possible to achieve what I want?
Thank you so much for reading!
EDIT:
My JSON data starts with:
{
"time":4,
"annotations":[
{
Then in the middle, something like this:
{
"name":"Love",
"age":56
},
{
"name":"You",
"age":42
}
and ends with:
],
"topEntities":[
{
"id":247120,
"score":0.12561166,
"uri":"http://en.wikipedia.org/wiki/Revenue"
},
{
"id":31512491,
"score":0.12504959,
"uri":"http://en.wikipedia.org/wiki/Wii_U"
}
],
"lang":"en",
"langConfidence":1.0,
"timestamp":"2020-05-22T12:17:47.380"
}

Since your text is basically a JSON string, you may parse all name fields from it using the following custom function:
function ExtractNamesFromJSON(input) {
var obj = JSON.parse("[" + input + "]");
var results = obj.map((x) => x["name"])
return results.join(",")
}
Then use it as =ExtractNamesFromJSON(C1).
If you need a regex, use a similar approach:
function ExtractAllRegex(input, pattern,groupId,separator) {
return Array.from(input.matchAll(new RegExp(pattern,'g')), x=>x[groupId]).join(separator);
}
Then use it as =ExtractAllRegex(C1, """name"":""([^""]+)""",1,",").
Note:
input - current cell value
pattern - regex pattern
groupId - Capturing group ID you want to extract
separator - text used to join the matched results.
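The EDIT in the question shows that the cell actually holds one complete JSON object, with the names nested inside an annotations array. A minimal sketch for that shape follows. It assumes every entry of annotations is an object with an optional name field, and the function name ExtractAnnotationNames is made up for illustration:

```javascript
// Hypothetical custom function; assumes the cell holds the full JSON object
// from the question's EDIT, with the names stored under "annotations".
function ExtractAnnotationNames(input) {
  var data;
  try {
    data = JSON.parse(input);
  } catch (e) {
    return ""; // not valid JSON: return an empty cell instead of an error
  }
  var annotations = data && Array.isArray(data.annotations) ? data.annotations : [];
  return annotations
    .filter(function (a) { return a && typeof a.name === "string"; })
    .map(function (a) { return a.name; })
    .join(",");
}
```

It would be used as =ExtractAnnotationNames(A1). A cell with no annotations, or with invalid JSON, yields an empty string, which covers the "0 occurrences" case.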

Related

Regexp Match Google Script inside a loop?

I've been on this problem for a couple of hours, I'm new to coding, so excuse me if it's a very simple question.
I have a list of text values, and for every cell I want to find out whether it matches one of the regular expressions listed on another sheet.
If it does, the matching regular expression should be pasted next to the text.
Example:
For the first row:
7063 BIO PLANET LIEGE.
--> i'd like it to write "BIO PLANET" in the cell to the right. (Because BIO PLANET is one of the regular expression to test from the second sheet).
I wrote something like this, but couldn't really figure out what needs to be fixed:
function ExpenseMatching() {
var spreadsheet = SpreadsheetApp.getActive();
var sheet1 = spreadsheet.getSheetByName("Import2");
var sheet2 = spreadsheet.getSheetByName("Regular Expression");
for ( i =1; i<24 ; i++)
{
//Browser.msgBox(i)
var test1 = sheet2.getRange("A"+ i);
var test2 = sheet1.getRange("A2");
var test = new RegExp(test1).test(test2);
if (regexp==true)
{
test1.copyTo(sheet1.getRange("I2"));
Browser.msgBox(test)
}
else
{
}
}
}
Thanks in advance for your help guys!
You want to retrieve the values of column "A" on the sheet Import2 and the values of column "A" on the sheet Regular Expression.
You want to check whether each value of Import2 matches one of the values of Regular Expression. When it does, you want to put that value of Regular Expression into column "B" on Import2.
You want to achieve this using Google Apps Script.
If my understanding is correct, how about this answer?
Modification points:
In your script,
if (regexp==true) throws an error, because regexp is never declared.
This has already been mentioned in Rubén's comment.
From your question, I thought that you want to put the result value in column "B" of Import2. But your script puts the value in column "I", because of test1.copyTo(sheet1.getRange("I2")).
Your script checks only "A2" of Import2.
The script reads and copies one cell at a time inside the for loop. In this case, the process cost will be high.
When the above points are reflected in your script, how about the following modified script?
Modified script:
function ExpenseMatching() {
var spreadsheet = SpreadsheetApp.getActive();
var sheet1 = spreadsheet.getSheetByName("Import2");
var sheet2 = spreadsheet.getSheetByName("Regular Expression");
const values1 = sheet1.getRange(`A2:A${sheet1.getLastRow()}`).getValues();
const values2 = sheet2.getRange(`A2:A${sheet2.getLastRow()}`).getValues();
const res = values1.map(([r1]) => {
for (let i = 0; i < values2.length; i++) {
if (new RegExp(values2[i][0]).test(r1)) {
return [values2[i][0]];
}
}
return [""];
});
sheet1.getRange(2, 2, res.length, 1).setValues(res);
}
I think that in your situation, you can also use if (r1.includes(values2[i][0])) { instead of if (new RegExp(values2[i][0]).test(r1)) {, as long as the entries are plain text rather than real regular expressions. This might reduce the cost further.
Note:
In this modification, the result values are put in column "B" of Import2.
Please run the script with V8 enabled.
References:
map()
setValues()
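The matching step can also be separated from the spreadsheet calls so that it can be tried outside Apps Script. The following is only a sketch under assumptions: findFirstMatches is a hypothetical helper name, both inputs are plain arrays of strings, and "CARREFOUR" and "1234 UNKNOWN SHOP" are made-up sample data.

```javascript
// Hypothetical helper: for each text, return the first pattern that matches it,
// or "" when none does. Patterns are compiled once instead of once per row.
function findFirstMatches(texts, patterns) {
  var compiled = patterns
    .filter(function (p) { return p !== ""; }) // skip blank cells
    .map(function (p) { return { source: p, regex: new RegExp(p) }; });
  return texts.map(function (text) {
    var hit = compiled.find(function (c) { return c.regex.test(String(text)); });
    return hit ? hit.source : "";
  });
}

console.log(findFirstMatches(
  ["7063 BIO PLANET LIEGE", "1234 UNKNOWN SHOP"],
  ["CARREFOUR", "BIO PLANET"]
));
```

In the script above, the flattened column values from getValues() would be passed in, and each result wrapped in a one-element array before calling setValues().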

Regexp to get the utm values

I'm looking to extract some of the utm values from a URL using regexp. My URL would look something like the below -
utm_source=ko_1d5b57661294a3154&utm_medium=internetq&utm_campaign=-android5436af9f1aef91a654a7255038&utm_term=searchthis&utm_content=mainpage&
Is there any way to have a regexp that would extract all the utm values such as utm_source, utm_medium, utm_campaign, utm_term, utm_content?
You could grab all matching pairs and then convert them to an object.
NOTE: The object conversion is simplistic (doesn't account for multiple params of the same key, etc.).
var regexp = /(?!&)utm_[^=]*=[^&]*(?=&)/g;
var query = 'utm_source=ko_1d5b57661294a3154&utm_medium=internetq&utm_campaign=-android5436af9f1aef91a654a7255038&utm_term=searchthis&utm_content=mainpage&';
var matches = query.match(regexp);
var values = matches.reduce(function(obj, param) {
var keyVal = param.split('=');
obj[keyVal[0]] = keyVal[1];
return obj;
}, {});
document.write('<pre>' + JSON.stringify({
matches: matches,
values: values
}, null, 2) + '</pre>');
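You could use a positive lookbehind for this case. The pattern would look like this:
(?<=utm_[a-z]+=)\w+
This pattern matches a run of word characters (letters, digits, underscore) that is preceded by "utm_<name>=". Note that \w does not match a hyphen, so a value such as -android5436af9f1aef91a654a7255038 is skipped; use [^&]+ instead of \w+ to capture everything up to the next &.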
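Here I am getting every value between the = and & signs:
/[^=]\w+(?=&)/g
Another one, based on utm_:
/[^utm_=]\w+(?=&)/g
Be aware that [^utm_=] is a character class: it excludes any single one of the characters u, t, m, _ and =, not the prefix utm_. A value that starts with one of those characters loses its first letter, e.g. mainpage is matched as ainpage.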
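The approaches above can be combined into one pattern with two capture groups, one for the key and one for the value. This is a sketch rather than a definitive solution: extractUtmParams is a hypothetical name, and it assumes the parameters are separated by & (optionally after a leading ?).

```javascript
// Hypothetical helper: collect every utm_* key/value pair from a query string.
function extractUtmParams(query) {
  // Group 1 is the key (utm_...), group 2 is everything up to the next "&".
  var pattern = /(?:^|[?&])(utm_[^=&]+)=([^&]*)/g;
  var result = {};
  var match;
  while ((match = pattern.exec(query)) !== null) {
    result[match[1]] = match[2]; // a repeated key keeps its last value
  }
  return result;
}

var query = 'utm_source=ko_1d5b57661294a3154&utm_medium=internetq&utm_campaign=-android5436af9f1aef91a654a7255038&utm_term=searchthis&utm_content=mainpage&';
console.log(extractUtmParams(query));
```

Because the value group is [^&]* rather than \w+, the leading hyphen in the utm_campaign value is kept.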

Mongodb word count using map reduce

I have a problem with counting words.
I want to count words in projects.log.subject.
ex) count [A],[B],[C]..
I searched for how to use map reduce, but I don't understand how to use it to get the result I want.
{
"_id": ObjectID("569f3a3e9d2540764d8bde59"),
"A": "book",
"server": "us",
"projects": [
{
"domainArray": [
{
~~~~
}
],
"log": [
{
~~~~~,
"subject": "[A][B]I WANT THIS"
}
],
"before": "234234234"
},
{
"domainArray": [
{
~~~~
}
],
"log": [
{
~~~~~,
"subject": "[B][C]I WANT THIS"
}
],
"before": "234234234"
},....
] //end of projects
}//end of document
This is a basic principle of using regular expressions and testing each string against the source string and emitting the found count for the result. In mapReduce terms, you want your "mapper" function to possibly emit multiple values for each "term" as a key, and for every array element present in each document.
So you basically want a source array of regular expressions to process ( likely just a word list ) to iterate and test and also iterate each array member.
Basically something like this:
db.collection.mapReduce(
function() {
var list = ["the", "quick", "brown" ]; // words you want to count
this.projects.forEach(function(project) {
project.log.forEach(function(log) {
list.forEach(function(word) {
var res = log.subject.match(new RegExp("\\b" + word + "\\b","ig"));
if ( res != null )
emit(word,res.length); // returns number of matches for word
});
});
});
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
)
So the loop processes the array elements in the document and then applies each word to look for with a regular expression to test. The .match() method will return an array of matches in the string or null if none was found. Note the i and g options for the regex in order to search case insensitively and beyond just the first match. You might need m for multi-line if your text includes line break characters as well.
If null is not returned, then we emit the current word as the "key" and the count as the length of the matched array.
The reducer then takes all output values from those emit calls in the mapper and simply adds up the emitted counts.
The result will be one document keyed by each "word/term" provided and the count of total occurrences in the inspected field within the collection. For more fields, just add more logic to sum up the results, or similarly just keep "emitting" in the mapper and let the reducer do the work.
Note that "\\b" represents a word boundary expression wrapped around each term, with the backslash escaped by another \ because the expression is constructed from strings. You need these to discriminate "the" from "then" for example, by specifying where the word/term ends.
Also, as regular expressions, characters like [] are reserved, so if you were actually looking for strings like that then you similarly escape them, i.e:
"\\[A\\]"
But if you were actually doing that, then remove the word boundary characters:
new RegExp( "\\[A\\]", "ig" )
As that is enough of a complete match in itself.
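To try the counting logic without a MongoDB instance, the mapper and reducer can be imitated in plain JavaScript. This is only a sketch: countTerms and escapeRegExp are hypothetical helper names, the subjects are the two sample strings from the question, and the terms are escaped automatically instead of by hand.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical stand-in for the mapper and reducer: count how often each
// term occurs across all subjects.
function countTerms(subjects, terms) {
  var counts = {};
  subjects.forEach(function (subject) {
    terms.forEach(function (term) {
      var res = subject.match(new RegExp(escapeRegExp(term), "ig"));
      if (res !== null) {
        counts[term] = (counts[term] || 0) + res.length; // "emit" and sum
      }
    });
  });
  return counts;
}

console.log(countTerms(
  ["[A][B]I WANT THIS", "[B][C]I WANT THIS"],
  ["[A]", "[B]", "[C]"]
));
```

The same escaping could be applied inside the mapper function before building each RegExp.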

find str in another str with regex

I defined:
var s1="roi john";
var s2="hello guys my name is roi levi or maybe roy";
I need to split s1 into words and check whether they are contained in s2.
If so, give me the specific posts in which they exist.
The best way to help me with this is to write it as a regex, because I need these checks for MongoDB.
Please let me know the proper regex I need.
Thx.
This could possibly be answered with just the regular expression (and actually is), but consider the data:
{ "phrase" : "hello guys my name is roi levi or maybe roy" }
{ "phrase" : "and another sentence from john" }
{ "phrase" : "something about androi" }
{ "phrase" : "johnathan was here" }
You match with MongoDB like this:
db.collection.find({ "phrase": /\broi\b|\bjohn\b/ })
And that gets the two documents that match:
{ "phrase" : "hello guys my name is roi levi or maybe roy" }
{ "phrase" : "and another sentence from john" }
So the regex works by keeping the word boundaries \b around the words to match so they do not partially match something else and are combined with an "or" | condition.
Play with the regexer for this.
Doing open ended $regex queries like this in MongoDB can often be bad for performance. Not sure of your actual use case for this but it is possible that a "full text search" solution would be better suited to your needs. MongoDB has full text indexing and search or you can use an external solution.
Anyhow, this is how you match your words using a $regex condition.
To actually process your string as input you will need some code before doing the search:
var string = "roi john";
var splits = string.split(" ");
for ( var i = 0; i < splits.length; i++ ) {
splits[i] = "\\b" + splits[i] + "\\b";
}
var exp = splits.join("|");
db.collection.find({ "phrase": { "$regex": exp } })
And possibly even combine that with the case insensitive "$options" if that is what you want. That second usage form with the literal $regex operator is actually a safer form of usage in languages other than JavaScript.
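The pattern-building step can be checked in plain JavaScript against the sample phrases before it is sent to MongoDB. The following is a sketch under assumptions: buildWordPattern is a hypothetical helper name, and it additionally escapes regex metacharacters in each word, which the loop above does not do.

```javascript
// Hypothetical helper: turn "roi john" into "\broi\b|\bjohn\b".
function buildWordPattern(input) {
  return input
    .split(/\s+/)
    .filter(function (word) { return word !== ""; })
    .map(function (word) {
      // Escape regex metacharacters so each word is matched literally.
      return "\\b" + word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") + "\\b";
    })
    .join("|");
}

var phrases = [
  "hello guys my name is roi levi or maybe roy",
  "and another sentence from john",
  "something about androi",
  "johnathan was here"
];
var pattern = buildWordPattern("roi john");
var matching = phrases.filter(function (phrase) {
  return new RegExp(pattern).test(phrase);
});
console.log(pattern);
console.log(matching);
```

Only the first two phrases pass the filter; "androi" and "johnathan" are rejected by the word boundaries. The resulting string is what would be passed as the $regex value.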
Using a loop to iterate over the words of s1 and checking each one against s2 will give the expected result:
var s1="roi john";
var s2="hello guys my name is roi levi or maybe roy";
var arr1 = s1.split(" ");
for(var i=0;i<arr1.length;i++){
if (s2.indexOf(arr1[i]) != -1){
console.log("The string contains "+arr1[i]);
}
}

regex how can I split this word?

I have a list of several phrases in the following format
thisIsAnExampleSentance
hereIsAnotherExampleWithMoreWordsInIt
and I'm trying to end up with
This Is An Example Sentance
Here Is Another Example With More Words In It
Each phrase has the white space condensed and the first letter is forced to lowercase.
Can I use regex to add a space before each A-Z and have the first letter of the phrase be capitalized?
I thought of doing something like
([a-z]+)([A-Z])([a-z]+)([A-Z])([a-z]+) // etc
$1 $2$3 $4$5 // etc
but on 50 records of varying length, my idea is a poor solution. Is there a way to regex in a way that will be more dynamic? Thanks
A Java fragment I use looks like this (now revised):
result = source.replaceAll("(?<=^|[a-z])([A-Z])|([A-Z])(?=[a-z])", " $1$2");
result = result.substring(0, 1).toUpperCase() + result.substring(1);
This, by the way, converts the string givenProductUPCSymbol into Given Product UPC Symbol - make sure this is fine with the way you use this type of thing
Finally, a single line version could be:
result = source.substring(0, 1).toUpperCase() + source.substring(1).replaceAll("(?<=^|[a-z])([A-Z])|([A-Z])(?=[a-z])", " $1$2");
Also, in an Example similar to one given in the question comments, the string hiMyNameIsBobAndIWantAPuppy will be changed to Hi My Name Is Bob And I Want A Puppy
For the space problem it's easy if your language supports zero-width look-behind:
var result = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "(?<=[a-z])([A-Z])", " $1");
or even if it doesn't support them:
var result2 = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "([a-z])([A-Z])", "$1 $2");
I'm using C#, but the regexes should be usable in any language that supports replacement using $1...$n.
But for the lower-to-upper case conversion, you can't do it directly in regex. You can get the first character through a regex like ^[a-z], but you can't convert it.
For example in C# you could do
var result3 = Regex.Replace(result, "^([a-z])", m =>
{
return m.ToString().ToUpperInvariant();
});
using a match evaluator to change the input string.
You could then even fuse the two together
var result4 = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "^([a-z])|([a-z])([A-Z])", m =>
{
if (m.Groups[1].Success)
{
return m.ToString().ToUpperInvariant();
}
else
{
return m.Groups[2].ToString() + " " + m.Groups[3].ToString();
}
});
A Perl example with unicode character support:
s/\p{Lu}/ $&/g;
s/^./\U$&/;
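For comparison, the same two steps (a space before each capital, then an uppercase first letter) can be sketched in JavaScript. The helper name splitCamelCase is made up for illustration, and the sketch assumes plain ASCII letters:

```javascript
// Hypothetical helper: "thisIsAnExample" -> "This Is An Example".
function splitCamelCase(phrase) {
  return phrase
    // Put a space before every capital letter except one at the very start.
    .replace(/(?!^)[A-Z]/g, " $&")
    // Uppercase the first letter if it is lowercase.
    .replace(/^[a-z]/, function (c) { return c.toUpperCase(); });
}

console.log(splitCamelCase("thisIsAnExampleSentance"));
console.log(splitCamelCase("hereIsAnotherExampleWithMoreWordsInIt"));
```

Since every capital gets its own space, single-letter words such as "I" and "A" are separated correctly, but an acronym is split letter by letter (UPC becomes U P C), unlike in the Java version above.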