Google App Script Regex for Extraction and Ignoring certain items - regex

I need a little help with regex extraction. When the email body is retrieved from Gmail into Google Sheets, it looks like this (the name, phone, and email each have an asterisk before and after them, and the email is hyperlinked):
Body content after being retrieved from Gmail:
Email: *abc#gmail.com `<abc#gmail.com>`*
First Name: *John Doe*
Phone Number: *123456789*
My current Regex code retrieves the data like this
*abc#gmail.com `<abc#gmail.com>`*
*John Doe*
*123456789*
What changes should be made to the code so that it ignores the asterisk before and after for all these and the email is retrieved as abc#gmail.com ignoring the second part of the hyperlink format? Like
abc#gmail.com
John Doe
123456789
My Code is
function extractDetails(message){
  var emailData = {
    date: "Null",
    fullName: "Null",
    emailAddr: "Null",
    phoneNum: "Null",
  }
  var emailKeywords = {
    fullName: "First Name:",
    emailAddr: "Email:",
    phoneNum: "Phone Number:",
  }
  emailData.date = message.getDate();
  emailData.body = message.getPlainBody();
  var regExp;
  regExp = new RegExp("(?<=" + emailKeywords.fullName + ").*");
  emailData.fullName = emailData.body.match(regExp).toString().trim();
  regExp = new RegExp("(?<=" + emailKeywords.phoneNum + ").*");
  emailData.phoneNum = emailData.body.match(regExp).toString().trim();
  regExp = new RegExp("(?<=" + emailKeywords.emailAddr + ").*");
  emailData.emailAddr = emailData.body.match(regExp).toString().trim();

Replace the last 6 lines of your code with:
regExp = new RegExp("(?<=" + emailKeywords.fullName + "\\s*\\*).*?(?=\\*)");
emailData.fullName = emailData.body.match(regExp).toString();
regExp = new RegExp("(?<=" + emailKeywords.phoneNum + "\\s*\\*).*?(?=\\*)");
emailData.phoneNum = emailData.body.match(regExp).toString();
regExp = new RegExp("(?<=" + emailKeywords.emailAddr + "\\s*\\*).*?(?=\\s)");
emailData.emailAddr = emailData.body.match(regExp).toString();
(?<=Email:\s*\*).*?(?=\s)
(?<=Email:\s*\*) is a lookbehind: it finds the position preceded by Email:, then zero or more whitespace characters (\s*), then a literal *.
.*? then matches any characters except newlines, as few as possible, until the lookahead (?=\s) sees a whitespace character.
(?<=First Name:\s*\*).*?(?=\*)
(?<=First Name:\s*\*) finds the position preceded by First Name:, then zero or more whitespace characters (\s*), then a literal *.
.*? then matches any characters except newlines, as few as possible, until the lookahead (?=\*) sees a literal *.
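As a hedged alternative, the same extraction can be sketched with capturing groups instead of lookbehind. This also avoids the exception that .match(...).toString() throws when a field is missing from the body. The helper names escapeRegExp and extractStarred and the sample body are assumptions for illustration; they are not part of the original script.

```javascript
// Escape regex metacharacters so a keyword is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the text between the asterisks that follow `keyword`,
// or null when the keyword or the asterisks are absent.
function extractStarred(body, keyword) {
  var re = new RegExp(escapeRegExp(keyword) + "\\s*\\*([^*\\n]*)\\*");
  var m = body.match(re);
  return m ? m[1].trim() : null;
}

// Sample body, assumed to follow the format shown in the question.
var body = [
  "Email: *abc#gmail.com <abc#gmail.com>*",
  "First Name: *John Doe*",
  "Phone Number: *123456789*"
].join("\n");

var fullName = extractStarred(body, "First Name:");
var phoneNum = extractStarred(body, "Phone Number:");

// The email field holds "address <address>"; keep only the first token.
var rawEmail = extractStarred(body, "Email:");
var emailAddr = rawEmail ? rawEmail.split(/\s+/)[0] : null;

console.log(emailAddr, fullName, phoneNum);
```

Returning null for a missing field lets the caller decide on a fallback value instead of failing on the whole message.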

Regex To Match Comma Separated Values Between Round Brackets [duplicate]

I am trying to write a regular expression which returns a string which is between parentheses. For example: I want to get the string which resides between the strings "(" and ")"
I expect five hundred dollars ($500).
would return
$500
I found "Regular expression to get a string between two strings in Javascript",
but I don't know how to use '(' and ')' in a regexp.
You need to create a set of escaped (with \) parentheses (that match the parentheses) and a group of regular parentheses that create your capturing group:
var regExp = /\(([^)]+)\)/;
var matches = regExp.exec("I expect five hundred dollars ($500).");
//matches[1] contains the value between the parentheses
console.log(matches[1]);
Breakdown:
\( : match an opening parenthesis
( : begin capturing group
[^)]+: match one or more non ) characters
) : end capturing group
\) : match a closing parenthesis
Try string manipulation:
var txt = "I expect five hundred dollars ($500). and new brackets ($600)";
var newTxt = txt.split('(');
for (var i = 1; i < newTxt.length; i++) {
  console.log(newTxt[i].split(')')[0]);
}
or with a regex (which may be somewhat slower than the above):
var txt = "I expect five hundred dollars ($500). and new brackets ($600)";
var regExp = /\(([^)]+)\)/g;
var matches = txt.match(regExp);
for (var i = 0; i < matches.length; i++) {
  var str = matches[i];
  console.log(str.substring(1, str.length - 1));
}
Simple solution
Note: this solution only works for strings containing a single "(" and ")", like the string in this question.
("I expect five hundred dollars ($500).").match(/\((.*)\)/).pop();
To match a substring inside parentheses, excluding any inner parentheses, you can use the pattern
\(([^()]*)\)
In JavaScript, use it like
var rx = /\(([^()]*)\)/g;
Pattern details
\( - a ( char
([^()]*) - Capturing group 1: a negated character class matching any 0 or more chars other than ( and )
\) - a ) char.
To get the whole match, use the Group 0 value; if you need the text inside the parentheses, use the Group 1 value.
Most up-to-date JavaScript code demo (using matchAll):
const strs = ["I expect five hundred dollars ($500).", "I expect.. :( five hundred dollars ($500)."];
const rx = /\(([^()]*)\)/g;
strs.forEach(x => {
  const matches = [...x.matchAll(rx)];
  console.log( Array.from(matches, m => m[0]) ); // All full match values
  console.log( Array.from(matches, m => m[1]) ); // All Group 1 values
});
Legacy JavaScript code demo (ES5 compliant):
var strs = ["I expect five hundred dollars ($500).", "I expect.. :( five hundred dollars ($500)."];
var rx = /\(([^()]*)\)/g;
for (var i = 0; i < strs.length; i++) {
  console.log(strs[i]);
  // Grab Group 1 values:
  var res = [], m;
  while (m = rx.exec(strs[i])) {
    res.push(m[1]);
  }
  console.log("Group 1: ", res);
  // Grab whole values
  console.log("Whole matches: ", strs[i].match(rx));
}
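Ported Mr_Green's answer to a functional programming style to avoid temporary global variables.
var txt = "I expect five hundred dollars ($500). and new brackets ($600)";
var matches = txt.split('(')
  // keep only the pieces that contain a closing parenthesis
  .filter(function (v) { return v.indexOf(')') > -1; })
  .map(function (value) {
    return value.split(')')[0];
  });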
Alternative:
var str = "I expect five hundred dollars ($500) ($1).";
str.match(/\(.*?\)/g).map(x => x.replace(/[()]/g, "")); // ["$500", "$1"]
You can swap the parentheses for square or curly brackets if needed.
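For just digits after a currency sign, \(.+\s*\d+\s*\) should work,
or \(.+\) for anything inside brackets. Both are greedy, so when a string holds several bracketed groups they match from the first ( to the last ).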
let str = "Before brackets (Inside brackets) After brackets".replace(/.*\(|\).*/g, '');
console.log(str) // Inside brackets
var str = "I expect five hundred dollars ($500) ($1).";
var rex = /\$\d+(?=\))/;
alert(rex.exec(str));
This matches the first number that starts with $ and is followed by ')'. The ')' is not part of the match. The code alerts the first match.
var str = "I expect five hundred dollars ($500) ($1).";
var rex = /\$\d+(?=\))/g;
var matches = str.match(rex);
for (var i = 0; i < matches.length; i++) {
  alert(matches[i]);
}
This code alerts all the matches.
References:
search for "?=n"
http://www.w3schools.com/jsref/jsref_obj_regexp.asp
search for "x(?=y)"
https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/RegExp
Simple:
(?<value>(?<=\().*(?=\)))
I hope I've helped.

Regex for characters in specific location in string

Using Notepad++, how can I replace the dashes marked by the carets? The dashes I want to replace occur at every 7th character of each 8-character group in the string.
11.871-2-2.737-2.00334-2
      ^       ^       ^
123456781234567812345678
It's pretty simple since it's only dashes:
(\S*?)-
( : begin capture group
\S* : find any number of non-space characters
? : lazily, until...
) : end capture group
- : find a hyphen (outside the capture group)
Demo 1
var str = `11.871-2-2.737-2.00334-2`;
var sub = `$1`;
var rgx = /(\S*?)-/g;
var res = str.replace(rgx, sub);
console.log(res);
"There is a dash (right above 1) that I would like to preserve. This seems to get rid of all the dashes in the string"
I read the question as having no dash at the "1" position, but given the pattern (n7) one is possible. I don't have time to break the regex down, but it relies on the word-boundary metacharacter \b, which is worth looking up.
Demo 2
var str = `-11.871-2-2.737-2.00334-2`;
var sub = `$1$2`;
var rgx = /\b[-]{1}(\S*?)-(\S*?)\b/g;
var res = str.replace(rgx, sub);
console.log(res);
Search for ([0-9\.-]{6,6})-
Replace with: $1MY_SEPARATOR
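The fixed-width search above can also be tried outside Notepad++. Below is a minimal JavaScript sketch, assuming | as a stand-in for MY_SEPARATOR and assuming the dashes to replace always sit at position 7 of each 8-character group. replaceByPosition is a hypothetical helper, not part of any answer above.

```javascript
var str = "11.871-2-2.737-2.00334-2";

// Regex approach: six characters from [0-9.-], then the dash to replace.
var byRegex = str.replace(/([0-9.-]{6})-/g, "$1|");

// Position approach: replace a dash only when it sits at
// position 7 of an 8-character group (zero-based index % 8 === 6).
function replaceByPosition(text, separator) {
  return text.split("").map(function (ch, i) {
    return (i % 8 === 6 && ch === "-") ? separator : ch;
  }).join("");
}
var byPosition = replaceByPosition(str, "|");

console.log(byRegex);
console.log(byPosition);
```

Both approaches leave the dash in "-2.737" untouched, which is the one the asker wants to preserve.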

How to detect if a string contains hindi (devnagri) in it with character and word count

Below is an example string:
$string = "abcde वायरस abcde"
I need to check whether this string contains any Hindi (Devanagari) content and, if so, count its characters and words. I guess a regex with a Unicode character class can work (http://www.regular-expressions.info/unicode.html), but I am not able to figure out the correct regex.
To find out whether a string contains a Hindi (Devanagari) character, you need the full set of Devanagari characters. In Unicode, the Devanagari block covers the code points from 0x0900 to 0x097F (decimal 2304 to 2431).
The regular expression needs to match if any of those characters occurs in the string. Therefore, you can use a character class like this:
[\u0900\u0901\u0902 ... \u097D\u097E\u097F]
Because it is cumbersome to write this list out by hand, you can generate it by iterating over the code points from 2304 to 2431 (0x0900 to 0x097F).
To count all words consisting only of Hindi characters, you can use the following pattern. It requires white-space (\s) or the beginning of the string (^) before the word, white-space or the end of the string ($) after it, and uses the global flag (/g) to match every occurrence:
/(?:^|\s)[\u0900\u0901\u0902 ... \u097D\u097E\u097F]+?(?:\s|$)/g
Here is a live implementation in JavaScript:
var numberOfHindiCharacters = 128;
var unicodeShift = 0x0900;
var hindiAlphabet = [];
for (var i = 0; i < numberOfHindiCharacters; i++) {
  hindiAlphabet.push("\\u0" + (unicodeShift + i).toString(16));
}
var regex = new RegExp("(?:^|\\s)[" + hindiAlphabet.join("") + "]+?(?:\\s|$)", "g");
var string1 = "abcde वायरस abcde";
var string2 = "abcde abcde";
[ string1.match(regex), string2.match(regex) ].forEach(function(match) {
  if (match) {
    console.log("String contains " + match.length + " words with Hindi characters only.");
  } else {
    console.log("String does NOT contain any words with Hindi characters only.");
  }
});
It should be a range. The list of all characters is not required.
The following will detect a Devanagari word
[\u0900-\u097F]+
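Building on the range form, here is a minimal sketch that answers all three parts of the question: whether Devanagari is present, how many characters, and how many words. countDevanagari is a hypothetical helper name. A "word" is assumed to mean a contiguous run of Devanagari characters, and combining vowel signs are counted as characters.

```javascript
// Devanagari block, U+0900 to U+097F, as stated above.
var DEVANAGARI_CHAR = /[\u0900-\u097F]/g;
var DEVANAGARI_RUN = /[\u0900-\u097F]+/g;

function countDevanagari(text) {
  var chars = text.match(DEVANAGARI_CHAR) || [];
  var words = text.match(DEVANAGARI_RUN) || [];
  return {
    hasDevanagari: chars.length > 0,
    charCount: chars.length,  // characters in the block, combining signs included
    wordCount: words.length   // contiguous Devanagari runs
  };
}

var mixed = countDevanagari("abcde वायरस abcde");
var plain = countDevanagari("abcde abcde");
console.log(mixed, plain);
```

Matching runs directly, rather than requiring whitespace on both sides, also counts adjacent Devanagari words that share a single separating space.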

RegExp and special characters

I need to use regexp for matching and the code below works fine. However, I need to KEEP the dollar sign ($) as a true dollar sign and not a special character.
I've tried excluding it, e.g. with [^$], but nothing is working.
Here's the code. It works as expected except when the text contains a $ or IS the $.
textNode = "$19,000";
regex = RegExp("$19,000",'ig');
text = '$';
textReplacerFunc: function (textNode, regex, text) {
var sTag = '<span class="highlight">';
var eTag = '</span>';
var re = '(?![^<>]*>)(' + text + '(?!#8212;))';
var regExp = new RegExp(re, 'ig');
textNode.data = textNode.data.replace(regExp, sTag + '$1' + eTag);
},
RESULT: $ not highlighted. desired results:
$19,000
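Make sure to double-escape the $, as in:
text = '\\$';
This is needed because you are constructing the RegExp instance from a string: the string literal '\\$' yields the two characters \$, which the regex engine then reads as a literal dollar sign.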
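When the search text comes from user input, escaping by hand does not scale. A common approach is to escape every regex metacharacter before building the pattern. The sketch below assumes a plain string in place of the text node, and highlight and escapeRegExp are hypothetical helper names. It omits the (?!#8212;) part of the original pattern for brevity.

```javascript
// Escape regex metacharacters so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every occurrence of `text` that is not inside an HTML tag.
function highlight(html, text) {
  var re = new RegExp("(?![^<>]*>)(" + escapeRegExp(text) + ")", "ig");
  return html.replace(re, '<span class="highlight">$1</span>');
}

var result = highlight("Price: $19,000", "$19,000");
console.log(result);
```

With this in place the caller can pass '$' or '$19,000' unchanged, and the dollar sign is treated as an ordinary character.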

How to Split the message in as3?

Hi, I am trying to split the string rtmp://xx.yyy.in/sample/test?22082208,False#&all. The word sample is added dynamically, so I don't know its length in advance.
I want to extract the /sample/ part. How can I do this?
You want the string.split() method
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/String.html#split%28%29
var array:Array = myString.split("/"); // returns an array of the substrings between each /
In your case this will return
[0]->rtmp: [1]->(empty string) [2]->xx.yyy.in [3]->sample [4]->test?22082208,False#&all
If you're looking for everything aside from the test?22082208,False#&all part and your URL will always be in this format, you can use string.lastIndexOf()
var pos:int = myString.lastIndexOf("/"); // position of the last /; omit startIndex so the search begins at the end of the string
var newString:String = myString.substr(0, pos); // new string starting at 0 with length pos, i.e. everything before the last /
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/String.html#substr%28%29
You can do this (and almost everything) with regex:
var input:String = "rtmp://xx.yyy.in/sample/test?22082208,False#&all";
var pattern:RegExp = /^rtmp:\/\/.*\/([^\/]*)\/.*$/;
trace(input.replace(pattern, "$1")); //outputs "sample"
Here is the regex in detail:
^ : start of the string
rtmp:\/\/ : the literal prefix "rtmp://"
.* : anything (greedy)
\/ : a slash (the second-to-last one, because .* is greedy)
([^\/]*) : capture everything but a slash until...
\/ : ...the last slash
.* : anything
$ : the end
Then $1 represents the captured group between the parentheses.
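For readers who want to check these results without an ActionScript environment, the sketch below does the same three things in JavaScript rather than ActionScript 3. It assumes that split, lastIndexOf, and replace behave alike in both languages for this input. The variable names are illustrative only.

```javascript
var input = "rtmp://xx.yyy.in/sample/test?22082208,False#&all";

// Splitting on "/" yields an empty element for the "//" after "rtmp:".
var parts = input.split("/");

// Everything before the last slash.
var lastSlash = input.lastIndexOf("/");
var base = input.substring(0, lastSlash);

// The segment between the last two slashes, using the regex from the answer.
var segment = input.replace(/^rtmp:\/\/.*\/([^\/]*)\/.*$/, "$1");

console.log(parts);
console.log(base);
console.log(segment);
```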